I think they are making a mistake that's common in this sort of project: trying too many new things at once!
They already have a very innovative way of managing source code, with a database of definitions that keeps the hash of the syntax tree instead of actual source. That's a very neat idea that solves many problems (read their docs to understand why).
But instead of developing that well enough so that it works with source control tools and IDEs, and can be deployed easily and painlessly on existing infrastructure... no, they decided to ALSO solve distributed computing, a really, really complex problem with a pretty crowded field of solutions... and seem to be focusing on that now instead of the "original" ideas. Looks like a huge issue with scope creep to me... unless they are kind of pivoting to distributed computing now only because the original ideas were not attractive enough for people to embrace them, but I have not heard of anything like that, everyone seems to be pretty excited about those things.
The two areas of managing source code and distributed computing are not as disjoint as you make them in the context of Unison. They follow from the underlying principle of addressing functions not by their name but by a hash of their normalized syntax tree (i.e. their code).
There are a bunch of cool implications for distributed computing, namely that you can easily distribute fine-grained parts of your application across servers and that you can cache the results of expensive calculations.
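To make the principle a bit more concrete, here's a rough Python sketch (the names and the toy tree shape are mine, not Unison's actual representation) of storing a definition under the hash of a normalized tree in which references to dependencies are themselves hashes:

    import hashlib
    import json

    def digest(tree) -> str:
        """Hash a canonical JSON rendering of a (toy) normalized syntax tree."""
        blob = json.dumps(tree, sort_keys=True, separators=(",", ":")).encode()
        return hashlib.sha256(blob).hexdigest()

    codebase = {}  # hash -> normalized syntax tree (the "database of definitions")

    def add_definition(tree) -> str:
        """Store a definition under its content hash and return that hash.
        References to dependencies inside `tree` are already hashes, so the
        hash of `tree` pins down its entire transitive dependency graph."""
        h = digest(tree)
        codebase[h] = tree
        return h

    # square x = x * x
    square = add_definition(["lambda", "x", ["*", ["var", "x"], ["var", "x"]]])

    # addSquare x = square x + 1  -- refers to `square` by hash, not by name
    add_square = add_definition(
        ["lambda", "x", ["+", ["call", ["ref", square], ["var", "x"]], ["lit", 1]]])

    print(square, add_square)

Because the hash covers the definition and everything it transitively depends on, caching the result of an expensive call keyed by that hash is safe: the cache can only be hit by code that is the same all the way down.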
> They follow from the underlying principle of addressing functions not by their name but by a hash of their normalized syntax tree (ie their code).
The first time the language or compiler changes such that the same code generates a different syntax tree they'd have to do something pretty fancy to avoid rebuilding the world. (That, plus all the usual caveats about what happens when old hash algorithms meet malicious actors from the future.)
(Unison developer here.) Yes, we've done two such migrations this year. It has typically meant that we show an "are you ready to upgrade" message when you start up, and the migration took less than a minute on my pretty large codebase. It's not a big deal.
It's a solved problem technically, certainly, but assuming all of the relevant source code will always be available ignores some social and legal issues.
One issue I was thinking of was companies that distribute code in binary form only, not ASTs or anything which could be used to steal their thoughts. The other issue was in reverse, however: A binary-only package is unmaintained and lists as dependencies hashes that no longer exist because they're the hashed versions of ASTs that, for one reason or another, the compiler won't generate, even if the source still exists. Versioning and archiving would help this case, at least.
It's a common approach in language design to require this though. Rust has made very similar design choices due to its (current) lack of stable ABIs. IIRC some hash of the compiler version is included in built libraries and prevents accidental linking. You need to rebuild the world for every toolchain upgrade.
Also, if you change one very low-level function (maybe something in the runtime, Unicode handling, etc.) you'd have to recompile the world. In some ways it's nice to reference things by a name and let the implementation change without needing to care about the details. It's semver's raison d'être.
To be fair, for a while they were ALSO working on their own graphical source editor that allowed for type-correct transformations and assisted refactorings. They put that on the back burner specifically because they are trying to focus on fewer things :)
I think the distributed computing problem is pretty related once you have "content-addressable" source code. Agreed that it's a lot of work but I hope it pans out!
Well, there is a relationship. The relationship is specifically that Unison nodes can communicate code with one another unambiguously by exchanging hashes.
(I'm a Unison employee who works mostly on distributed computing.)
There are a few things about Unison that make this easier:
* content addressed code
* unison can serialize any closure
I can ask for a serialized version of any closure, and get back something that is portable to another runtime. So in a function, I can create a lambda that closes over some local variables, ask the runtime for the code for this closure, and send it to a remote node for execution. It won't be a serialized version of my entire program and all of its dependencies; it will be the hash of a tree, which is a tree of other hashes.
The remote node can inspect the tree and ask for (or gossip to find out) the definitions for whatever hashes in that tree it doesn't already know about. It can inspect the tree for any forbidden hashes (for example, you can't do arbitrary IO), then the remote node can evaluate the closure and return a result.
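Roughly, that back-and-forth looks like the following toy Python sketch. The names, the tree encoding, and the sync loop are all invented for illustration; this is not the actual Unison runtime protocol.

    import hashlib, json

    def digest(tree) -> str:
        return hashlib.sha256(json.dumps(tree, sort_keys=True).encode()).hexdigest()

    def deps(tree):
        """Yield the hashes a (toy) definition refers to."""
        if isinstance(tree, list):
            if tree and tree[0] == "ref":
                yield tree[1]
            else:
                for sub in tree:
                    yield from deps(sub)

    class Node:
        def __init__(self):
            self.store = {}  # hash -> definition tree this runtime already knows

        def missing(self, root_hash):
            """Walk the tree of hashes under root_hash; return the unknown ones."""
            todo, unknown = [root_hash], set()
            while todo:
                h = todo.pop()
                if h in self.store:
                    todo.extend(deps(self.store[h]))
                else:
                    unknown.add(h)
            return unknown

    sender, receiver = Node(), Node()

    # The sender's closure: a lambda that calls `square`, referenced by hash.
    square = ["lambda", "x", ["*", ["var", "x"], ["var", "x"]]]
    sender.store[digest(square)] = square
    closure = ["lambda", "y", ["call", ["ref", digest(square)], ["var", "y"]]]
    root = digest(closure)
    sender.store[root] = closure

    # Ship the root, then keep asking for whatever can't be resolved yet.
    receiver.store[root] = sender.store[root]
    while (needed := receiver.missing(root)):
        for h in needed:
            receiver.store[h] = sender.store[h]  # in reality: request or gossip
    print("closure fully loaded:", not receiver.missing(root))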
In other languages, perhaps you can dynamically ship code around to be dynamically executed, but you aren't going to also get "and of course ship all the transitive dependencies of this closure as needed" as easily as we are able to.
In a simple Lisp machine, such as something resembling PicoLisp, I can't see why not, if instead of car and cdr being just linear addresses in memory, you have them themselves be hashes.
Since everything is made up of car and cdr, it's easy going from there. There is no difference between running locally or anywhere else. Just look up the data by request or gossip, as you said.
(For performance reasons, one might want to let a cons whose contents are smaller than the size of a hash be a literal representation instead of a hashed one. No need to do a lookup by hash when you can have the code/data in the cons itself. Analogous to not zipping a file when the resulting zip would be larger.)
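As a toy illustration (Python standing in for the Lisp, with made-up names), a cell store along those lines, including the inline-small-values optimization, could look like this:

    import hashlib, pickle

    HASH_LEN = 32   # bytes in a sha256 digest
    heap = {}       # hash -> stored value; stands in for "memory" (or another node)

    def intern(value):
        """Store `value` under its hash, or inline it if it's no bigger than a hash."""
        blob = pickle.dumps(value)
        if len(blob) <= HASH_LEN:
            return ("inline", value)          # the "don't zip if the zip is bigger" case
        h = hashlib.sha256(blob).digest()
        heap[h] = value
        return ("hash", h)

    def cons(car_ref, cdr_ref):
        """A cell is just the pair of refs; its identity is the hash of that pair."""
        return intern((car_ref, cdr_ref))

    def resolve(ref):
        kind, payload = ref
        return payload if kind == "inline" else heap[payload]  # or: look it up remotely

    def car(cell_ref):
        return resolve(resolve(cell_ref)[0])

    # (1 . (2 . nil)) built from content-addressed cells
    lst = cons(intern(1), cons(intern(2), intern(None)))
    print(car(lst))   # -> 1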
In this pretend program we are using some 3rd-party S3 library to fetch some data and some other 3rd-party CSV library to extract the data. Now I want to run this on some other remote node. I know I can just send that sexp to the remote node and have it eval it, but that is only going to work if the right versions of the S3 and CSV libraries are already in that runtime.
I want to be able to write a function like:
    (defun remote-run (prog)
      (....))
That can take ANY program and ship it off to some remote node and have that remote node calculate the result and ship the answer back. I don't know of a way in Lisp to ask the runtime to give you a program which includes your calculation plus the S3 functions and the CSV functions.
In Unison, I can just say `(serialize my-program)` and get back a byte array which represents exactly the functions we need to evaluate this closure. When the remote side tries to load those bytes into its runtime, it will either succeed or fail with "more information needed" and a list of additional hashes we need code for, and the two runtimes can recurse through this until the remote side has everything needed to load that value into the runtime so that it can be evaluated to a result.
Then, of course, this opens us up to the ability for us to be smart in places and say "before you try running this, check the cache to see if anyone ran this closure recently and already knows the answer"
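That last step is essentially memoization keyed by the closure's hash. A minimal sketch of the idea (the names and the cache are hypothetical, not the Unison Cloud API):

    import hashlib, json

    result_cache = {}  # closure hash -> previously computed result (could live anywhere)

    def closure_hash(tree) -> str:
        return hashlib.sha256(json.dumps(tree, sort_keys=True).encode()).hexdigest()

    def run_with_cache(tree, evaluate):
        """Evaluate `tree`, reusing any cached answer for the same hash.
        Only safe because the hash covers the code and everything it closes over."""
        h = closure_hash(tree)
        if h in result_cache:
            return result_cache[h]       # someone already ran this exact closure
        result = evaluate(tree)          # the expensive part, possibly on a remote node
        result_cache[h] = result
        return result

    # Toy closure: "sum of squares below 10_000", with its input baked in.
    prog = ["sum-squares", 10_000]
    evaluate = lambda t: sum(i * i for i in range(t[1]))
    print(run_with_cache(prog, evaluate))  # computed
    print(run_with_cache(prog, evaluate))  # served from the cache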
In Lisp there are symbols and lists. The symbols could be pointers to content-addressable hashes, or inlined content. So in the namespace you could require the library by its content-addressable hash and give it a symbol definition, or just refer to the hash itself.
I'm not saying there exists a Lisp with a turn-key distributed cloud runtime. Like the sibling answer, I'm saying, it's not super complicated. Instead of loading an .so file the regular way, load it via hash.
The car/cdr nature of Lisp makes it almost uniquely suited to distributed runtime IMHO.
As for your last sentence, this touches on compilation. Solve this, and you also solve dependency detection and compilation.
I feel there are so many things which could/should converge at some point in the future. IDE / version control, distributed compute, storage.
An AST (or source) program could have a hash, which corresponds in a cache to a compiled or JIT-ed version of that compilation unit, various eval results of that unit, etc. So many things collapse into one once you start to treat everything as key/value.
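As a sketch of that convergence (purely illustrative, with invented names and artifact kinds), one AST hash can key every artifact derived from a compilation unit:

    import hashlib, json

    def ast_hash(ast) -> str:
        return hashlib.sha256(json.dumps(ast, sort_keys=True).encode()).hexdigest()

    store = {}  # (ast hash, artifact kind) -> artifact

    def put(ast, kind, artifact):
        store[(ast_hash(ast), kind)] = artifact

    def get(ast, kind):
        return store.get((ast_hash(ast), kind))

    unit = ["defn", "square", ["lambda", "x", ["*", "x", "x"]]]
    put(unit, "source", unit)              # the tree itself (the "version control" part)
    put(unit, "compiled", b"...bytecode")  # an imagined compiled/JIT-ed form
    put(unit, "eval:(square 12)", 144)     # a cached evaluation result

    print(get(unit, "eval:(square 12)"))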
Well, it could be done in JS or Lisp. You'd have to replace all the references to dependencies in every function with a hash of the implementation of the referenced function and use some kind of global lookup table (which would need to be a distributed hash table or something). But this would be slow, so you'd need a compiler or some kind of processor to inline stuff and do CPS transforms, and by that time you've basically implemented a janky version of Unison.
Or a beautiful, conceptually simple version of Unison. In any case, not more convoluted than for instance Javascript V8 already is to get the performance it has.
An issue with exchanging WASM is you'll have to do either dynamic or static linking of dependencies. Unison works around this by having all definitions share a global address space (and the address of any given code in that space is the hash of its syntax tree), so there is no linking step. Or rather, the linking is done trivially by the hashing.
> Creating yet another language ensure that this will get nowhere, too bad.
What has happened before, consistently, is that research or proof of concept- style languages pave the way for bigger players to take the ideas and incorporate them into existing or future mainstream languages.
I thought that was sort of Paul's goal, to explore a bunch of new ideas and paradigms. I doubt he's under any illusions that this language, qua this language, is going to see wide adoption or need to have all of its features really ironed out with a fixed/stable API. But I could be wrong!
Agreed, I think it's great to have really cutting-edge, highly experimental research languages that explore new paradigms. Doesn't mean you have to like the language, but I think it's good that some people are doing this kind of work as a going concern and not just some repo that was pushed up once somewhere and languishes. He's got some interesting ideas in there related to the effects system, etc.
> I doubt he's under any illusions that this language, qua this language, is going to see wide adoption or need to have all of its features really ironed out with a fixed/stable API. But I could be wrong!
They have VC funding and employees. I presume he's told investors that it will see at least fairly wide adoption!
Ha! Did not know that. Not necessarily dispositive though. Also, I believe they are a public benefit corporation so that would also cut in favor of my point...I think
You have it backwards. The source code management (as a marketable feature) came at least one year and maybe two years after the distributed computing parts. The earliest Unison demos (circa 2016) were "build a distributed google crawler in just a few lines of code". I think the hashes were always a component, but not part of the real messaging for quite a while.
Sounds like `Opa!`, which was a language plus a Meteor-like framework. The language itself was really nice, with some interesting type-level features, but the built-in framework was the selling point. And all of that ultimately got in the way of doing anything that wasn't built into it, which ended up being pretty much anything beyond toy prototypes. The language got dragged down because of it.
That said, it came out around the same time CoffeeScript did, and nobody uses CoffeeScript anymore either. So probably the fate was inevitable regardless of whether the framework was included in the language or not.
> I think they are making a mistake that’s common in this sort of project: trying too many new things at once!
The one thing identified as "the big idea" of Unison seems to me to have been conceived as a solution to a distributed computing problem that incidentally also solves a number of problems outside of distributed computing (and within it as well, such that fleshing out how it can solve them enhances Unison as a distributed computing solution while also providing side benefits).
The reason for the source code structure is because of what it enables for distributed computing.
It makes it simple to transparently ship the code (not just data) around to any worker node. The point of Unison Cloud is to disappear the difference between AWS EC2 and Lambda.