I think they are making a mistake that's common in this sort of project: trying too many new things at once!
They already have a very innovative way of managing source code, with a database of definitions that keeps the hash of the syntax tree instead of actual source. That's a very neat idea that solves many problems (read their docs to understand why).
But instead of developing that well enough so that it works with source control tools and IDEs, and can be deployed easily and painlessly on existing infrastructure... no, they decided to ALSO solve distributed computing, a really, really complex problem with a pretty crowded field of solutions... and seem to be focusing on that now instead of the "original" ideas. Looks like a huge issue with scope creep to me... unless they are kind of pivoting to distributed computing now only because the original ideas were not attractive enough for people to embrace them, but I have not heard of anything like that, everyone seems to be pretty excited about those things.
The two areas of managing source code and distributed computing are not as disjoint as you make them in the context of Unison. They follow from the underlying principle of addressing functions not by their name but by a hash of their normalized syntax tree (i.e. their code).
There are a bunch of cool implications for distributed computing, namely that you can easily distribute fine-grained parts of your application across servers and that you can cache the results of expensive calculations.
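To make the principle a bit more concrete, here's a rough Python sketch (the names and the toy tree shape are mine, not Unison's actual representation) of storing a definition under the hash of a normalized tree in which references to dependencies are themselves hashes:

    import hashlib
    import json

    def digest(tree) -> str:
        """Hash a canonical JSON rendering of a (toy) normalized syntax tree."""
        blob = json.dumps(tree, sort_keys=True, separators=(",", ":")).encode()
        return hashlib.sha256(blob).hexdigest()

    codebase = {}  # hash -> normalized syntax tree (the "database of definitions")

    def add_definition(tree) -> str:
        """Store a definition under its content hash and return that hash.
        References to dependencies inside `tree` are already hashes, so the
        hash of `tree` pins down its entire transitive dependency graph."""
        h = digest(tree)
        codebase[h] = tree
        return h

    # square x = x * x
    square = add_definition(["lambda", "x", ["*", ["var", "x"], ["var", "x"]]])

    # addSquare x = square x + 1  -- refers to `square` by hash, not by name
    add_square = add_definition(
        ["lambda", "x", ["+", ["call", ["ref", square], ["var", "x"]], ["lit", 1]]])

    print(square, add_square)

Because the hash covers the definition and everything it transitively depends on, caching the result of an expensive call keyed by that hash is safe: the cache can only be hit by code that is the same all the way down.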
> They follow from the underlying principle of addressing functions not by their name but by a hash of their normalized syntax tree (ie their code).
The first time the language or compiler changes such that the same code generates a different syntax tree they'd have to do something pretty fancy to avoid rebuilding the world. (That, plus all the usual caveats about what happens when old hash algorithms meet malicious actors from the future.)
(Unison developer here.) Yes, we've done two such migrations this year. It has typically meant that we show an "are you ready to upgrade" message when you start up, and the migration took less than a minute on my pretty large codebase. It's not a big deal.
It's a solved problem technically, certainly, but assuming all of the relevant source code will always be available ignores some social and legal issues.
One issue I was thinking of was companies that distribute code in binary form only, not ASTs or anything which could be used to steal their thoughts. The other issue was in reverse, however: A binary-only package is unmaintained and lists as dependencies hashes that no longer exist because they're the hashed versions of ASTs that, for one reason or another, the compiler won't generate, even if the source still exists. Versioning and archiving would help this case, at least.
It's a common approach in language design to require this though. Rust has made very similar design choices due to its (current) lack of stable ABIs. IIRC some hash of the compiler version is included in built libraries and prevents accidental linking. You need to rebuild the world for every toolchain upgrade.
Also, if you change one very low-level function (maybe something in the runtime, Unicode handling, etc.) you'd have to recompile the world. In some ways it's nice to reference things by a name and let the implementation change without needing to care about the details. It's semver's raison d'être.
To be fair, for a while they were ALSO working on their own graphical source editor that allowed for type-correct transformations and assisted refactorings. They put that on the back burner specifically because they are trying to focus on fewer things :)
I think the distributed computing problem is pretty related once you have "content-addressable" source code. Agreed that it's a lot of work but I hope it pans out!
Well, there is a relationship. The relationship is specifically that Unison nodes can communicate code with one another unambiguously by exchanging hashes.
(I'm a Unison employee who works mostly on distributed computing.)
There are a few things about Unison that make this easier:
* content addressed code
* unison can serialize any closure
I can ask for a serialized version of any closure, and get back something that is portable to another runtime. So in a function, I can create a lambda that closes over some local variables, ask the runtime for the code for this closure, and send it to a remote node for execution. It won't be a serialized version of my entire program and all of its dependencies; it will be the hash of a tree, which is a tree of other hashes.
The remote node can inspect the tree and ask for (or gossip to find out) the definitions for whatever hashes in that tree it doesn't already know about. It can inspect the tree for any forbidden hashes (for example, you can't do arbitrary IO), then the remote node can evaluate the closure and return a result.
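Roughly, that back-and-forth looks like the following toy Python sketch. The names, the tree encoding, and the sync loop are all invented for illustration; this is not the actual Unison runtime protocol.

    import hashlib, json

    def digest(tree) -> str:
        return hashlib.sha256(json.dumps(tree, sort_keys=True).encode()).hexdigest()

    def deps(tree):
        """Yield the hashes a (toy) definition refers to."""
        if isinstance(tree, list):
            if tree and tree[0] == "ref":
                yield tree[1]
            else:
                for sub in tree:
                    yield from deps(sub)

    class Node:
        def __init__(self):
            self.store = {}  # hash -> definition tree this runtime already knows

        def missing(self, root_hash):
            """Walk the tree of hashes under root_hash; return the unknown ones."""
            todo, unknown = [root_hash], set()
            while todo:
                h = todo.pop()
                if h in self.store:
                    todo.extend(deps(self.store[h]))
                else:
                    unknown.add(h)
            return unknown

    sender, receiver = Node(), Node()

    # The sender's closure: a lambda that calls `square`, referenced by hash.
    square = ["lambda", "x", ["*", ["var", "x"], ["var", "x"]]]
    sender.store[digest(square)] = square
    closure = ["lambda", "y", ["call", ["ref", digest(square)], ["var", "y"]]]
    root = digest(closure)
    sender.store[root] = closure

    # Ship the root, then keep asking for whatever can't be resolved yet.
    receiver.store[root] = sender.store[root]
    while (needed := receiver.missing(root)):
        for h in needed:
            receiver.store[h] = sender.store[h]  # in reality: request or gossip
    print("closure fully loaded:", not receiver.missing(root))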
In other languages, perhaps you can dynamically ship code around to be dynamically executed, but you aren't going to also get "and of course ship all the transitive dependencies of this closure as needed" as easily as we are able to.
In a simple Lisp machine, such as something resembling PicoLisp, I can't see why not, if instead of car and cdr being just linear addresses in memory, you have them themselves be hashes.
Since everything is made up of car and cdr, it's easy going from there. There is no difference between running locally or anywhere else. Just look up the data by request or gossip, as you said.
(For performance reasons, one might want to let a cons whose contents are smaller than the size of a hash be a literal representation instead of a hashed one. No need to do a lookup by hash when you can have the code/data in the cons itself. Analogous to not zipping a file when the resulting zip would be larger.)
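As a toy illustration (Python standing in for the Lisp, with made-up names), a cell store along those lines, including the inline-small-values optimization, could look like this:

    import hashlib, pickle

    HASH_LEN = 32   # bytes in a sha256 digest
    heap = {}       # hash -> stored value; stands in for "memory" (or another node)

    def intern(value):
        """Store `value` under its hash, or inline it if it's no bigger than a hash."""
        blob = pickle.dumps(value)
        if len(blob) <= HASH_LEN:
            return ("inline", value)          # the "don't zip if the zip is bigger" case
        h = hashlib.sha256(blob).digest()
        heap[h] = value
        return ("hash", h)

    def cons(car_ref, cdr_ref):
        """A cell is just the pair of refs; its identity is the hash of that pair."""
        return intern((car_ref, cdr_ref))

    def resolve(ref):
        kind, payload = ref
        return payload if kind == "inline" else heap[payload]  # or: look it up remotely

    def car(cell_ref):
        return resolve(resolve(cell_ref)[0])

    # (1 . (2 . nil)) built from content-addressed cells
    lst = cons(intern(1), cons(intern(2), intern(None)))
    print(car(lst))   # -> 1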
In this pretend program we are using some 3rd-party S3 library to fetch some data and some other 3rd-party CSV library to extract the data. Now I want to run this on some other remote node. I know I can just send that sexp to the remote node and have it eval it, but that is only going to work if the right versions of the S3 and CSV libraries are already in that runtime.
I want to be able to write a function like:
    (defun remote-run (prog)
      (....))
That can take ANY program and ship it off to some remote node and have that remote node calculate the result and ship the answer back. I don't know of a way in Lisp to ask the runtime to give you a program which includes your calculation plus the S3 functions and the CSV functions.
In Unison, I can just say `(serialize my-program)` and get back a byte array which represents exactly the functions we need to evaluate this closure. When the remote side tries to load those bytes into its runtime, it will either succeed or fail with "more information needed" and a list of additional hashes we need code for, and the two runtimes can recurse through this until the remote side has everything needed to load that value into the runtime so that it can be evaluated to a result.
Then, of course, this opens us up to the ability for us to be smart in places and say "before you try running this, check the cache to see if anyone ran this closure recently and already knows the answer"
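That last step is essentially memoization keyed by the closure's hash. A minimal sketch of the idea (the names and the cache are hypothetical, not the Unison Cloud API):

    import hashlib, json

    result_cache = {}  # closure hash -> previously computed result (could live anywhere)

    def closure_hash(tree) -> str:
        return hashlib.sha256(json.dumps(tree, sort_keys=True).encode()).hexdigest()

    def run_with_cache(tree, evaluate):
        """Evaluate `tree`, reusing any cached answer for the same hash.
        Only safe because the hash covers the code and everything it closes over."""
        h = closure_hash(tree)
        if h in result_cache:
            return result_cache[h]       # someone already ran this exact closure
        result = evaluate(tree)          # the expensive part, possibly on a remote node
        result_cache[h] = result
        return result

    # Toy closure: "sum of squares below 10_000", with its input baked in.
    prog = ["sum-squares", 10_000]
    evaluate = lambda t: sum(i * i for i in range(t[1]))
    print(run_with_cache(prog, evaluate))  # computed
    print(run_with_cache(prog, evaluate))  # served from the cache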
In Lisp there are symbols and lists. The symbols could be pointers to content-addressable hashes, or inlined content. So in the namespace you could require the library by its content-addressable hash and give it a symbol definition, or just refer to the hash itself.
I'm not saying there exists a Lisp with a turn-key distributed cloud runtime. Like the sibling answer, I'm saying, it's not super complicated. Instead of loading an .so file the regular way, load it via hash.
The car/cdr nature of Lisp makes it almost uniquely suited to distributed runtime IMHO.
As for your last sentence, this touches on compilation. Solve this, and you also solve dependency detection and compilation.
I feel there are so many things which could/should converge at some point in the future. IDE / version control, distributed compute, storage.
An AST (or source) program could have a hash, which corresponds in a cache to a compiled or JIT-ed version of that compilation unit, various eval results of that unit, etc. So many things collapse into one once you start to treat everything as key/value.
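As a sketch of that convergence (purely illustrative, with invented names and artifact kinds), one AST hash can key every artifact derived from a compilation unit:

    import hashlib, json

    def ast_hash(ast) -> str:
        return hashlib.sha256(json.dumps(ast, sort_keys=True).encode()).hexdigest()

    store = {}  # (ast hash, artifact kind) -> artifact

    def put(ast, kind, artifact):
        store[(ast_hash(ast), kind)] = artifact

    def get(ast, kind):
        return store.get((ast_hash(ast), kind))

    unit = ["defn", "square", ["lambda", "x", ["*", "x", "x"]]]
    put(unit, "source", unit)              # the tree itself (the "version control" part)
    put(unit, "compiled", b"...bytecode")  # an imagined compiled/JIT-ed form
    put(unit, "eval:(square 12)", 144)     # a cached evaluation result

    print(get(unit, "eval:(square 12)"))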
Well, it could be done in JS or Lisp. You'd have to replace all the references to dependencies in every function with a hash of the implementation of the referenced function and use some kind of global lookup table (which would need to be a distributed hash table or something). But this would be slow, so you'd need a compiler or some kind of processor to inline stuff and do CPS transforms, and by that time you've basically implemented a janky version of Unison.
Or a beautiful, conceptually simple version of Unison. In any case, not more convoluted than for instance Javascript V8 already is to get the performance it has.
An issue with exchanging WASM is you'll have to do either dynamic or static linking of dependencies. Unison works around this by having all definitions share a global address space (and the address of any given code in that space is the hash of its syntax tree), so there is no linking step. Or rather, the linking is done trivially by the hashing.
> Creating yet another language ensure that this will get nowhere, too bad.
What has happened before, consistently, is that research or proof of concept- style languages pave the way for bigger players to take the ideas and incorporate them into existing or future mainstream languages.
I thought that was sort of Paul's goal, to explore a bunch of new ideas and paradigms. I doubt he's under any illusions that this language, qua this language, is going to see wide adoption or need to have all of its features really ironed out with a fixed/stable API. But I could be wrong!
Agreed, I think it's great to have really cutting-edge, highly experimental research languages that explore new paradigms. Doesn't mean you have to like the language, but I think it's good that some people are doing this kind of work as a going concern and not just some repo that was pushed up once somewhere and languishes. He's got some interesting ideas in there related to the effects system, etc.
> I doubt he's under any illusions that this language, qua this language, is going to see wide adoption or need to have all of its features really ironed out with a fixed/stable API. But I could be wrong!
They have VC funding and employees. I presume he's told investors that it will see at least fairly wide adoption!
Ha! Did not know that. Not necessarily dispositive though. Also, I believe they are a public benefit corporation so that would also cut in favor of my point...I think
You have it backwards. The source code management (as a marketable feature) came at least one year and maybe two years after the distributed computing parts. The earliest Unison demos (circa 2016) were "build a distributed google crawler in just a few lines of code". I think the hashes were always a component, but not part of the real messaging for quite a while.
Sounds like `Opa!`, which was a language plus a Meteor-like framework. The language itself was really nice, with some interesting type-level features, but the built-in framework was the selling point. And all of that ultimately got in the way of doing anything that wasn't built into it, which ended up being pretty much anything beyond toy prototypes. The language got dragged down because of it.
That said, it came out around the same time CoffeeScript did, and nobody uses CoffeeScript anymore either. So probably the fate was inevitable regardless of whether the framework was included in the language or not.
> I think they are making a mistake that’s common in this sort of project: trying too many new things at once!
The one thing identified as "the big idea" of Unison seems to me to have been conceived as a solution to a distributed computing problem that incidentally also solves a number of problems outside of distributed computing (and within it as well, such that fleshing out how it can solve them enhances Unison as a distributed computing solution while also providing side benefits).
The reason for the source code structure is because of what it enables for distributed computing.
It makes it simple to transparently ship the code (not just data) around to any worker node. The point of Unison Cloud is to disappear the difference between AWS EC2 and Lambda.