Raph Levien posted a retrospective about using CRDTs for collaborative editing in xi-editor here [1]. His conclusion is:
“I come to the conclusion that the CRDT is not pulling its (considerable) weight. When I think about a future evolution of xi-editor, I see a much brighter future with a simpler, largely synchronous model, that still of course has enough revision tracking to get good results with asynchronous peers like the language server.”
I think that the paragraph that you quoted was in reference to Xi's decision to use CRDTs (and asynchronous communication) for exchanging data between different plugins running in the same instance of Xi. It isn't a comment about using CRDTs for collaborative editing, which is what Zed does.
To put it in perspective: plain-text editing has well-solved CRDT patterns. But semantic data structures like rich text or syntax trees are where it gets tricky, and they still have unsolved challenges.
Peritext[1] is the only one that has come close to solving rich text, but even it left out important aspects of rich-text editing, like handling list and table operations, as "work to be done later".
> semantic data structures like rich text or syntax trees are where it gets tricky, and they still have unsolved challenges.
Yrs developer here:
- Yjs/Yrs have support for formatting attributes[1]. Whether the conflict resolution for formatting attributes behaves the way users want is open to discussion (a common theme with CRDTs and their trade-offs), but the behaviour is consistent and convergent, and the algorithm itself is fast. This feature is in fact used in many rich text editors using Yjs bindings.
- Embedding non-string elements in text is supported as well.
- While syntax trees are not supported out of the box (AFAIK this kind of CRDT is still being researched), Yjs also supports XML nodes with attributes and nested children, which may be used to similar effect in some scenarios.
This reminds me of Oracle’s SCN which blew my mind back in ~2009 when I moved up from mysql home gamer stuff to the corporate world.
> Even long edit histories barely compare to the memory savings Zed obtains from not being built with Electron.
I got a good chuckle out of this. Speaking of performance, I think it’s interesting that ms word and google docs are still using OT rather than CRDT. The goog version seems to work well but I have had nothing but frustration with ms word. Bad merges and weird states are typical, particularly from the fat client.
> The goog version seems to work well but I have had nothing but frustration with ms word. Bad merges and weird states are typical, particularly from the fat client.
Argh, not getting this stuff right is really frustrating. I've been working on collaborative editing for over a decade now, and I still can't implement any of these algorithms correctly without the help of fuzz testing. But fuzz testing done right finds all of these problems! There's no excuse!
Fuzzers work so well here because all of these algorithms have a clear correctness criterion: after syncing, state should always converge to the same result. So it's pretty easy to write code which does this in a loop (a rough sketch follows the list):
1. Generate some random changes on some fake "peers"
2. Pick 2 peers at random and sync their changes, using your new fancy synchronization algorithm
3. Assert that the state has converged between the peers
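To make that concrete, here's a rough Rust sketch of what such a fuzz loop can look like. The `FuzzPeer` trait and its method names are hypothetical stand-ins for whatever your CRDT or OT implementation actually exposes; this isn't any particular library's API:

```rust
use rand::Rng; // rand = "0.8"

/// Hypothetical interface a text CRDT would expose for fuzzing.
trait FuzzPeer: Clone {
    fn random_edit<R: Rng>(&mut self, rng: &mut R); // random insert or delete
    fn merge_from(&mut self, other: &Self);         // pull in the other peer's changes
    fn text(&self) -> String;                       // current document contents
}

fn fuzz_round<P: FuzzPeer, R: Rng>(peers: &mut [P], rng: &mut R) {
    // 1. Generate some random changes on random peers.
    for _ in 0..rng.gen_range(1..=10) {
        let i = rng.gen_range(0..peers.len());
        peers[i].random_edit(rng);
    }

    // 2. Pick two distinct peers at random and sync them both ways.
    let a = rng.gen_range(0..peers.len());
    let b = rng.gen_range(0..peers.len());
    if a == b {
        return;
    }
    let (snap_a, snap_b) = (peers[a].clone(), peers[b].clone());
    peers[a].merge_from(&snap_b);
    peers[b].merge_from(&snap_a);

    // 3. Assert that the two peers converged to the same document.
    assert_eq!(peers[a].text(), peers[b].text());
}
```

Run that in a loop overnight with a seeded RNG so any failure is reproducible, and it will flush out convergence bugs surprisingly fast.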
I've been working on this stuff for over a decade. I've implemented dozens of these algorithms. And every single time I write a fuzzy boi to check my work I find convergence bugs. Playing whack-a-mole with a fuzzer is a rite of passage for implementing systems like this.
When your fuzzer runs all night, you should never have lingering convergence bugs like you're describing with Word.
Maybe it is time to do some theorem proving for code like this.
It also shows the obvious waste in modern software development: a decade is plenty of time to formally verify a dozen such algorithms. We need to make it easier to do it in a production environment.
Maybe. I honestly don’t know enough about theorem proving.
One nice thing about fuzz testing is that it tests both the concepts and the specific implementation. Probably 70% of the bugs my fuzzers have found over the years are edge cases my code was handling incorrectly. If I made a “pure” model of my sync engine, wouldn’t I still need to prove that my actual implementation matches that model?
Ideally (part of) your model would already be executable, so that there is no separate implementation. Why have a separate implementation in C, JS, Java, Go, Dart, Swift, Rust, ..., if you have already something much cleaner which you have proven to be correct?
Very cool use case for CRDTs! I've seen a bunch of different use cases from other products like https://liveblocks.io/ and https://electric-sql.com/. It's interesting how CRDTs are now taking hold so much for all these collaborative syncing scenarios. Wonder what's driving the proliferation now, given they've been around for a while?
I've been interested in them since the early days of Atom, but it's just taken a while for me to develop as an engineer and build a system around the theory that puts them to practical use.
Yeah CRDTs are pretty annoying for collaborative text editing, even Google Docs doesn't use them. Some companies have switched from CRDTs back to traditional methods.
The first version of Google Docs was written in 2006; CRDTs weren't proposed until 2011, and still seem to be maturing as a technology.
So, I don't think this is a reflection on the merits of CRDTs versus operational transforms as much as it is a reflection on the ecosystem and the history of the codebase.
Yep. I’ve been working in the collaborative editing space for over a decade - with both OT algorithms and CRDT algorithms. The modern CRDTs I’ve been developing are significantly faster and smaller than any OT system I made 5+ years ago.
Much more complex though. (~3k LOC for a good, high-performance text CRDT merging algorithm, vs 300 LOC for a good text OT algorithm.)
Sorry, I'm not sure I properly understood OT as an acronym. Did you mean "operational transformation"? Because I thought that OT was a method of making operation-based CRDTs[0].
But I'm probably wrong in at least one way! Hoping to learn.
Usually the difference is that operational transform algorithms create an ordered global list of all the changes (typically on a centralized server). They flatten operations using a heuristic "transform" function.
CRDTs don't reorder the changes, but guarantee that when all changes are merged (in any order) the final result would be the same. For example, MAX() is a complete CRDT merging function.
But the distinction gets much more blurry at the edges. You can make OT algorithms which work without a central server, and CRDTs which use operation reordering to guarantee merge consistency.
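To make the distinction concrete, here's a toy sketch (not any real library's code): an OT-style transform that re-positions an operation against a concurrent one the server ordered ahead of it, next to a trivial CRDT merge function (MAX) that converges no matter how the merges are ordered:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Insert {
    pos: usize, // toy op: "insert one character at this index"
}

/// OT: transform `op` against `applied_first`, which was ordered ahead of it.
fn transform(op: Insert, applied_first: Insert) -> Insert {
    if applied_first.pos <= op.pos {
        Insert { pos: op.pos + 1 } // shift right to make room
    } else {
        op
    }
}

/// CRDT: MAX is commutative, associative, and idempotent, so replicas converge
/// regardless of how often or in what order they exchange state.
fn merge(a: u64, b: u64) -> u64 {
    a.max(b)
}

fn main() {
    let a = Insert { pos: 2 };
    let b = Insert { pos: 5 };
    assert_eq!(transform(b, a), Insert { pos: 6 }); // OT needs the ordering decision
    assert_eq!(merge(merge(3, 7), 7), merge(7, merge(7, 3))); // CRDT doesn't
}
```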
Could you go into detail a bit more on operation based CRDTs and merge consistency?
I'm currently struggling with moving a document-merge use case to Yjs while leveraging its updates for efficient real-time rebasing. Its insert/delete world view (state based) seems to make this practically impossible.
Quick and dirty: OTs require a centralized controller and a (potentially complicated) merge strategy to shepherd conflicting changes. CRDTs, on the other hand, apply commutative operations, so they don't need to be centralized, though nodes that communicate infrequently might get very out of sync.
I have always found CRDTs a lot easier to wrap my brain around. Agreed with Joseph that optimizing them is non-trivial. In Teletype for Atom, we indexed fragments in splay trees, and each text fragment was actually a member of two splay trees at the same time: one for the full document, another for the pieces of the original insertion as they'd been split apart. We would find the insertion fragment in the insertion tree, then walk parent pointers in the document tree to locate it. In Zed, we have two trees, but can't do this double embedding because we use persistent data structures where parent pointers are not an option. So we instead have insertion trees that map subslices of inserted text to fragment ordered identifiers, which we use for lookup in the main document B-tree. These fragment identifiers were themselves inspired by another CRDT paper called Logoot, but we don't actually send them over the wire.
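For anyone trying to picture that indirection, here's a rough toy sketch of the two-tree lookup idea using plain std BTreeMaps as stand-ins. The names and layout are invented for illustration, and the real trees are persistent B-trees rather than std maps, so this is not Zed's actual code, just the shape of the lookup:

```rust
use std::collections::BTreeMap;
use std::ops::Range;

type InsertionId = (u64, u64); // (replica id, sequence number) - hypothetical
type FragmentId = Vec<u8>;     // dense, ordered identifier (the Logoot-inspired part)

struct Document {
    // For each insertion: which slices of it survive, and each slice's fragment id.
    insertions: BTreeMap<(InsertionId, usize), (Range<usize>, FragmentId)>,
    // The document itself, ordered by fragment id.
    fragments: BTreeMap<FragmentId, String>,
}

impl Document {
    /// Resolve an anchor (an insertion id plus an offset within that insertion)
    /// to the fragment of document text that currently contains it.
    fn resolve(&self, id: InsertionId, offset: usize) -> Option<&String> {
        // Find the last slice of this insertion that starts at or before `offset`.
        let (&(ins, _start), (range, frag_id)) =
            self.insertions.range(..=(id, offset)).next_back()?;
        if ins != id || !range.contains(&offset) {
            return None; // the anchor falls outside any surviving slice
        }
        // Then look the fragment up in the main document tree by its ordered id.
        self.fragments.get(frag_id)
    }
}
```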
Both OT and CRDT approaches have lots of obscure edge cases. In both cases, I can't write a correct algorithm without a fuzzer to save myself. (And I don't trust anyone else to, either).
CRDTs aren't that complex on the surface - this messy file[1] implements 4 different list CRDT algorithms in a few hundred lines of code. And CRDTs are pretty simple for JSON structures, which is not at all true for OT algorithms.
But text CRDTs in particular need a whole lot more thought around optimization because of how locations work. Locations in list/text CRDTs generally work by giving every item a globally-unique ID and referencing that ID with every change. So, inserts work by saying "Insert between ID xxx and yyy". But, if you have a text document which needs to store a GUID for every keystroke which has ever happened, disk and memory usage becomes crazy bad. And if you don't have a clever indexing strategy, looking up IDs will bottleneck your program.
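As a toy illustration of that ID scheme (not any particular library's wire format), the operations roughly have this shape:

```rust
// Every inserted character gets a globally unique ID, and later operations
// refer to those IDs rather than to plain indexes.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Id {
    agent: u32, // which peer typed it
    seq: u64,   // that peer's ever-incrementing counter
}

#[derive(Debug)]
enum Op {
    // "Insert between ID xxx and ID yyy" (None at the document edges).
    Insert { id: Id, left: Option<Id>, right: Option<Id>, ch: char },
    Delete { target: Id },
}
```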
In diamond types (my CRDT), I solve that with a pancake stack of optimizations which all feed into each other. IDs are (agent, sequence number) pairs so I can run-length encode them. Internally those IDs get flattened into integers, and I use a special hand-written b-tree which internally run-length encodes all the items it stores. I've iterated on Yjs's file format to get the file size down to ~0.05 bytes of overhead per inserted character in some real world data sets, which is wild given each inserted character needs to store about 8 different fields. (Insert position, ID (agent + seq), parents, the inserted character, and so on).
CRDTs aren't that complex. Making them small and fast is the challenge. Automerge has been around for years and still takes hundreds of megabytes of RAM to process a simple editing trace for a 10 page text document. Even in their Rust implementation. Diamond types is orders of magnitude faster, and uses ~1 MB of RAM for the same test.
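The run-length trick looks roughly like this (field names invented here, not diamond types' actual internals): a user typing "abc" in one sitting produces the IDs (agent, 10), (agent, 11), (agent, 12), which collapse into a single span.

```rust
// One span stands in for a run of consecutive character IDs.
#[derive(Clone, Copy, Debug)]
struct IdSpan {
    agent: u32,     // which peer
    seq_start: u64, // first sequence number in the run
    len: u64,       // how many consecutive characters it covers
}

// One entry instead of three separate IDs:
// IdSpan { agent: 1, seq_start: 10, len: 3 }
```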
The current version of diamond types is many times faster than any OT algorithm I've written in the past thanks to all those optimizations. Now that I've been down this road, I could optimize an OT algorithm in the same way if necessary, but you don't really need to.
Other kinds of CRDTs don't need these optimizations. (Eg the sort you'd use for a normal database). There are two reasons: 1. When editing text, you make an edit with each keystroke. So you have a lot of edits. 2. Merging list items correctly is orders of magnitude more complex than merging a database entry. If you just want CRDTs for editing a database of records, that'd probably only take a few dozen lines.
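For example, a last-writer-wins map, about the simplest record CRDT you can write, is just a sketch like this (a toy, assuming (Lamport clock, agent id) timestamps to break ties deterministically):

```rust
use std::collections::HashMap;

#[derive(Clone, Default, Debug, PartialEq)]
struct LwwMap {
    entries: HashMap<String, ((u64, u32), String)>, // key -> (timestamp, value)
}

impl LwwMap {
    fn set(&mut self, key: &str, value: &str, ts: (u64, u32)) {
        match self.entries.get(key) {
            // An existing write with a newer (or equal) timestamp wins.
            Some((existing_ts, _)) if *existing_ts >= ts => {}
            _ => {
                self.entries.insert(key.to_string(), (ts, value.to_string()));
            }
        }
    }

    // Merging just replays the other replica's entries; order never matters.
    fn merge(&mut self, other: &LwwMap) {
        for (key, (ts, value)) in &other.entries {
            self.set(key, value, *ts);
        }
    }
}
```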
How would you recommend an interested student to get up-to-speed? There is so much development in so many fields, but often no idea where to even begin to get there.
Given the interest in CRDTs, it'd be a great help if someone wants to do a much more interactive exploration of those algorithms. I feel like there's an interactive guide just waiting to be written which could help people understand this stuff.
I'll offer up that I've had a hard time wrapping my mind around them, and that in practice it seems like you need to implement a checkpointing operation on top of them which is necessarily not conflict-free, or your replication log will expand without bound & eventually overwhelm your system. (Though perhaps not, depending on your problem domain. Not CRDTs, but in this interview the engineers discuss a massive, high frequency replication log that they don't checkpoint, which they've been running at scale for years. Though you could also say they implicitly checkpoint every trading day, and they're working on implementing checkpointing. https://signalsandthreads.com/state-machine-replication-and-...)
That being said I would use CRDTs for any greenfield collaboration project.
One common failure mode is that two people start typing at the beginning of the same line, and rather than getting two lines, it alternates characters. At least, Etherpad did this.
Has anyone used Yjs in practice? I've tried recently but the docs seem terribly unfinished sadly. And they're lacking examples of how to use it for other purposes (besides text editing).
Yjs is being quite heavily used in the industry[1], and is being researched by even more companies. There are also demos showing how to integrate it with existing rich text editors[2]. If you have some ideas about the missing parts, you could also open a topic on discuss.yjs.dev - the documentation page (https://docs.yjs.dev) has tons of useful links though.
Re: other purposes - Yjs/Yrs's main targets are sequential data structures (text, arrays), but it also has support for maps and XML-like elements. In general you can build most data structures with it. I agree that it would be nice to have some other applications in the demos though.
I remember reading about it in one of Martin Kleppmann’s papers, though I can’t remember which one.
It’s an ordering problem that comes from some of the simpler ordering algorithms. For Diamond types I’m using a variant of Yjs’s ordering. But even RGA doesn’t have this problem because each character’s insert location is specified by naming the character immediately to the left when that character was typed.
This repository implements a few different list CRDTs using an insertion sort approach, where the algorithm scans for the appropriate location every time an insert happens. This is the scanning function for RGA (automerge’s algorithm):
And this is an interactive visualisation of how diamond types works (which uses Yjs’s algorithm instead), complete with run-length encoding: https://home.seph.codes/public/diamond-vis/
This webpage crashed my iPad 6 Gen Safari when repeatedly scrolling down without reading (iPadOS 16.1.1). It’s probably just too little memory, but still, isn’t this just a blog post?
Apologies, looks like we underestimated how heavy the autoplaying diagrams would be. If you could try again in about 5 minutes and see if you still have the problem I'd appreciate it!
(The picture has the name wrong. Maybe intentionally, as some kind of typo which one wants to edit out, hinting at multiple people editing a document?)
Hey, thanks for the heads up. I screwed it up once and duplicated it through everything. Going to fix it. He was too important to insult by misspelling his name.
I'm developing a new type (AFAIK) of database that makes many types of real-time multi-player use-cases really easy to implement. The original idea came from implementing something called "rollback netcode" in multi-player video games, but the technique works for all types of applications.
Traditional databases (e.g. SQL databases and key-value stores) treat the database as a blob of data. This makes multi-player updates very difficult, as reconciling differences in real-time is a hard problem, hence the invention of CRDTs. My new database, however, treats the data as a series of events. In the example of a text editor, the stream of events might be something like: User 1 connected, User 1 pressed key a, User 2 connected, User 2 pressed b, User 1 disconnected, etc.
In addition to the stream of events, application developers also provide an "engine" that converts the stream of events into a blob of data. The engine runs locally on each connected client. When a client connects to the database, they fetch the engine from the database server. The client then receives the stream of events from the database server in real-time. When a client generates an event, it gets inserted into the local engine immediately and also gets broadcasted to the server. The server then broadcasts the event to other connected clients.
The most important part that makes this work comes from a technique called rollback netcode. Events in the stream are ordered and undo-able. When a connected client generates an event, they insert it into their local engine immediately, but other clients are also generating events simultaneously. If all clients optimistically inserted events locally, the event order would differ across clients. To solve this, the engine will automatically undo events when new events come in from the server. The engine then applies the received event in its proper place and reapplies all the undone events on top. The result is that all users have a consistent view of the data.
The logic of an engine looks something like this:
1) When User X connected, set X's cursor position to offset 0.
2) When User X types a letter, insert text at User X's cursor offset and increase User X's cursor offset by 1. Also increment the cursor offset of all other connected users if their cursor offset is > User X's.
And that's it for a basic insert-only text editor. You can see that applying all the events in order yields the correct state. Adding a moveable cursor and the ability to remove text is also trivial. Basically just the opposite of 1 and 2 above. The whole engine can be implemented in like 100 lines of code and is very easy to reason about.
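Here's a rough Rust sketch of such an engine, with a deliberately naive "rollback" that just replays the whole ordered log. The names and structure are invented for illustration; this is not the parent poster's actual system, just the two rules above plus the reordering step:

```rust
use std::collections::HashMap;

#[derive(Clone, Debug)]
enum Event {
    Connected { user: u32 },
    Typed { user: u32, ch: char },
}

#[derive(Default)]
struct Engine {
    log: Vec<(u64, Event)>,       // (server-assigned order, event)
    text: String,                 // the derived "blob of data"
    cursors: HashMap<u32, usize>, // per-user cursor offsets
}

impl Engine {
    // Rule 1: a connecting user's cursor starts at offset 0.
    // Rule 2: typing inserts at the user's cursor and shifts later cursors right.
    fn apply(&mut self, event: &Event) {
        match event {
            Event::Connected { user } => {
                self.cursors.insert(*user, 0);
            }
            Event::Typed { user, ch } => {
                let pos = self.cursors[user];
                self.text.insert(pos, *ch); // byte offset; fine for this ASCII toy
                for (u, offset) in self.cursors.iter_mut() {
                    if u == user || *offset > pos {
                        *offset += 1;
                    }
                }
            }
        }
    }

    // Rollback: slot a server-ordered event into its place, then rebuild state.
    // (A real engine would only undo and replay the events after `at`.)
    fn insert_ordered(&mut self, seq: u64, event: Event) {
        let at = self.log.partition_point(|(s, _)| *s < seq);
        self.log.insert(at, (seq, event));
        self.text.clear();
        self.cursors.clear();
        let log = self.log.clone();
        for (_, e) in &log {
            self.apply(e);
        }
    }
}
```

Because every client replays the same ordered log through the same engine, they all end up with the same text and cursor positions.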
Aper (https://aper.dev), which we built, implements a number of similar concepts (state machine replication with optimistic local transitions + rollback). I 100% agree that it’s an easier model to reason about.
Your approach with cursors is clever, that part I haven’t seen elsewhere.
That looks cool. Thanks for sharing. Rust's enums and match syntax make the API look pretty slick. What types of applications are you seeing people use Aper for?
I wonder what happens when an edit cannot be delivered to 1+ participant(s). Does an edit need acknowledgements from _all_ participants to be persisted? Is it just dropped?
In Zed/CRDTs, these two seem synonymous to me since the OP's article is the first I read about CRDTs that I grasped, or so I thought ;)
Doesn't the whole process assume that whenever any edit is made (insertion/deletion/undo/redo), the edit _eventually_ reaches all participants?
So if a single edit is believed to be delivered to all, but actually never made it to a single participant, who resends it? And if it's not resent, then that participant's state is inconsistent from now on, no?
So a client that was "offline" can just ask a participant for the edits since his own last one (aka pull), send out his own offline edits (push), and resolve from there, I see.
As a Windows user, I'd be pretty concerned about the multiplatform story here. They say that this is macOS-only for now, but they intend to eventually ship a Windows app, Linux app, and web app.
It just sounds like a recipe for a Sketch situation where it's really just a Mac app and web (and other platforms) is a second class citizen.
We’ve gone to great lengths to abstract out the platform dependencies. Adding a platform right now doesn’t teach us enough for it to be worth it just yet, but we’re seriously planning for it.
[1] https://github.com/xi-editor/xi-editor/issues/1187#issuecomm...