> The tricky part is that when multiple users open the same Fig file concurrently, Figma’s infrastructure needs to ensure that they are all connected to the same server. That server can then be the sole authority on the state of that document, and write to it without race conditions.
This is the killer app for Cloudflare's new Durable Objects. They handle the routing and data-distribution layer, allowing you to write single-threaded business logic to coordinate changes.
They even have a transactional storage capability, priced more or less equivalently to DynamoDB (which Figma uses for their write-ahead log).
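To make the single-threaded coordination concrete, here's a minimal sketch using the Workers Durable Objects API; the `DOCUMENT` binding name, the per-document ID scheme, and the `applyEdit` helper are my own placeholders, not anything Figma or Cloudflare ships:

```typescript
// worker.ts: route every request for a given document to the same Durable Object,
// which then acts as the sole authority on that document's state.
export interface Env {
  DOCUMENT: DurableObjectNamespace; // hypothetical binding name
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const docId = new URL(request.url).searchParams.get("doc") ?? "default";
    const id = env.DOCUMENT.idFromName(docId); // same name -> same object instance
    return env.DOCUMENT.get(id).fetch(request);
  },
};

// Hypothetical: apply one client edit to the in-memory document.
function applyEdit(doc: Record<string, unknown>, edit: unknown): void {
  Object.assign(doc, edit as Record<string, unknown>);
}

export class DocumentObject {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    // The object processes requests one at a time, so this read-modify-write
    // sequence can't race with another editor of the same document.
    const edit = await request.json();
    const doc = (await this.state.storage.get<Record<string, unknown>>("doc")) ?? {};
    applyEdit(doc, edit);
    await this.state.storage.put("doc", doc);
    return new Response("ok");
  }
}
```

The point is that `idFromName` gives every client of the same document the same object, which is exactly the "all connected to the same server" property the quote asks for.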
I try to boost Durable Objects every chance I get, in order to "push the battleship" in the direction of another cloud provider implementing something equivalent.
While this article is written by the plane.dev team, which has an adjacent product, their approach seems geared toward more demanding backends. Lots of use cases don't need to run, say, a game or simulation on the backend; they just need a synchronization layer in front of a document store.
---
> They could stuff the whole document into a binary blob or json(b) column next to the metadata.
In my experience doing this in MySQL... do not do this. Once you have a table full of JSON that is dozens, if not hundreds, of gigabytes, it becomes a real nuisance. As a halfway measure, I think it would help to have a "blobs only" table separate from the metadata table. But as the OP points out, it is not economical anyway.
I'm an engineer on Atlassian's new whiteboard feature for Confluence, which has real-time collaboration like Figma. We've been highly successful using Cloudflare Durable Objects to act as a simple relay for WebSocket messages.
One of the best things is that the Durable Object will automatically run in the closest Cloudflare data center to the first user who connects, which helps keep latency very low.
The alarms feature they added last year has also been great, allowing us to asynchronously snapshot the state of a board at certain intervals rather than needing to schedule the job from an external system.
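For anyone curious what such a relay looks like, here's a rough sketch; the class name, message handling, and 60-second snapshot interval are assumptions on my part, not Atlassian's actual code:

```typescript
// A Durable Object that fans WebSocket messages out to every other client on the
// board and uses the alarm API to snapshot state periodically.
export class BoardRelay {
  private sessions: WebSocket[] = [];
  private latest: unknown = null; // last message seen, stands in for board state

  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    if (request.headers.get("Upgrade") !== "websocket") {
      return new Response("Expected a WebSocket", { status: 426 });
    }
    const { 0: client, 1: server } = new WebSocketPair();
    server.accept();
    this.sessions.push(server);

    server.addEventListener("message", async (event) => {
      this.latest = event.data;
      // Relay to everyone else on the board.
      for (const ws of this.sessions) {
        if (ws !== server) ws.send(event.data);
      }
      // Make sure a snapshot is scheduled (hypothetical 60s interval).
      if ((await this.state.storage.getAlarm()) === null) {
        await this.state.storage.setAlarm(Date.now() + 60_000);
      }
    });
    server.addEventListener("close", () => {
      this.sessions = this.sessions.filter((ws) => ws !== server);
    });

    return new Response(null, { status: 101, webSocket: client });
  }

  async alarm(): Promise<void> {
    // Runs inside the same object, so no external scheduler is needed.
    await this.state.storage.put("snapshot", this.latest);
  }
}
```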
One of the benefits of our new storage engine[0] is that it'll be much easier for us to host it in any datacenter, rather than just the biggest, best-connected ones. We still have a lot of work to do to make this available to all durable objects and actually start utilizing smaller datacenters this way, but we're working on it.
(Author here) you’re absolutely right, Durable Objects are a great product, especially for the “just need a sync layer” use case. We are building the persistence layer mentioned at the end of the article to run on either Durable Objects or (as a regular Linux process) on Plane. https://driftdb.com/
I do see Plane as being relevant to pure sync layers, for cases where you want to run on your own cloud/own metal, or can’t compile your code to run on V8, but it’s good to have options.
Great points, I really hope Plane finds success. I'd forgotten about DriftDB, that's also a very cool spot in the landscape. More diversity and experiments in this space are great.
(I lead DO & databases product here at Cloudflare)
Thanks for the kind words! Durable Objects also underpins a tremendous amount of what we build internally as well — it’s fundamentally a very powerful “coordination API”.
FYI: We’re continuing to work on observability & throughput improvements so folks can get more of each DO, on top of the horizontal sharding (“a DO per X”) approach we recommend.
I've been thinking about this. Is this honestly the only solution? When the time comes to edit, does the artifact need to be on a single server? Obviously you can redundantly write it to a cluster, but even in a cluster most architectures typically have a "leader".
Well, you could do your entire data model as a CRDT (even with a single server, some semblance of a CRDT model can be beneficial, just not as necessary).
The question then becomes:
- Do you want a complicated data model based on CRDTs that will allow for arbitrary node distribution and failures (which you kind of need to handle anyhow)?
- Or do you select a "simpler" data model where you have a single master server per document, but spend a tad more effort on resolving node crashes (which are hopefully infrequent)?
Theoretically CRDTs are prettier, and infra is always scary. But on the other hand, do you want to spend a lot of time modelling complicated data models for features that might not even make sense in the long run? (I.e., do you know your product 100% before starting, or will you need to iterate and update it?)
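To make "data model based on CRDTs" concrete in the simplest case, here's a last-writer-wins map sketch in TypeScript; real products usually need richer CRDTs (sequences, trees), so treat it as illustrative only:

```typescript
// A minimal last-writer-wins map: each key keeps the value with the highest
// (timestamp, nodeId) pair, so any two replicas that have seen the same set of
// updates converge to the same state regardless of delivery order.
type Entry<V> = { value: V; timestamp: number; nodeId: string };

class LwwMap<V> {
  private entries = new Map<string, Entry<V>>();

  set(key: string, value: V, timestamp: number, nodeId: string): void {
    const current = this.entries.get(key);
    if (
      !current ||
      timestamp > current.timestamp ||
      (timestamp === current.timestamp && nodeId > current.nodeId)
    ) {
      this.entries.set(key, { value, timestamp, nodeId });
    }
  }

  get(key: string): V | undefined {
    return this.entries.get(key)?.value;
  }

  // Merge another replica's state: commutative, associative, and idempotent,
  // which is what lets replicas converge without a coordinator.
  merge(other: LwwMap<V>): void {
    for (const [key, entry] of other.entries) {
      this.set(key, entry.value, entry.timestamp, entry.nodeId);
    }
  }
}
```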
CRDT (or similar solutions like OT) is important if you ever want decent offline support and true real-time collaboration. Even a user with a spotty connection is essentially "occasionally offline". Having a single coordinator "in the Cloud" doesn't really solve the concurrency issues.
However, you also want a single coordinator: it will give you the best performance. If you want people on a call to be able to move their cursors around and see each other within a second, that is very hard to do by sending everything through a database and polling or similar. But this can fundamentally be seen as an optimization. The client could just push writes and pull new changes from the database every couple of seconds. But that is both less efficient (the client needs to retain more history and do more complex merges) and higher latency.
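Here's a rough sketch of that "push writes and pull changes every couple of seconds" fallback, just to show where the extra merging and latency come from; the endpoints, response shape, and both callbacks are made up:

```typescript
// Naive coordinator-free sync loop: push pending local ops, then pull anything
// newer than what we've merged so far. All names here are hypothetical.
function startSyncLoop(
  docId: string,
  drainPendingOps: () => unknown[],          // local ops accumulated since last tick
  mergeRemoteOps: (ops: unknown[]) => void,  // merge remote ops into local state
  intervalMs = 2000,
) {
  let lastSeen = 0;
  setInterval(async () => {
    const pending = drainPendingOps();
    if (pending.length > 0) {
      await fetch(`/docs/${docId}/ops`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(pending),
      });
    }
    const res = await fetch(`/docs/${docId}/ops?since=${lastSeen}`);
    const { ops, cursor } = await res.json(); // assumed response shape
    mergeRemoteOps(ops);
    lastSeen = cursor;
  }, intervalMs);
}
```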
Another solution is to design your data model so that it's fully peer-to-peer; then you don't need a point of centralisation, except for realtime communication/updates, where you have a lot more leeway.
CRDTs are an obvious way to do this. Or maybe something like CouchDB's document model?
You can use a scalable database that gives you serializable transactions whilst not requiring you to represent your document in the relational model.
A good example of this architecture would be using FoundationDB [1] with Permazen [2]. In this design there are three layers:
1. A horizontally scaling sorted transactional K/V store. This is provided by FoundationDB. Transactions are automatically ordered within the cluster.
2. A network protocol that can serialize database transactions. Permazen has the ability to do this; you can do things like cache reads, send transactions via HTTP POST, and so on. Stuff you can't easily do with SQL databases.
3. A way to map in-memory objects to/from key/value pairs, with schema migration, indexing and other DB-like features. Permazen also does this.
Permazen can be thought of as an ORM for KV stores. It's a library intended to execute on trusted servers (because the KV store can't do any business logic validation). However, for something like Figma, which is basically trusting the client anyway, that doesn't matter. Additionally, you can do some tricks with the architecture to support untrusted clients; I've explored these topics with Archie (the Permazen designer) in the past.
The nice thing about this design is that it doesn't require sharding by "file", can scale to large numbers of simultaneous co-authors, and results in a very natural coding model. However, Permazen is a Java library; using it from a browser would be awkward. That said, it has fairly minimal reliance on the JDK. You could probably auto-convert it to Kotlin and then use Kotlin/JS or Kotlin/WASM. But frankly, it'd be easier to do that architecture as a real desktop app where you aren't boxed in by the browser's limitations. And of course the design ideas can be implemented in any language; it's just a lot of work.
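Permazen itself is a Java library, so the following is only a conceptual TypeScript sketch of what the layer-3 object-to-KV mapping looks like; the key layout and field names are invented, not Permazen's actual encoding:

```typescript
// Flatten an object into sorted key/value pairs plus a secondary-index entry,
// so a sorted transactional KV store like FoundationDB can serve as the backend.
type KvPair = [key: string, value: string];

interface ShapeNode {
  id: string;
  parentId: string;
  kind: "rect" | "ellipse" | "text";
}

function encodeNode(node: ShapeNode): KvPair[] {
  return [
    // Primary copy: one key per field, grouped under the object's id.
    [`obj/${node.id}/parentId`, node.parentId],
    [`obj/${node.id}/kind`, node.kind],
    // Secondary index: find all children of a parent with a prefix scan.
    [`idx/parentId/${node.parentId}/${node.id}`, ""],
  ];
}

// All pairs for one edit would be written in a single transaction, so
// concurrent editors either see the whole change or none of it.
const pairs = encodeNode({ id: "n42", parentId: "page1", kind: "rect" });
```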
The writeup mentions a couple of reasons for not using a database:
2. Cost of a database vs S3. This is mostly an artifact of cloud pricing. Cloud is highly profitable, but most of the margin comes from managed databases and other very high-level services, not commodity byte storage. Given that FDB is free, you could eliminate the cost gap by just running the database yourself, especially on your own metal.
Because Permazen has a pluggable KV backend and because there are backends that write to files, you can have both worlds - a scalable DB in the cloud and also write to files for individual cases where people don't want to store data on your backend.
Coordinating each document's writes through a single stateful object (as with Durable Objects) also helps scale write transactions without melting your database. http://ithare.com/scaling-stateful-objects/