> The tricky part is that when multiple users open the same Fig file concurrently, Figma’s infrastructure needs to ensure that they are all connected to the same server. That server can then be the sole authority on the state of that document, and write to it without race conditions.
This is the killer app for Cloudflare's new Durable Objects. They handle the routing and data-distribution layer, allowing you to write single-threaded business logic to coordinate changes.
They even have a transactional storage capability, priced more or less equivalently to DynamoDB (which Figma uses for their write-ahead log).
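To make the single-threaded coordination concrete, here's a minimal sketch using the Workers Durable Objects API; the `DOCUMENT` binding name, the per-document ID scheme, and the `applyEdit` helper are my own placeholders, not anything Figma or Cloudflare ships:

```typescript
// worker.ts: route every request for a given document to the same Durable Object,
// which then acts as the sole authority on that document's state.
export interface Env {
  DOCUMENT: DurableObjectNamespace; // hypothetical binding name
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const docId = new URL(request.url).searchParams.get("doc") ?? "default";
    const id = env.DOCUMENT.idFromName(docId); // same name -> same object instance
    return env.DOCUMENT.get(id).fetch(request);
  },
};

// Hypothetical: apply one client edit to the in-memory document.
function applyEdit(doc: Record<string, unknown>, edit: unknown): void {
  Object.assign(doc, edit as Record<string, unknown>);
}

export class DocumentObject {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    // The object processes requests one at a time, so this read-modify-write
    // sequence can't race with another editor of the same document.
    const edit = await request.json();
    const doc = (await this.state.storage.get<Record<string, unknown>>("doc")) ?? {};
    applyEdit(doc, edit);
    await this.state.storage.put("doc", doc);
    return new Response("ok");
  }
}
```

The point is that `idFromName` gives every client of the same document the same object, which is exactly the "all connected to the same server" property the quote asks for.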
I try to boost Durable Objects every chance I get, in order to "push the battleship" in the direction of another cloud provider implementing something equivalent.
While this article is written by the plane.dev team, which has an adjacent product, their approach seems geared toward more demanding backends. Lots of use cases don't need to run, say, a game or simulation on the backend; they just need a synchronization layer in front of a document store.
---
> They could stuff the whole document into a binary blob or json(b) column next to the metadata.
In my experience doing this in MySQL... do not do this. Once you have a table full of JSON that is dozens, if not hundreds, of gigabytes, it becomes a real nuisance. As a halfway measure, I think it would help to have a "blobs only" table separate from the metadata table. But as the OP points out, it is not economical anyway.
I'm an engineer on Atlassian's new whiteboard feature for Confluence, which has real-time collaboration like Figma. We've been highly successful using Cloudflare Durable Objects to act as a simple relay for WebSocket messages.
One of the best things is that the Durable Object will automatically run in the closest Cloudflare data center to the first user who connects, which helps keep latency very low.
The alarms feature they added last year has also been great, allowing us to asynchronously snapshot the state of a board at certain intervals rather than needing to schedule the job from an external system.
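For anyone curious what such a relay looks like, here's a rough sketch; the class name, message handling, and 60-second snapshot interval are assumptions on my part, not Atlassian's actual code:

```typescript
// A Durable Object that fans WebSocket messages out to every other client on the
// board and uses the alarm API to snapshot state periodically.
export class BoardRelay {
  private sessions: WebSocket[] = [];
  private latest: unknown = null; // last message seen, stands in for board state

  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    if (request.headers.get("Upgrade") !== "websocket") {
      return new Response("Expected a WebSocket", { status: 426 });
    }
    const { 0: client, 1: server } = new WebSocketPair();
    server.accept();
    this.sessions.push(server);

    server.addEventListener("message", async (event) => {
      this.latest = event.data;
      // Relay to everyone else on the board.
      for (const ws of this.sessions) {
        if (ws !== server) ws.send(event.data);
      }
      // Make sure a snapshot is scheduled (hypothetical 60s interval).
      if ((await this.state.storage.getAlarm()) === null) {
        await this.state.storage.setAlarm(Date.now() + 60_000);
      }
    });
    server.addEventListener("close", () => {
      this.sessions = this.sessions.filter((ws) => ws !== server);
    });

    return new Response(null, { status: 101, webSocket: client });
  }

  async alarm(): Promise<void> {
    // Runs inside the same object, so no external scheduler is needed.
    await this.state.storage.put("snapshot", this.latest);
  }
}
```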
One of the benefits of our new storage engine[0] is that it'll be much easier for us to host it in any datacenter, rather than just the biggest, best-connected ones. We still have a lot of work to do to make this available to all durable objects and actually start utilizing smaller datacenters this way, but we're working on it.
(Author here) you’re absolutely right, Durable Objects are a great product, especially for the “just need a sync layer” use case. We are building the persistence layer mentioned at the end of the article to run on either Durable Objects or (as a regular Linux process) on Plane. https://driftdb.com/
I do see Plane as being relevant to pure sync layers, for cases where you want to run on your own cloud/own metal, or can’t compile your code to run on V8, but it’s good to have options.
Great points, I really hope Plane finds success. I'd forgotten about DriftDB, that's also a very cool spot in the landscape. More diversity and experiments in this space are great.
(I lead DO & databases product here at Cloudflare)
Thanks for the kind words! Durable Objects also underpins a tremendous amount of what we build internally as well — it’s fundamentally a very powerful “coordination API”.
FYI: We’re continuing to work on observability & throughput improvements so folks can get more of each DO, on top of the horizontal sharding (“a DO per X”) approach we recommend.
I've been thinking about this. Is this honestly the only solution? When the time comes to edit, does the artifact need to be on a single server? Obviously you can redundantly write it to a cluster, but even in a cluster most architectures typically have a "leader".
Well, you could do your entire data model as a CRDT (even with a single server, some semblance of a CRDT model can be beneficial, just not as necessary).
The question then becomes:
- Do you want a complicated data model based on CRDTs that will allow for arbitrary node distribution and failures (which you kind of need to handle anyhow)?
- Or do you select a "simpler" data model where you have a single master server per document, but spend a tad more effort on resolving node crashes (which are hopefully infrequent)?
Theoretically CRDTs are prettier, and infra is always scary. But on the other hand, do you want to spend a lot of time modelling complicated data models for features that might not even make sense in the long run? (I.e., do you know your product 100% before starting, or will you need to iterate and update it?)
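To make "data model based on CRDTs" concrete in the simplest case, here's a last-writer-wins map sketch in TypeScript; real products usually need richer CRDTs (sequences, trees), so treat it as illustrative only:

```typescript
// A minimal last-writer-wins map: each key keeps the value with the highest
// (timestamp, nodeId) pair, so any two replicas that have seen the same set of
// updates converge to the same state regardless of delivery order.
type Entry<V> = { value: V; timestamp: number; nodeId: string };

class LwwMap<V> {
  private entries = new Map<string, Entry<V>>();

  set(key: string, value: V, timestamp: number, nodeId: string): void {
    const current = this.entries.get(key);
    if (
      !current ||
      timestamp > current.timestamp ||
      (timestamp === current.timestamp && nodeId > current.nodeId)
    ) {
      this.entries.set(key, { value, timestamp, nodeId });
    }
  }

  get(key: string): V | undefined {
    return this.entries.get(key)?.value;
  }

  // Merge another replica's state: commutative, associative, and idempotent,
  // which is what lets replicas converge without a coordinator.
  merge(other: LwwMap<V>): void {
    for (const [key, entry] of other.entries) {
      this.set(key, entry.value, entry.timestamp, entry.nodeId);
    }
  }
}
```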
CRDT (or similar solutions like OT) is important if you ever want decent offline support and true real-time collaboration. Even a user with a spotty connection is essentially "occasionally offline". Having a single coordinator "in the Cloud" doesn't really solve the concurrency issues.
However, you also want a single coordinator: it will give you the best performance. If you want people on a call to be able to move their cursors around and see each other within a second, that is very hard to do by sending everything through a database and polling or similar. But this can fundamentally be seen as an optimization. The client could just push writes and pull new changes from the database every couple of seconds. But that is both less efficient (the client needs to retain more history and do more complex merges) and higher latency.
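Here's a rough sketch of that "push writes and pull changes every couple of seconds" fallback, just to show where the extra merging and latency come from; the endpoints, response shape, and both callbacks are made up:

```typescript
// Naive coordinator-free sync loop: push pending local ops, then pull anything
// newer than what we've merged so far. All names here are hypothetical.
function startSyncLoop(
  docId: string,
  drainPendingOps: () => unknown[],          // local ops accumulated since last tick
  mergeRemoteOps: (ops: unknown[]) => void,  // merge remote ops into local state
  intervalMs = 2000,
) {
  let lastSeen = 0;
  setInterval(async () => {
    const pending = drainPendingOps();
    if (pending.length > 0) {
      await fetch(`/docs/${docId}/ops`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(pending),
      });
    }
    const res = await fetch(`/docs/${docId}/ops?since=${lastSeen}`);
    const { ops, cursor } = await res.json(); // assumed response shape
    mergeRemoteOps(ops);
    lastSeen = cursor;
  }, intervalMs);
}
```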
Another solution is to design your data model so that it's fully peer-to-peer; then you don't need a point of centralisation, except for realtime communication/updates, where you have a lot more leeway.
CRDTs are an obvious way to do this. Or maybe something like CouchDB's document model?
You can use a scalable database that gives you serializable transactions whilst not requiring you to represent your document in the relational model.
A good example of this architecture would be using FoundationDB [1] with Permazen [2]. In this design there are three layers:
1. A horizontally scaling sorted transactional K/V store. This is provided by FoundationDB. Transactions are automatically ordered within the cluster.
2. A network protocol that can serialize database transactions. Permazen has the ability to do this; you can do things like cache reads, send transactions via HTTP POST, and so on. Stuff you can't easily do with SQL databases.
3. A way to map in-memory objects to/from key/value pairs, with schema migration, indexing and other DB-like features. Permazen also does this.
Permazen can be thought of as an ORM for KV stores. It's a library intended to execute on trusted servers (because the KV store can't do any business logic validation). However, for something like Figma, which is basically trusting the client anyway, that doesn't matter. Additionally, you can do some tricks with the architecture to support untrusted clients; I've explored these topics with Archie (the Permazen designer) in the past.
The nice thing about this design is that it doesn't require sharding by "file", can scale to large numbers of simultaneous co-authors, and results in a very natural coding model. However, Permazen is a Java library; using it from a browser would be awkward. That said, it has fairly minimal reliance on the JDK. You could probably auto-convert it to Kotlin and then use Kotlin/JS or Kotlin/WASM. But frankly, it'd be easier to do that architecture as a real desktop app where you aren't boxed in by the browser's limitations. And of course the design ideas can be implemented in any language; it's just a lot of work.
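Permazen itself is a Java library, so the following is only a conceptual TypeScript sketch of what the layer-3 object-to-KV mapping looks like; the key layout and field names are invented, not Permazen's actual encoding:

```typescript
// Flatten an object into sorted key/value pairs plus a secondary-index entry,
// so a sorted transactional KV store like FoundationDB can serve as the backend.
type KvPair = [key: string, value: string];

interface ShapeNode {
  id: string;
  parentId: string;
  kind: "rect" | "ellipse" | "text";
}

function encodeNode(node: ShapeNode): KvPair[] {
  return [
    // Primary copy: one key per field, grouped under the object's id.
    [`obj/${node.id}/parentId`, node.parentId],
    [`obj/${node.id}/kind`, node.kind],
    // Secondary index: find all children of a parent with a prefix scan.
    [`idx/parentId/${node.parentId}/${node.id}`, ""],
  ];
}

// All pairs for one edit would be written in a single transaction, so
// concurrent editors either see the whole change or none of it.
const pairs = encodeNode({ id: "n42", parentId: "page1", kind: "rect" });
```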
The writeup mentions a couple of reasons for not using a database:
2. Cost of a database vs S3. This is mostly an artifact of cloud pricing. Cloud is highly profitable, but most of the margin comes from managed databases and other very high-level services, not commodity byte storage. Given that FDB is free, you could eliminate the cost gap by just running the database yourself, especially on your own metal.
Because Permazen has a pluggable KV backend and because there are backends that write to files, you can have both worlds - a scalable DB in the cloud and also write to files for individual cases where people don't want to store data on your backend.
Coordinating each document's writes through a single stateful object (as with Durable Objects) also helps scale write transactions without melting your database. http://ithare.com/scaling-stateful-objects/