The first engineer on the Delos team here. This design decision gave us an edge: we delivered the first production cluster in 8 months by layering over existing systems ; and we smoothly migrated all underlying implementations in the following 2 years without our customer aware of it (0 downtime). This design decision is well captured by the OSDI’20 Virtual Consensus in Delos paper.
Very interesting. How did you deal with the issue where the old API had some feature X that wasn't supported in the new API but some customers depended on feature X?
I guess that a hard question for user facing API: you need to support both the old and the new features during the migration.
In our case, this design principle mostly applied to our internal APIs. Particularly, we designed our internal sub components (consensus protocol, particularly) in a way we can swap it without any downtime.
Right. So it seems you did a lift and shift over two years without changing the basic API surface. That’s ok, but stability of requirements throughout a rearchitecting project is a luxury and not the norm