1) built using an open source Kubernetes operator, as I understand it
2) Crunchy provides true superuser access and access to physical backups – that's huge
Disclaimer: employee at Neon, another Postgres hosting provider.
My understanding after looking into it is that Xata+SimplyBlock is expected to use the ReadWriteOnce persistent volume access mode, which means the volume can be mounted read-write by only one node at a time.
I think this solves the split-brain problem, because any new Postgres read-write pod scheduled on a different node will fail to mount the volume, but it also means no high availability is possible if the node fails. At least, I think that's how Kubernetes handles it - I couldn't find much explaining the failure modes of persistent volumes, but I don't see many other solutions.
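For illustration, this is roughly the kind of claim involved - a minimal sketch written as a Python dict mirroring the Kubernetes YAML; the name and size are hypothetical, I haven't seen Xata's actual manifests:

    # Hypothetical PersistentVolumeClaim, as a Python dict mirroring the
    # Kubernetes YAML. The key line is accessModes: with ReadWriteOnce the
    # volume can be mounted read-write by one node at a time, so a second
    # Postgres pod scheduled on another node cannot attach it.
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "pgdata"},  # hypothetical name
        "spec": {
            "accessModes": ["ReadWriteOnce"],  # one node at a time
            "resources": {"requests": {"storage": "100Gi"}},  # hypothetical size
        },
    }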
At Neon, we solve this issue by having our storage nodes run a consensus protocol with the Postgres node. If a new Postgres node comes online, the two will contend for multi-paxos leadership. I assume the loser crash-backoffs to reset its in-memory tables, so there's no inconsistency if it later tries to reclaim the leadership and wins. In the normal mode, with no split-brain and a single leader, multi-paxos adds little overhead to WAL commits.
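To sketch the contention - this is a toy, single-process illustration of Paxos-style leadership and steady-state WAL acks, not Neon's actual protocol, and all names here are made up:

    # Toy sketch of Paxos-style leadership contention between two Postgres
    # nodes over a quorum of storage nodes. Illustrative only.
    from dataclasses import dataclass

    @dataclass
    class StorageNode:
        promised: int = 0  # highest ballot (leadership term) promised so far

        def prepare(self, ballot: int) -> bool:
            """Phase 1: promise to reject anything below `ballot`."""
            if ballot > self.promised:
                self.promised = ballot
                return True
            return False

        def accept_wal(self, ballot: int) -> bool:
            """Phase 2 (steady state): ack a WAL append from the leader,
            unless a higher ballot has been promised to a newer node."""
            return ballot >= self.promised

    def try_become_leader(nodes: list[StorageNode], ballot: int) -> bool:
        """A node leads iff a majority promises its ballot. In multi-paxos,
        phase 1 runs once per leadership change; afterwards every WAL commit
        is a single phase-2 round, hence the low steady-state overhead."""
        votes = sum(n.prepare(ballot) for n in nodes)
        return votes > len(nodes) // 2

    nodes = [StorageNode() for _ in range(3)]
    assert try_become_leader(nodes, ballot=1)  # old Postgres node leads
    assert try_become_leader(nodes, ballot=2)  # new node preempts it
    assert not nodes[0].accept_wal(1)  # old leader's WAL appends now rejected
    assert nodes[0].accept_wal(2)      # only the new leader can commit
    # The preempted node would crash, reset its in-memory state, and retry
    # later with a higher ballot, as described above.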
There is also a problem of data locality and which blocks are present in caches (page cache, buffer pool) at any given time; in general, UUIDv4 loses to bigint and UUIDv7 in this area.
This is a valid point, but whether it's relevant depends heavily on the use case and the larger context. If you have a table where you fetch multiple rows close to each other, e.g. in a paginated manner, locality matters for performance; but if you only fetch individual records by UUID, data locality won't improve performance.
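To make the locality point concrete: UUIDv7 puts a millisecond timestamp in the most significant bits, so keys generated close in time sort next to each other and fresh inserts hit the same few B-tree pages, while UUIDv4 scatters inserts across the whole index. Here's a rough sketch of the RFC 9562 v7 layout (use a proper library in practice; this hand-rolled version is only for illustration):

    # Rough sketch of the RFC 9562 UUIDv7 layout: 48-bit unix-ms timestamp,
    # 4-bit version, 12 random bits, 2-bit variant, 62 random bits.
    import os
    import time
    import uuid

    def uuid7_like() -> uuid.UUID:
        ms = time.time_ns() // 1_000_000
        rand_a = int.from_bytes(os.urandom(2), "big") & 0xFFF        # 12 bits
        rand_b = int.from_bytes(os.urandom(8), "big") & (2**62 - 1)  # 62 bits
        value = (ms & (2**48 - 1)) << 80  # timestamp in the top 48 bits
        value |= 0x7 << 76                # version 7
        value |= rand_a << 64
        value |= 0b10 << 62               # RFC variant
        value |= rand_b
        return uuid.UUID(int=value)

    # v7 keys sort in creation order, so recent inserts stay on hot pages:
    v7 = []
    for _ in range(3):
        v7.append(uuid7_like())
        time.sleep(0.002)  # ensure distinct millisecond timestamps
    assert v7 == sorted(v7)

    # v4 keys are uniformly random, so inserts scatter across the index:
    v4 = [uuid.uuid4() for _ in range(3)]  # almost surely not in creation order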
Because there's so much to running it properly: individual component updates, network policies, certificates, backups, monitoring, authorization, authentication, secret management. Kubernetes makes it all relatively easy, and k3s (or k0s) ain't that scary.
K3s; Calico (or whatever CNI you prefer); Istio with cert-manager and a self-managed CA (or whatever you prefer for your service mesh); kube-prometheus; OTel or Jaeger for mesh visibility; and an operator of your choice (I used the Crunchy Data operator, but there are at least two other solid choices - see the sketch at the end of this comment) will get one far at low cost. Of course, use a reliable infrastructure provider of one's choice.
No need to think about placement (though one still can), addressing, firewalling, DNS, IP assignment, and so on. Add nodes to the cluster as necessary and let it sort itself out.
Some understanding of Kubernetes is necessary, indeed. But once learned, it's a stack you can reuse every time.
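For a sense of what picking the operator buys you: with the Crunchy Data operator, an HA Postgres cluster with physical backups is roughly one manifest. Sketched below as a Python dict mirroring the YAML, written from memory of their v5 examples, so treat the field names as approximate and check the CRD reference:

    # Rough sketch of a Crunchy Data PostgresCluster manifest as a Python
    # dict mirroring the YAML; field names from memory, verify before use.
    cluster = {
        "apiVersion": "postgres-operator.crunchydata.com/v1beta1",
        "kind": "PostgresCluster",
        "metadata": {"name": "hippo"},  # hypothetical name
        "spec": {
            "postgresVersion": 16,
            "instances": [{
                "name": "instance1",
                "replicas": 2,  # a primary plus a standby
                "dataVolumeClaimSpec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": "10Gi"}},
                },
            }],
            # pgBackRest provides the physical backups mentioned upthread
            "backups": {"pgbackrest": {"repos": [{
                "name": "repo1",
                "volume": {"volumeClaimSpec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": "10Gi"}},
                }},
            }]}},
        },
    }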