1) built using an open source Kubernetes operator, as I understand it
2) Crunchy provides true superuser access and access to physical backups – that's huge
Disclaimer: employee at Neon, another Postgres hosting provider.
My understanding after looking into it is that Xata+SimplyBlock is expected to use the ReadWriteOnce persistent volume access mode, which means the volume can be mounted read-write by only one node at a time.
I think this solves the split-brain problem, because any new Postgres read-write pod scheduled on a different node will fail to mount the volume, but it also means no high availability is possible if the node fails. At least, I think that's how Kubernetes handles it - I couldn't find much explaining the failure modes of persistent volumes, but I don't see many other solutions.
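For illustration, this is roughly the kind of claim involved - a minimal sketch written as a Python dict mirroring the Kubernetes YAML; the name and size are hypothetical, I haven't seen Xata's actual manifests:

    # Hypothetical PersistentVolumeClaim, as a Python dict mirroring the
    # Kubernetes YAML. The key line is accessModes: with ReadWriteOnce the
    # volume can be mounted read-write by one node at a time, so a second
    # Postgres pod scheduled on another node cannot attach it.
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "pgdata"},  # hypothetical name
        "spec": {
            "accessModes": ["ReadWriteOnce"],  # one node at a time
            "resources": {"requests": {"storage": "100Gi"}},  # hypothetical size
        },
    }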
At Neon, we solve this issue by having our storage nodes run a consensus protocol with the Postgres node. If a new Postgres node comes online, the two will contend for multi-paxos leadership. I assume the loser crash-backoffs to reset its in-memory tables, so there's no inconsistency if it later tries to reclaim the leadership and wins. In the normal mode, with no split-brain and a single leader, multi-paxos adds little overhead to WAL commits.
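To sketch the contention - this is a toy, single-process illustration of Paxos-style leadership and steady-state WAL acks, not Neon's actual protocol, and all names here are made up:

    # Toy sketch of Paxos-style leadership contention between two Postgres
    # nodes over a quorum of storage nodes. Illustrative only.
    from dataclasses import dataclass

    @dataclass
    class StorageNode:
        promised: int = 0  # highest ballot (leadership term) promised so far

        def prepare(self, ballot: int) -> bool:
            """Phase 1: promise to reject anything below `ballot`."""
            if ballot > self.promised:
                self.promised = ballot
                return True
            return False

        def accept_wal(self, ballot: int) -> bool:
            """Phase 2 (steady state): ack a WAL append from the leader,
            unless a higher ballot has been promised to a newer node."""
            return ballot >= self.promised

    def try_become_leader(nodes: list[StorageNode], ballot: int) -> bool:
        """A node leads iff a majority promises its ballot. In multi-paxos,
        phase 1 runs once per leadership change; afterwards every WAL commit
        is a single phase-2 round, hence the low steady-state overhead."""
        votes = sum(n.prepare(ballot) for n in nodes)
        return votes > len(nodes) // 2

    nodes = [StorageNode() for _ in range(3)]
    assert try_become_leader(nodes, ballot=1)  # old Postgres node leads
    assert try_become_leader(nodes, ballot=2)  # new node preempts it
    assert not nodes[0].accept_wal(1)  # old leader's WAL appends now rejected
    assert nodes[0].accept_wal(2)      # only the new leader can commit
    # The preempted node would crash, reset its in-memory state, and retry
    # later with a higher ballot, as described above.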
There is also a problem of data locality and which blocks are present in caches (page cache, buffer pool) at any given time; in general, UUIDv4 loses to bigint and UUIDv7 in this area.
This is a valid point, but whether it's relevant depends heavily on the use case and the larger context. If you have a table where you fetch multiple rows close to each other, e.g. in a paginated manner, locality matters for performance; but if you only fetch individual records by UUID, data locality won't improve performance.
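To make the locality point concrete: UUIDv7 puts a millisecond timestamp in the most significant bits, so keys generated close in time sort next to each other and fresh inserts hit the same few B-tree pages, while UUIDv4 scatters inserts across the whole index. Here's a rough sketch of the RFC 9562 v7 layout (use a proper library in practice; this hand-rolled version is only for illustration):

    # Rough sketch of the RFC 9562 UUIDv7 layout: 48-bit unix-ms timestamp,
    # 4-bit version, 12 random bits, 2-bit variant, 62 random bits.
    import os
    import time
    import uuid

    def uuid7_like() -> uuid.UUID:
        ms = time.time_ns() // 1_000_000
        rand_a = int.from_bytes(os.urandom(2), "big") & 0xFFF        # 12 bits
        rand_b = int.from_bytes(os.urandom(8), "big") & (2**62 - 1)  # 62 bits
        value = (ms & (2**48 - 1)) << 80  # timestamp in the top 48 bits
        value |= 0x7 << 76                # version 7
        value |= rand_a << 64
        value |= 0b10 << 62               # RFC variant
        value |= rand_b
        return uuid.UUID(int=value)

    # v7 keys sort in creation order, so recent inserts stay on hot pages:
    v7 = []
    for _ in range(3):
        v7.append(uuid7_like())
        time.sleep(0.002)  # ensure distinct millisecond timestamps
    assert v7 == sorted(v7)

    # v4 keys are uniformly random, so inserts scatter across the index:
    v4 = [uuid.uuid4() for _ in range(3)]  # almost surely not in creation order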
Because there's so much to running it properly: individual component updates, network policies, certificates, backups, monitoring, authorization, authentication, secret management. Kubernetes makes it all relatively easy, and k3s (or k0s) ain't that scary.
K3s; Calico (or whatever CNI you prefer); Istio with cert-manager and a self-managed CA (or whatever you prefer for your service mesh); kube-prometheus; OTel or Jaeger for mesh visibility; and an operator of your choice (I used the Crunchy Data operator, but there are at least two other solid choices - see the sketch at the end of this comment) will get one far at low cost. Of course, use a reliable infrastructure provider of one's choice.
No need to think about placement (though one still can), addressing, firewalling, DNS, IP assignment, and so on. Add nodes to the cluster as necessary and let it sort itself out.
Some understanding of Kubernetes is necessary, indeed. But once learned, it's a stack you can reuse every time.
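For a sense of what picking the operator buys you: with the Crunchy Data operator, an HA Postgres cluster with physical backups is roughly one manifest. Sketched below as a Python dict mirroring the YAML, written from memory of their v5 examples, so treat the field names as approximate and check the CRD reference:

    # Rough sketch of a Crunchy Data PostgresCluster manifest as a Python
    # dict mirroring the YAML; field names from memory, verify before use.
    cluster = {
        "apiVersion": "postgres-operator.crunchydata.com/v1beta1",
        "kind": "PostgresCluster",
        "metadata": {"name": "hippo"},  # hypothetical name
        "spec": {
            "postgresVersion": 16,
            "instances": [{
                "name": "instance1",
                "replicas": 2,  # a primary plus a standby
                "dataVolumeClaimSpec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": "10Gi"}},
                },
            }],
            # pgBackRest provides the physical backups mentioned upthread
            "backups": {"pgbackrest": {"repos": [{
                "name": "repo1",
                "volume": {"volumeClaimSpec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": "10Gi"}},
                }},
            }]}},
        },
    }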