
Doing this at anything > 1k nodes is a pain in the butt. We decided to run many <100-node clusters rather than a few big ones.




Same here. Control plane components that didn't originate from the Kubernetes project itself - your ingress controllers, service meshes, etc. - start failing beyond a certain limit. So I don't usually take node numbers from these benchmarks seriously for our kind of workloads. We run a bunch of sub-1k-node clusters.

Same. The control plane and various controllers just aren't up to the task.

Meh, I've had clusters with close to 1k nodes (w/ Cilium as CNI) and didn't have major issues.

When I was involved about a year ago, Cilium fell apart at around a few thousand nodes.

One of the main issues with Cilium is that the BPF maps scale with the number of nodes/pods in the cluster, so you get exponential memory growth as you add more nodes running the Cilium agent. https://docs.cilium.io/en/stable/operations/performance/scal...
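A rough back-of-envelope sketch of that scaling (all constants here are hypothetical illustration values, not numbers from the Cilium docs): if each node's agent keeps map state per node/pod in the cluster, per-agent memory grows linearly with cluster size, and the total across all agents grows roughly with the square of the node count.

```python
# Back-of-envelope sketch: per-agent state that tracks every pod in the
# cluster grows linearly with cluster size; summed over all agents the
# cluster-wide total grows ~N^2. Constants below are made up for illustration.

BYTES_PER_ENTRY = 64      # assumed size of one map entry
PODS_PER_NODE = 50        # assumed pod density

def per_agent_bytes(nodes: int) -> int:
    # one entry per pod in the cluster (assumption for illustration)
    return nodes * PODS_PER_NODE * BYTES_PER_ENTRY

def cluster_bytes(nodes: int) -> int:
    # every node runs an agent holding a map of that size
    return nodes * per_agent_bytes(nodes)

for n in (100, 1_000, 5_000):
    print(f"{n:>5} nodes: {per_agent_bytes(n) / 2**20:8.1f} MiB/agent, "
          f"{cluster_bytes(n) / 2**30:8.1f} GiB cluster-wide")
```

Per-agent memory stays linear in cluster size; it's the sum over all agents that blows up much faster as you add nodes.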


That's true, and I definitely had to "tune" the BPF map limits, but it wasn't really that difficult to do.

Wouldn't that be quadratic rather than exponential?


