well if you use k8s you need to have some spare capacity for updates, etc. consi...

puzzle · on May 28, 2018

If you're running Kubernetes, you probably have multiple services running (or it's not really worth the complexity). Then whatever slack you have in the cluster can be amortized across all your services, provided they all draw from the same pool of resources (quota). It's also a good idea not to update too many deployments at the same time.
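To make the amortization point concrete, here is a rough back-of-the-envelope sketch. It assumes each service needs some fixed number of extra replicas ("surge") while it updates. The function names and the surge numbers are hypothetical, made up for illustration. If every service reserves its own slack, you pay for the sum of all surges. If they share one quota and you cap how many deployments update concurrently, the worst case is only the few largest surges overlapping.

```python
def headroom_per_service(surges):
    # Each service reserves its own update slack: total is the sum of all surges.
    return sum(surges)


def headroom_shared(surges, max_concurrent):
    # Shared quota with at most `max_concurrent` deployments updating at once.
    # Worst case: the `max_concurrent` largest surges happen to overlap.
    return sum(sorted(surges, reverse=True)[:max_concurrent])


# Hypothetical surge sizes for five services.
surges = [2, 1, 3, 1, 2]
print(headroom_per_service(surges))              # 9
print(headroom_shared(surges, max_concurrent=2))  # 5
```

The fewer deployments you roll at once, the smaller the shared headroom gets, which is why staggering updates matters.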

Even then, deployment updates don't necessarily need to surge above their replica count. You can also configure them to terminate X replicas at a time before bringing up new ones. At Google, all teams have Borg quotas, so it's not unusual to max those out by running as many replicas as possible. During updates, Borg does not allow a user to temporarily oversubscribe their quota (unless you're changing replica count and replica footprint at the same time, but that's another fun story), so it will always take down Y tasks first.
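In Kubernetes terms, "terminate X replicas at a time before bringing up new ones" corresponds roughly to a RollingUpdate strategy with `maxSurge: 0` and `maxUnavailable: X`. Below is a simplified model of that behavior, not the actual Deployment controller. The function `rolling_update` is hypothetical, and it assumes replacements start only after the old batch is fully gone. It shows that the pod count never exceeds the replica count, so no spare quota is needed. The cost is reduced capacity during the rollout.

```python
def rolling_update(replicas, max_unavailable):
    """Simplified no-surge rollout: return (old, new) pod counts after each step."""
    if max_unavailable < 1:
        # With no surge, at least one pod must be allowed to go down to make progress.
        raise ValueError("max_unavailable must be >= 1 when there is no surge")
    old, new = replicas, 0
    history = [(old, new)]
    while old > 0:
        batch = min(max_unavailable, old)
        old -= batch   # take down a batch of old pods first, freeing quota
        history.append((old, new))
        new += batch   # then start replacements in the freed space
        history.append((old, new))
    return history


steps = rolling_update(replicas=5, max_unavailable=2)
peak = max(o + n for o, n in steps)
lowest = min(o + n for o, n in steps)
print(peak, lowest)  # 5 3
```

The peak never goes above 5 (the quota), while serving capacity temporarily drops to 3. That is the trade-off you accept by not surging.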