
I've been consulting on EKS/GKE cost optimization for a few mid-sized companies and kept seeing the same pattern: massive over-provisioning of memory just to be safe.

I wrote a simple CLI tool (bash wrapper around kubectl) to automate diffing kubectl top metrics against the declared requests in the deployment YAML.
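
The core idea is roughly this kind of loop (a simplified sketch of the approach, not the actual script):

    # Simplified sketch: compare live memory usage against the declared request, pod by pod
    kubectl top pods --no-headers | while read -r pod cpu mem; do
      req=$(kubectl get pod "$pod" -o jsonpath='{.spec.containers[0].resources.requests.memory}')
      echo "$pod  usage=$mem  request=${req:-<none>}"
    done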

I ran it across ~500 pods in production. The average "waste" (allocated vs. used) by language was interesting:

Python: ~60% waste (Mostly sized for startup spikes, then idles empty).

Java: ~48% waste (Devs seem terrified to give the JVM less than 4Gi).

Go: ~18% waste.

The tool is called Wozz. It runs locally, installs no agents, and just uses your current kubecontext to find the gap between what you pay for (Requests) and what you use (Usage).

It's open source. Feedback welcome.

(Note: The install is curl | bash for convenience, but the script is readable if you want to audit it first).
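
If you'd rather not pipe straight into bash, something like this works (the URL below is just a placeholder; grab the real one from the repo):

    # Hypothetical example: download, read, then run (replace the placeholder URL)
    curl -fsSL https://example.com/wozz/install.sh -o install.sh
    less install.sh
    bash install.sh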





Actually, this sounds like your Java devs have misconfigured their containers.

Java will happily eat all the memory you throw at it. The fact that it isn't means they are probably relying on the default JVM settings. Those are too conservative inside a container, especially if they are running on older JVM versions.

What you'll find is that the JVM will constantly show "waste" due to its default configuration. The question is whether it ends up throwing out-of-memory errors under the lower memory limit.

If your devs haven't looked at the heap allocation of a running application, there's no way to know if this is too much or too little memory.

Go/python much more readily give memory back to the OS.
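
A quick way to check what the JVM actually decided, inside the container (note MaxRAMPercentage only exists on newer JDK 8 builds and later):

    # What max heap did the JVM pick by default in this container?
    java -XX:+PrintFlagsFinal -version | grep -Ei 'maxheapsize|maxrampercentage'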


The JVM expands to fill the container, but the scheduler still counts that 8GB request as used when packing the node. Even if the app only needs 2GB of working set, we are blocked from scheduling other pods on that wasted 6GB buffer.

No, it doesn't. Believe me, I'm an expert at this.

The JVM determines the maximum allocation it will take at startup. It will not grow beyond that determination. Effectively it sets `-Xmx` once, whether explicitly or implicitly derived from the various memory settings.

The JVM, without explicit configuration, is also reluctant to give back memory to the OS. Most of the collectors will keep whatever memory they have requested.

And now that you've posted 8GB and 2GB, that pretty much confirms to me that you are both running older JVMs and using the default settings. For older JVMs the default was to take 25% of the available memory without further configuration.

Here's an article describing exactly this problem [1].

Your devs likely ran into OOME problems in production and simply increased their pod request. Cutting it down to "save" memory is a bad move unless you do it in tandem with having the devs correctly set the JVM settings in their container. Otherwise, reducing it to 2GB will cause the app to run with a 512MB max heap size (almost certainly causing problems).

You may have seen a pod exceed that 2GB. That is simply because there are a lot of situations where the JVM can do "off heap" allocations, application dependent. The 2GB max is for the heap and not all the off heap allocations.

For Java, off-heap allocations are somewhat rare. But you should know about them just so you don't set the JVM heap to 100% of the pod memory. You need to leave a buffer big enough to accommodate those off-heap scenarios (including garbage collection). For smaller heaps (<1GB), 60% is probably a good target. For larger heaps (10+GB), 80 or even 90% is a pretty good target.
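
As a rough sketch, something like this in the container, with the percentage picked per the rule of thumb above:

    # Sketch: let the heap scale with the container limit instead of a hardcoded -Xmx
    # (75% is just an example value; tune per app)
    export JAVA_TOOL_OPTIONS="-XX:MaxRAMPercentage=75.0"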

[1] https://www.baeldung.com/java-docker-jvm-heap-size


Typically they're over-provisioning through prior experience. In the past something fell over because it didn't have enough memory, so they gave it more. That practice stuck in the mental model. Perhaps it's no longer valid, but who wants to bring down the service doing Chernobyl experiments?

You run these tools, find the maximums over weeks of traffic, and set everything down to minimise cost, and all is well until the event. The event itself doesn't really matter: it causes an increase in processing time, suddenly every service needs more memory to hold all its in-flight transactions, and now they fail with out-of-memory errors and disappear. Suddenly all your pods are in restart loops, unable to cope, and you have an outage.

The company wasting 20% extra memory, on the other hand, is still selling and copes with the slower transaction speed just fine.

I'm not sure over-provisioning memory is really just waste when we're using dynamically memory-managed languages, which is all modern languages outside real-time safety-critical environments.


This assumes one app is the only thing running on the node, which in the case of k8s clearly isn't the case.

If you want to get scheduled on a node for execution after a node failure, your resource requests need to fit / pack somewhere.

The more accurately modeled application limits are, the better cluster packing gets.

This impacts cost.
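
You can see this directly on a node: the scheduler packs against the summed requests, not actual usage. For example:

    # $NODE is a node name from `kubectl get nodes`; requests (not usage) drive packing
    kubectl describe node "$NODE" | grep -A 8 'Allocated resources'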


> Perhaps it's no longer valid, but who wants to bring down the service

I'm thinking more like getting a junior to do efficiency passes on a year's worth of data.


Exactly, we call it "sleep insurance". It is rational for the on-call engineer to pad the numbers, but it's irrational for the finance team to pay for it.

> Python: ~60% waste (Mostly sized for startup spikes, then idles empty).

I understand we're talking about CPU in the case of Python and memory for Java and Go. While anxious over-provisioning of memory is understandable, doing the same for CPU probably means a lack of understanding of the difference between CPU limits and CPU requests.

Since I've been out of DevOps for a few years, is there ever a reason not to give each container the ability to spike up to 100% of 1 core? Scheduling of mass container startup should be a solved problem by now.


I don't think there is. You should set both, and the limit doesn't need to match the request for CPU.

Your limit should roughly be "what should this application use if it goes full bore" and your request should be "what does this use at steady state".

At least at my company, the cluster barely uses any CPU even during the busiest hours. We are fairly over provisioned there because a lot of devs are keeping limit and request the same.
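
Something like this, with made-up numbers, is the shape of it:

    # Hypothetical numbers: request sized for steady state, limit sized for bursts
    kubectl set resources deployment/my-app --requests=cpu=250m --limits=cpu=1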


CPU bursting is safe: you just get throttled. Memory bursting is dangerous: you get OOMKilled.

That's why the Python numbers look so bad here: devs set the request high enough to cover the initial model-loading spike so they don't crash during a rollout, even if they idle at 10% usage afterwards.
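
Which in practice looks something like this (hypothetical name and numbers):

    # Hypothetical: memory request covers the model-load spike at startup,
    # limit leaves a small buffer so a rollout doesn't get OOMKilled
    kubectl set resources deployment/ml-api --requests=memory=2Gi --limits=memory=2560Mi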


What makes you think they're talking about CPU? It reads to me like it's memory.

Two things: the word "idles", and the nature of CPython's allocator, which generally doesn't return memory to the OS but reuses it internally. So you can't really "spike" memory usage, only grow it.

Interesting. At the company I work for, Java pods are (historically) over-provisioned with CPU but quite tightly provisioned with memory.

That is the opposite of what I usually see. Are you trading CPU for RAM by running a more aggressive GC like ZGC or Shenandoah? Usually, people starve the CPU to buy more RAM.

Is this an instantaneous measure, or does it cover the whole duration of the process?

4GiB? Oh no, we gave it Raspberry Pi memory?

This is truly one of the dumbest outcomes of the whole "resume-driven cloud deployment" era. Clouds "market segment" with laughable memory, resume engineers want their line but not the recurring cost, and engineers waste weeks investigating and working around out-of-memory issues that come down purely to someone provisioning services with 2003 levels of memory. It's all a giant waste of time.


> I've been consulting on EKS/GKE cost optimization for a few mid-sized companies and kept seeing the same pattern: massive over-provisioning of memory just to be safe.

Correct.

Developers keep over-provisioning because they need enough memory for the app to keep running as demand scales up. Since these languages have their own runtimes and GCs to manage memory, they already pre-allocate lots of RAM before the app even runs, adding to the bloat.

Part of the problem is not only technical (the language may be bloated and inefficient); it is also psychological, as developers are scared of their app hitting an out-of-memory error in production.

As you can see, the languages with the most waste are the ones that are inefficient in both runtime (speed) and space (memory), take up the most memory, are slower (Python and Java), and cost a lot of money to keep running.

I got downvoted for questioning the microservice cargo cult [0], with Java being the darling of that cult. If you imagine a K8s cluster running any of these runtimes, you can see which one will cost the most as you scale up with demand + provisioning.

Languages like Go and Rust are the clear winners if you want to save lots of money and are looking for efficiency.

[0] https://news.ycombinator.com/item?id=44950060



