
I wonder how much of (2) is speculative and how much of it is a real need in actual projects.


The negative performance impact of GC in performance-engineered code is neither small nor controversial; it is a mechanical consequence of the architecture choices available. Explicit locality and schedule control make a big difference on modern silicon. Especially for software that is expressly engineered for maximum performance, the GC equivalent won't be particularly close to a non-GC implementation. Some important optimization techniques in GC languages are about effectively disabling the GC.
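To make the locality point concrete, here's a rough C++ sketch (the Quote struct and its fields are made up purely for illustration) contrasting contiguous value storage, which you control explicitly, with one-heap-object-per-element storage, which is closer to what a typical GC'd object model hands you:

    #include <cstdint>
    #include <memory>
    #include <vector>

    struct Quote {            // hypothetical 16-byte value type
        std::int64_t price;
        std::int64_t size;
    };

    // Contiguous values: one allocation, sequential access, prefetcher-friendly.
    std::int64_t sum_values(const std::vector<Quote>& quotes) {
        std::int64_t total = 0;
        for (const Quote& q : quotes) total += q.price * q.size;
        return total;
    }

    // One heap object per element, reached through a pointer: objects can be
    // scattered across the heap, so every element is a potential cache miss.
    std::int64_t sum_pointers(const std::vector<std::unique_ptr<Quote>>& quotes) {
        std::int64_t total = 0;
        for (const auto& q : quotes) total += q->price * q->size;
        return total;
    }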

While some applications written in C++ are not performance-sensitive, for many applications performance tends to be a major objective in choosing C++.


When people complain about "negative performance impact of GC", often they're actually bothered by badly designed languages like Java that force heap-allocation of almost everything.

I think this might have been fixed in the latest versions of Java, though; I'm not sure if value types (Project Valhalla) are already in the language or still just coming.

Aside from that, it's my understanding that GC can be both a blessing and a curse for performance (throughput); that is, an advanced-enough GC implementation should (theoretically?) be able to beat manual memory management.


In theory, a GC should never be faster than manual memory management. Anything a GC can do can be done manually, but manual management has much more context about appropriate timing, locality, and resource utilization than a GC can ever have. A large aspect of performance in modern systems is how effectively you can pipeline and schedule events through the CPU cache hierarchy.
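As one example of context a collector can't have: a request-scoped arena (bump) allocator, sketched below as a toy (names and sizes are hypothetical), exploits the programmer's knowledge that everything allocated while handling a request dies at the end of that request, so "freeing" is a single pointer rewind at a moment the program chooses:

    #include <cstddef>
    #include <vector>

    // Toy bump allocator: everything allocated from it dies together, and the
    // program decides exactly when that happens (reset() is a pointer rewind).
    class Arena {
    public:
        explicit Arena(std::size_t capacity) : buffer_(capacity), offset_(0) {}

        // align must be a power of two.
        void* allocate(std::size_t size,
                       std::size_t align = alignof(std::max_align_t)) {
            std::size_t aligned = (offset_ + align - 1) & ~(align - 1);
            if (aligned + size > buffer_.size()) return nullptr;  // out of space
            offset_ = aligned + size;
            return buffer_.data() + aligned;
        }

        void reset() { offset_ = 0; }  // "free" everything in O(1)

    private:
        std::vector<std::byte> buffer_;
        std::size_t offset_;
    };

    // Usage: one Arena per request; allocate freely while handling it,
    // then reset() at the end, at a time of the program's choosing.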

There are a few different ways a GC impacts code performance. First, even low-latency GCs have a latency similar to a blocking disk op or worse on modern hardware. In high-performance systems we avoid blocking disk ops entirely specifically because it causes a significant loss in throughput, instead using io_submit/io_uring. Worse, we have limited control over when a GC occurs; at least with blocking disk ops we can often defer them until a convenient time. To fit within these processing models, worst case GC latency would need to be much closer to microseconds.
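For reference, the io_uring path looks roughly like this with liburing (just a sketch: one 4 KiB read from an already-open fd, error handling omitted):

    #include <liburing.h>
    #include <vector>

    // Submit a read without blocking the processing thread, then reap the
    // completion later, at a point the event loop chooses.
    int read_async(int fd) {
        io_uring ring;
        if (io_uring_queue_init(64, &ring, 0) < 0) return -1;

        std::vector<char> buf(4096);
        io_uring_sqe* sqe = io_uring_get_sqe(&ring);
        io_uring_prep_read(sqe, fd, buf.data(), buf.size(), /*offset=*/0);
        io_uring_submit(&ring);

        // ... do other useful work; the kernel performs the read meanwhile ...

        io_uring_cqe* cqe = nullptr;
        io_uring_wait_cqe(&ring, &cqe);   // harvest the completion
        int bytes_read = cqe->res;        // negative value means -errno
        io_uring_cqe_seen(&ring, cqe);
        io_uring_queue_exit(&ring);
        return bytes_read;
    }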

Second, a GC operation tends to thrash the CPU cache, the contents of which were carefully orchestrated by the process to maximize throughput before being interrupted. This is part of the reason high-performance software avoids context switching at all costs (see also: thread-per-core software architecture). It is also an important and under-appreciated aspect of disk cache replacement algorithms, for example; an algorithm that avoids thrashing the CPU cache can deliver higher overall performance than one with a better cache hit rate.
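The thread-per-core pattern usually pins one worker to each core so its working set stays in that core's caches; a minimal Linux-only sketch (pthreads affinity, no error handling):

    #include <pthread.h>
    #include <sched.h>
    #include <thread>
    #include <vector>

    // Pin the calling thread to a single core so its working set stays in that
    // core's caches and the kernel won't migrate it. Linux-specific; requires
    // _GNU_SOURCE, which g++ defines by default.
    void pin_to_core(int core_id) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core_id, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &set);
    }

    int main() {
        unsigned cores = std::thread::hardware_concurrency();
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < cores; ++i) {
            workers.emplace_back([i] {
                pin_to_core(static_cast<int>(i));
                // ... run this shard's event loop here ...
            });
        }
        for (auto& w : workers) w.join();
    }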

Lastly, when there is a large stall (e.g. a millisecond) in the processing pipeline outside the control of the process, the effects of that propagate through the rest of the system. It becomes very difficult to guarantee robust behavior, safety, or resource bounds when code can stop running at arbitrary points in time. While the GC is happening, finite queues are filling up. Protecting against this requires conservative architectures that leave a lot of performance on the table. If all non-deterministic behavior is asynchronous, we can optimize away many things that can never happen.
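To illustrate the finite-queue problem: with a bounded queue like the toy below (single-threaded for brevity; real ones are concurrent ring buffers), a stalled consumer means try_push starts failing and the producer has to shed load or back-pressure, which is exactly the conservative headroom you end up engineering in:

    #include <cstddef>
    #include <deque>

    // Toy bounded queue: if the consumer stalls (e.g. during a collection
    // pause), the queue fills and the producer must drop or back off instead
    // of growing without bound.
    template <typename T>
    class BoundedQueue {
    public:
        explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

        bool try_push(T item) {
            if (items_.size() >= capacity_) return false;  // full: shed or back off
            items_.push_back(std::move(item));
            return true;
        }

        bool try_pop(T& out) {
            if (items_.empty()) return false;
            out = std::move(items_.front());
            items_.pop_front();
            return true;
        }

    private:
        std::size_t capacity_;
        std::deque<T> items_;
    };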

A lot of modern performance comes down to exquisite orchestration, scheduling, and timing in complex processes. A GC is like a giant, slow chaos monkey that randomly destroys the choreography that was so carefully created to produce that high performance.


> performance tends to be a major objective

My comment is about the thinking behind making this decision, C++ or not. It wasn't "is it speculative that GC will add a cost?" or something like that.

I wonder how much of the thinking that leads one to conclude "I need so much performance here that I can't afford a managed language", for example, is real, careful thought vs. speculation.


In my experience, 99% speculative and WRONG. Who said "early optimization is the root of all evil"? :) Today it's more and more possible to have a GC without terrible performance issues. A few weeks ago I read an article here on HN about LISP being used for safety-critical systems. The bad reputation of GC comes from the early versions of Java... but I've been using GC languages a LOT, and I've never had those "stop the world" moments.


The expression is "premature optimization". And, Donald Knuth.

GC overhead is always hard to measure except end-to-end, because it is distributed over everything else that happens: cache misses, TLB shootdowns. Mitigations are very difficult to place.

Practically, you usually just have to settle for lower performance, and most people do. Double or triple your core count and memory allocation, and bull ahead.


Not my bailiwick, but I feel like early Java's problem was a combination of two things: everything that isn't a simple primitive is an object that goes on the heap, plus a GC optimized for batch throughput rather than latency. Bonus: it's Java all the way down to the bitter end.

I'm with you. One should look at latency requirements and the ratio of profit vs. server costs when making the decision. AKA: when your product generates $250k/month, you're paying three programmers $40k/month, and your AWS bill is $500/month, that isn't the time to try to shave pennies.



