Hacker News | andyayers's comments

Microsoft | Compiler Engineer, .NET JIT | C++, C# | Redmond WA / Other locations possible

We are a small tight-knit team, happy to both teach and learn new ways of making code run faster.

If you're curious about the kind of work we have been doing recently, check out the JIT section of https://devblogs.microsoft.com/dotnet/performance-improvemen... and/or https://github.com/dotnet/runtime/blob/main/docs/design/core...


Hey andyayers, how can one apply?


.NET JIT developer here.

What else would we call it? It is what it is.

I believe there are some differences between what .NET does and what mainstream Java does. For instance, objects can be stack allocated even if they can't be turned into collections of scalars. This allows the JIT to stack allocate small known-sized arrays.
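To illustrate the point above, here is a hand-written sketch (not actual JIT output, and the method name is made up): a small, fixed-size array whose reference never escapes the enclosing method is a candidate for stack allocation, even though it can't be broken down into individual scalar variables. Whether a given array is actually stack allocated depends on the JIT version and what its escape analysis can prove.

```csharp
using System;

// A small array with a size known at JIT time. The reference never leaves
// this method (it isn't returned, stored to a field, or passed to an
// unanalyzable callee), so the allocation may be moved to the stack.
static int SumOfSquares()
{
    int[] squares = new int[4];              // small, known size, never escapes
    for (int i = 0; i < squares.Length; i++)
        squares[i] = i * i;

    int sum = 0;
    foreach (int s in squares)
        sum += s;                            // 0 + 1 + 4 + 9
    return sum;
}

Console.WriteLine(SumOfSquares());           // prints 14
```

If the array were, say, returned from the method or captured by a closure, it would escape and the optimization would not apply.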


Sometimes, yes.

Linq contains a goodly number of hand-crafted special-case enumerators for common collections, or collections with certain interfaces, or span projections that are really nice optimizations but can complicate things for the JIT.

Some details here if you're curious: https://github.com/dotnet/runtime/blob/main/docs/design/core...


Here is one that has some historical comparison, though it does not show perf on .NET Framework, and there is no .NET 10 data yet.

https://endjin.com/blog/2024/11/how-dotnet-9-boosted-ais-dot...


HPUX compilers were doing this back in 1993.


Or academics in 1986: https://dl.acm.org/doi/abs/10.1145/13310.13338

The idea of optimizations running at different stages in the build, with different visibility of the whole program, was discussed in 1979, but the world was so different back then that the discussion seems foreign. https://dl.acm.org/doi/pdf/10.1145/872732.806974


Oh yeah, well ... actually I got nothin'. You win.

I will just throw in some nostalgia for how good that compiler was. My college roommate had an HP pizza box that his dad secured from HP, and the way its C compiler quoted chapter and verse from the ISO C standard in its error messages was impressive.


Unfortunately, those improvements don't work for Linq.

Some notes on why this is so here: https://github.com/dotnet/runtime/blob/main/docs/design/core...


Aw, I had no idea it didn't work for Linq. If they sort that out, I'd put good money on a colossal perf boost across the board.


Long-running methods (like the one here) transition mid-execution to more optimized versions, via on-stack replacement (OSR), after roughly 50K iterations. So you end up running optimized code either if the method is called a lot or loops frequently.

The OSR transition happens here, but between .NET 8 and .NET 9 some aspects of loop optimization in OSR code regressed.
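The shape described above can be sketched like this (a minimal illustration; the ~50K threshold is the approximate figure mentioned above, and the method name is made up). The method is called only once, so it never becomes "hot" by call count, but its loop runs long enough that execution is patched over to an optimized version mid-loop:

```csharp
using System;

// Called once, but loops far past the OSR threshold. Execution starts in
// unoptimized code, and after roughly 50K iterations the runtime may
// transition this activation to optimized code via on-stack replacement.
static long SumTo(long n)
{
    long total = 0;
    for (long i = 0; i < n; i++)
        total += i;
    return total;
}

Console.WriteLine(SumTo(100_000));   // 4999950000
```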


So there actually was a regression and it wasn't an intentional warmup delay?


There is indeed a regression if the method is only called a few times, but not if it is called frequently.

With BenchmarkDotNet it may not be obvious which scenario you intend to measure and which one you end up measuring. BDN runs the benchmark method enough times to exceed some overall "goal" time for measuring (250 ms I think). This may require many calls or may just require one.
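Back-of-the-envelope arithmetic for the effect described above (the ~250 ms goal is the commenter's recollection and is configurable in BDN; the helper below is a sketch, not BDN's actual algorithm): a fast method needs many calls per measurement iteration, so later calls run fully optimized code, while a slow method may be measured after only a handful of calls, before optimization kicks in.

```csharp
using System;

// Roughly how many calls fit into one measurement iteration of a given
// goal duration. Fast methods get called many times (and so get optimized);
// slow methods may be measured after just one or two calls.
static int CallsPerIteration(double goalMs, double perCallMs) =>
    Math.Max(1, (int)Math.Ceiling(goalMs / perCallMs));

Console.WriteLine(CallsPerIteration(250, 5));    // 50 calls: optimized code dominates
Console.WriteLine(CallsPerIteration(250, 1000)); // 1 call: may measure unoptimized code
```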


For the .NET JIT, at least, speculation on types seems beneficial even if we're only right maybe 30% of the time.

See eg https://github.com/dotnet/runtime/blob/main/docs/design/core...

(where this is presented as a puzzle).


Guarded devirtualization is different from the speculation that I'm talking about.

To me, speculation is where the fail path exits the optimized code.

To handle JS's dynamism, guarding is usually not worth it (though JSC has the ability to do that, if the profiling says that the fail path is probable). I believe that most of HotSpot's perf comes from speculation rather than guarded devirt.
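A hand-written sketch of the distinction (conceptual only, not JIT output; the `Shape`/`Circle` names are hypothetical). With guarded devirtualization the fail path rejoins, so code after the merge point can't assume anything about the type; with speculation as defined above, a failed check would exit the optimized code entirely, so everything downstream can assume the speculated type.

```csharp
using System;

// Guarded devirtualization: both paths rejoin at the return, so nothing
// downstream of the merge may assume s is a Circle.
static double GuardedDevirt(Shape s)
{
    if (s.GetType() == typeof(Circle))
        return ((Circle)s).Area();   // fast path: inlinable direct call
    return s.Area();                 // slow path: falls back to a virtual call
}
// Speculation would instead compile only the Circle path; a failed type
// check would deoptimize (exit the optimized code via OSR) rather than
// branch to a slow path that rejoins.

Console.WriteLine(GuardedDevirt(new Circle { R = 1.0 }));   // ~3.14159

abstract class Shape { public abstract double Area(); }
sealed class Circle : Shape
{
    public double R;
    public override double Area() => Math.PI * R * R;
}
```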


> To me, speculation is where the fail path exits the optimized code.

V8 is now doing profile-based guarded inlining for Wasm indirect calls. The guards don't deopt, so it's a form of biasing where the fail path does indeed go through the full indirect call. That means the fail path rejoins, and ultimately, downstream, you don't learn anything, e.g. that there were no aliasing side effects, or anything about the return type of the inlined code.

You can get some of the effect of speculation with tail duplication after biasing, but in order to get the full effect you'd have to tail-duplicate all the way to the end of a function, or even unroll another iteration of the loop. It's possible to do this if you're willing to spend a lot of code space by duplicating a lot of basic blocks.

But the expensive thing about speculation is the deopt path, which is a really expensive OSR transfer and usually throws away optimized code, too. So clearly biasing is a different tradeoff, and I wouldn't be surprised if biasing plus a little bit of tail duplication gets most of the benefit of deoptimization.


Would you mind deep linking to the V8 code that does this?


Or https://learn.microsoft.com/en-us/defender-endpoint/microsof...

(DevDrive + Defender's "performance mode")


In .NET, even in optimized methods, there can be "untracked" lifetimes where a stack slot is reported live to GC throughout the extent of a method, so presumably these can lead to the "over-reporting" cases mentioned.

The number of trackable lifetimes was 64 in .NET Framework but has been steadily increased in modern .NET and is now 1024, so it's rarely a capacity issue; but there are cases where we can't effectively reason about lifetimes.

For us another big drawback to conservative scanning is that any object referred to by a conservative reference cannot be relocated, since the reference might be live and is not guaranteed to be a GC reference; these objects are (in our parlance) effectively pinned, and this causes additional overhead.


Thanks!

I knew about the 1000 (turns out 1024) limit for method locals; in hindsight it makes sense for it to apply to gcref tracking just as much...

