Hacker News | andyayers's comments

Microsoft | Compiler Engineer, .NET JIT | C++, C# | Redmond WA / Other locations possible

We are a small tight-knit team, happy to both teach and learn new ways of making code run faster.

If you're curious about the kind of work we have been doing recently, check out the JIT section of https://devblogs.microsoft.com/dotnet/performance-improvemen... and/or https://github.com/dotnet/runtime/blob/main/docs/design/core...


Hey andyayers, how can one apply?


.NET JIT developer here.

What else would we call it? It is what it is.

I believe there are some differences between what .NET does and what mainstream Java does. For instance, objects can be stack allocated even if they can't be turned into collections of scalars. This allows the JIT to stack allocate small known-sized arrays.
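To illustrate the point above, here is a hand-written sketch (not actual JIT output, and the method name is made up): a small, fixed-size array whose reference never escapes the enclosing method is a candidate for stack allocation, even though it can't be broken down into individual scalar variables. Whether a given array is actually stack allocated depends on the JIT version and what its escape analysis can prove.

```csharp
using System;

// A small array with a size known at JIT time. The reference never leaves
// this method (it isn't returned, stored to a field, or passed to an
// unanalyzable callee), so the allocation may be moved to the stack.
static int SumOfSquares()
{
    int[] squares = new int[4];              // small, known size, never escapes
    for (int i = 0; i < squares.Length; i++)
        squares[i] = i * i;

    int sum = 0;
    foreach (int s in squares)
        sum += s;                            // 0 + 1 + 4 + 9
    return sum;
}

Console.WriteLine(SumOfSquares());           // prints 14
```

If the array were, say, returned from the method or captured by a closure, it would escape and the optimization would not apply.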


Sometimes, yes.

Linq contains a goodly number of hand-crafted special-case enumerators for common collections, or collections with certain interfaces, or span projections that are really nice optimizations but can complicate things for the JIT.

Some details here if you're curious: https://github.com/dotnet/runtime/blob/main/docs/design/core...


Here is one that has some historical comparison, though it does not show perf on .NET Framework, and there is no .NET 10 data yet.

https://endjin.com/blog/2024/11/how-dotnet-9-boosted-ais-dot...


HPUX compilers were doing this back in 1993.


Or academics in 1986: https://dl.acm.org/doi/abs/10.1145/13310.13338

The idea of optimizations running at different stages in the build, with different visibility of the whole program, was discussed in 1979, but the world was so different back then that the discussion seems foreign. https://dl.acm.org/doi/pdf/10.1145/872732.806974


Oh yeah, well ... actually I got nothin'. You win.

I will just throw in some nostalgia for how good that compiler was. My college roommate had an HP pizza box that his dad secured from HP, and the way its C compiler quoted chapter and verse from the ISO C standard in its error messages was impressive.


Unfortunately, those improvements don't work for Linq.

Some notes on why this is so here: https://github.com/dotnet/runtime/blob/main/docs/design/core...


Aw, I had no idea it didn't work for Linq. If they sort that out, I'd put good money on a colossal perf boost across the board.


Long-running methods (like the one here) transition mid-execution to more optimized versions, via on-stack replacement (OSR), after roughly 50K iterations. So you end up running optimized code either if the method is called a lot or loops frequently.

The OSR transition happens here, but between .NET 8 and .NET 9 some aspects of loop optimization in OSR code regressed.
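The shape described above can be sketched like this (a minimal illustration; the ~50K threshold is the approximate figure mentioned above, and the method name is made up). The method is called only once, so it never becomes "hot" by call count, but its loop runs long enough that execution is patched over to an optimized version mid-loop:

```csharp
using System;

// Called once, but loops far past the OSR threshold. Execution starts in
// unoptimized code, and after roughly 50K iterations the runtime may
// transition this activation to optimized code via on-stack replacement.
static long SumTo(long n)
{
    long total = 0;
    for (long i = 0; i < n; i++)
        total += i;
    return total;
}

Console.WriteLine(SumTo(100_000));   // 4999950000
```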


So there actually was a regression and it wasn't an intentional warmup delay?


There is indeed a regression if the method is only called a few times, but not if it is called frequently.

With BenchmarkDotNet it may not be obvious which scenario you intend to measure and which one you end up measuring. BDN runs the benchmark method enough times to exceed some overall "goal" time for measuring (250 ms I think). This may require many calls or may just require one.
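Back-of-the-envelope arithmetic for the effect described above (the ~250 ms goal is the commenter's recollection and is configurable in BDN; the helper below is a sketch, not BDN's actual algorithm): a fast method needs many calls per measurement iteration, so later calls run fully optimized code, while a slow method may be measured after only a handful of calls, before optimization kicks in.

```csharp
using System;

// Roughly how many calls fit into one measurement iteration of a given
// goal duration. Fast methods get called many times (and so get optimized);
// slow methods may be measured after just one or two calls.
static int CallsPerIteration(double goalMs, double perCallMs) =>
    Math.Max(1, (int)Math.Ceiling(goalMs / perCallMs));

Console.WriteLine(CallsPerIteration(250, 5));    // 50 calls: optimized code dominates
Console.WriteLine(CallsPerIteration(250, 1000)); // 1 call: may measure unoptimized code
```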


For the .NET JIT, at least, speculation on types seems beneficial even if we're only right maybe 30% of the time.

See eg https://github.com/dotnet/runtime/blob/main/docs/design/core...

(where this is presented as a puzzle).


Guarded devirtualization is different from the speculation that I'm talking about.

To me, speculation is where the fail path exits the optimized code.

To handle JS's dynamism, guarding is usually not worth it (though JSC has the ability to do that, if the profiling says that the fail path is probable). I believe that most of HotSpot's perf comes from speculation rather than guarded devirt.
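A hand-written sketch of the distinction (conceptual only, not JIT output; the `Shape`/`Circle` names are hypothetical). With guarded devirtualization the fail path rejoins, so code after the merge point can't assume anything about the type; with speculation as defined above, a failed check would exit the optimized code entirely, so everything downstream can assume the speculated type.

```csharp
using System;

// Guarded devirtualization: both paths rejoin at the return, so nothing
// downstream of the merge may assume s is a Circle.
static double GuardedDevirt(Shape s)
{
    if (s.GetType() == typeof(Circle))
        return ((Circle)s).Area();   // fast path: inlinable direct call
    return s.Area();                 // slow path: falls back to a virtual call
}
// Speculation would instead compile only the Circle path; a failed type
// check would deoptimize (exit the optimized code via OSR) rather than
// branch to a slow path that rejoins.

Console.WriteLine(GuardedDevirt(new Circle { R = 1.0 }));   // ~3.14159

abstract class Shape { public abstract double Area(); }
sealed class Circle : Shape
{
    public double R;
    public override double Area() => Math.PI * R * R;
}
```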


> To me, speculation is where the fail path exits the optimized code.

V8 is now doing profile-based guarded inlining for Wasm indirect calls. The guards don't deopt, so it's a form of biasing where the fail path does indeed go through the full indirect call. That means the fail path rejoins, and ultimately, downstream, you don't learn anything, e.g. that there were no aliasing side effects, or anything about the return type of the inlined code.

You can get some of the effect of speculation with tail duplication after biasing, but in order to get the full effect you'd have to tail-duplicate all the way to the end of a function, or even unroll another iteration of the loop. It's possible to do this if you're willing to spend a lot of code space by duplicating a lot of basic blocks.

But the expensive thing about speculation is the deopt path, which is a really expensive OSR transfer and usually throws away optimized code, too. So clearly biasing is a different tradeoff, and I wouldn't be surprised if biasing plus a little bit of tail duplication gets most of the benefit of deoptimization.


Would you mind deep linking to the V8 code that does this?


Or https://learn.microsoft.com/en-us/defender-endpoint/microsof...

(DevDrive + Defender's "performance mode")


In .NET, even in optimized methods, there can be "untracked" lifetimes where a stack slot is reported live to GC throughout the extent of a method, so presumably these can lead to the "over-reporting" cases mentioned.

The number of trackable lifetimes was 64 in .NET Framework but has been steadily increased in modern .NET and is now 1024, so it's rarely a capacity issue; but there are cases where we can't effectively reason about lifetimes.

For us another big drawback to conservative scanning is that any object referred to by a conservative reference cannot be relocated, since the reference might be live and is not guaranteed to be a GC reference; these objects are (in our parlance) effectively pinned, and this causes additional overhead.


Thanks!

I knew about the 1000 (turns out 1024) limit for method locals; in hindsight it makes sense for it to apply to gcref tracking just as much...

