
t0b1 · 2025-09-03T14:35:14 1756910114

This is in relation to their TPC-H benchmark, and it can be due to a variety of reasons. My guess would be that they can generate stencils for whole operators, which can be transformed into more efficient code at stencil generation time, while LLVM-O0 gets the operator in LLVM-IR form and can do no such transformation. I can't verify this, though, because their benchmark setup seems a bit more involved.

When used in a C/C++ compiler, the stencils correspond to individual (or a few) LLVM-IR instructions, which leads to bad runtime performance. Also, as mentioned, register allocation becomes a problem for the Copy-and-Patch approach on larger functions.
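For readers unfamiliar with the term, here is a toy sketch of what "stencil" means in copy-and-patch: a pre-assembled byte template with a hole, where code generation is a copy followed by patching the hole. The `Stencil` type, the `emit` and `compile_add` helpers, and the three hand-written x86-64 templates are hypothetical illustrations, not how any of the systems discussed here are implemented.

```rust
/// A pre-assembled machine-code template with one optional 32-bit hole.
struct Stencil {
    bytes: &'static [u8],
    /// Byte offset of the imm32 hole inside `bytes`, if the stencil has one.
    hole: Option<usize>,
}

// Hypothetical hand-written x86-64 stencils. A real system would generate
// its stencils ahead of time from compiled snippets instead.
const MOV_EAX_IMM: Stencil = Stencil { bytes: &[0xB8, 0, 0, 0, 0], hole: Some(1) }; // mov eax, imm32
const ADD_EAX_IMM: Stencil = Stencil { bytes: &[0x05, 0, 0, 0, 0], hole: Some(1) }; // add eax, imm32
const RET: Stencil = Stencil { bytes: &[0xC3], hole: None }; // ret

/// Copy the stencil into `code`, then patch its hole (if any) with `imm`.
fn emit(code: &mut Vec<u8>, stencil: &Stencil, imm: u32) {
    let start = code.len();
    code.extend_from_slice(stencil.bytes);
    if let Some(off) = stencil.hole {
        code[start + off..start + off + 4].copy_from_slice(&imm.to_le_bytes());
    }
}

/// "Compile" `return a + b` for two constants by stitching three stencils.
fn compile_add(a: u32, b: u32) -> Vec<u8> {
    let mut code = Vec::new();
    emit(&mut code, &MOV_EAX_IMM, a);
    emit(&mut code, &ADD_EAX_IMM, b);
    emit(&mut code, &RET, 0);
    code
}
```

The sketch only produces bytes and never executes them. Each stencil here is hard-wired to `eax`, which hints at why register allocation across many small stencils gets difficult.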

t0b1 · 2025-06-02T10:42:33 1748860953

TPDE is a framework for writing a back-end for various SSA IRs. TPDE-LLVM is an LLVM back-end written using TPDE, but TPDE itself is independent of LLVM. The paper also mentions back-ends written for Cranelift's IR and Umbra IR using TPDE.

t0b1 · 2025-02-19T03:39:16 1739936356

> All the money in the world won't change the fact that Source is an outdated pile of garbage.

Given that the engine has been in, essentially, maintenance mode for almost a decade now, that is not really surprising. A more apt comparison would be Source 2 I assume.

> some events can take half a second to make it from a player's action to another player's system

What are events here? At least in normal Source this should be impossible for anything movement/input related, as the server processes the input each tick and then distributes that to each client (the Apex implementation should still do this). If it took half a second to forward such an action, the whole server would appear to hang for that time in the eyes of each client.

> The game's input processing is done in relation to framerate

This is a behavior added by Respawn, not something normal Source does.

> The security and anti-cheat in the game is so hamstrung by Source

That is a really broad claim imo. AFAIK CSGO only had this issue once in its lifetime and that was caused not by an issue in the engine but in the matchmaking service. So isn’t it more likely that Respawn just screwed something up?

t0b1 · 2025-02-12T19:26:31 1739388391

The author struck out the part about CedarDB not being available -- which is true -- but Umbra has been available as a Docker container[1] for some time now. The "Umbra paper" linked also contains an artifact[2] with a copy of Umbra as well as some instructions on how to control the back-ends used etc. (Cranelift is not available as an option in the Docker versions, however.)

I kind of disagree with the assumption that baseline compilers are easy to build (depending on your definition of baseline). A back-end like DirectEmit is not easy to write and even harder to verify. If you have a back-end written for your own IR you will likely have to write tests in that IR and it will probably be quite hard to simply port over codegen (or run-) tests from other compilers. Especially in the context of databases it is not very reassuring if you have a back-end that may explode the second you start to generate code slightly differently. We're working on making this a bit more commoditized but in our opinion you will always need to do some work since having another IR (with defined semantics someone could write a code generator for you) for a back-end is very expensive. In Umbra, translating Umbra-IR to LLVM-IR takes more time than compiling to machine code with DirectEmit.

Also, if it is easy to write them, I would expect to see more people write them.

Copy-and-patch was also tried in the context of MLIR[3] but the exec-time results were not that convincing and I have been told that it is unlikely for register allocation to work sufficiently well to make a difference.

[1]: https://hub.docker.com/r/umbradb/umbra

[2]: https://zenodo.org/records/10357363

[3]: https://home.cit.tum.de/~engelke/pubs/2403-cc.pdf

UncleEntity · 2025-02-12T22:46:05 1739400365

> We're working on making this a bit more commoditized but in our opinion you will always need to do some work since having another IR (with defined semantics someone could write a code generator for you) for a back-end is very expensive.

I've been nipping at the edges of adapting the vmIDL from vmgen and using that to generate the machinery to do some jitting. But I'm slow and lazy...

The general idea is to have the hypothetical user define the operands of their IR along with the code snippets and use this to stitch together a jit compiler library. Or perhaps a wrapper around an existing jit library, dunno. Either way it gives me some yaks to shave which makes me happy.

t0b1 · on Oct 4, 2024

One thing I noticed though is that when autocompleting C++ statements like if or while it will add only the opening curly braces which is a bit annoying but makes sense. But it also sometimes adds them @_@

t0b1 · on July 17, 2024

though I believe in RISC-V's case what will happen is that every vendor will have that realization at the same time, not tell anyone and make an extension, and now there's five different incompatible encodings for the same operation.

snvzz · on July 18, 2024

And that doesn't matter, because:

- Such custom extensions live in custom extension space.

- Software ecosystem that must work across vendors will use neither.

- If these extensions actually do something useful, the experience from them will be leveraged to make a standard extension, which will at some point make it into a standard profile, and thus adopted by the software ecosystem.

Pet_Ant · on July 17, 2024

Open source fundamentally changes that situation. All you need is a maintained version of GCC/LLVM that supports your processor and you’ll have a distro that supports your needs. Especially if it’s just about some performance boosting instructions. It’s not going to be an issue, we really aren’t in a binary world anymore for the most part.

t0b1 · on April 19, 2024

The bin packing will probably make it slower though, especially in the bool case since it will create dependency chains. For bools on x64, I don't think there's a better way than first having to get them in a register, shift them and then OR them into the result. The simple way creates a dependency chain of length 64 (which should also incur a 64 cycle penalty) but you might be able to do 6 (more like 12 realistically) cycles. But then again, where do these 64 bools come from? There aren't that many registers so you will have to reload them from the stack. Maybe the Rust ABI already packs bools in structs this tightly so it's work that has to be done anyway but I don't know too much about it.

And then the caller will have to unpack everything again. It might be easier to just teach the compiler to spill values into the result space on the stack (in cases where the IR doesn't already store the result after the computation), which will likely also perform better.
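To illustrate the dependency-chain point, here is a small source-level sketch with 8 bools instead of 64. `pack_serial` and `pack_tree` are hypothetical names, and an optimizing compiler is free to restructure either one, so this shows only the shape of the data flow, not the machine code you would get.

```rust
/// Serial packing: every OR depends on the previous accumulator value,
/// so the chain of dependent operations grows linearly with the input.
fn pack_serial(bools: &[bool; 8]) -> u8 {
    let mut acc = 0u8;
    for (i, &b) in bools.iter().enumerate() {
        acc |= (b as u8) << i;
    }
    acc
}

/// Tree packing: independent pairs are combined first, so the chain of
/// dependent ORs is log2(8) = 3 deep.
fn pack_tree(bools: &[bool; 8]) -> u8 {
    let bit = |i: usize| (bools[i] as u8) << i;
    let p0 = bit(0) | bit(1);
    let p1 = bit(2) | bit(3);
    let p2 = bit(4) | bit(5);
    let p3 = bit(6) | bit(7);
    (p0 | p1) | (p2 | p3)
}
```

Both functions compute the same value. The difference is only how many ORs have to wait for an earlier OR to finish.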

dzaima · on April 19, 2024

Unpacking bools is cheap - to move any bit into a flag is just a single 'test' instruction, which is as good as it gets if you have multiple bools (other than passing each in a separate flag, which is quite undesirable).

Doing the packing in a tree fashion to reduce latency is trivial, and store→load latency isn't free either depending on the microarchitecture (and at the counts where log2(n) latency becomes significant you'll be at IPC limit anyway). Packing vs store should end up at roughly the same instruction counts too - a store vs an 'or', and exact same amount of moving between flags and GPRs.

Reaching 64 bools might be a bit crazy, but 4-8 seems reasonably attainable from each of many arguments being an Option<T>, where the packing would reduce needed register/stack slot count by ~2.

Where possible it would of course make sense to pass values in separate registers instead of in one, but when the alternative is spilling to stack, packing is still worthy of consideration.
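As a rough source-level model of the `Option<T>` case, assume a function taking four `Option<u64>` arguments. Lowered naively that is four tag/payload pairs (8 slots); with the tags packed it is one tag word plus four payloads (5 slots). `PackedArgs`, `pack` and `unpack` are hypothetical names for illustration, since a real implementation would live in the compiler's ABI lowering rather than in user code.

```rust
/// Hypothetical packed form of four `Option<u64>` arguments.
struct PackedArgs {
    /// Bit i is set if and only if argument i is `Some`.
    tags: u8,
    /// Payload i is meaningful only when bit i of `tags` is set.
    payloads: [u64; 4],
}

fn pack(args: [Option<u64>; 4]) -> PackedArgs {
    let mut packed = PackedArgs { tags: 0, payloads: [0; 4] };
    for (i, arg) in args.iter().enumerate() {
        if let Some(v) = arg {
            packed.tags |= 1u8 << i;
            packed.payloads[i] = *v;
        }
    }
    packed
}

fn unpack(p: &PackedArgs) -> [Option<u64>; 4] {
    // Checking one bit of the tag word is the kind of operation a single
    // `test` instruction performs on x86-64.
    std::array::from_fn(|i| ((p.tags & (1u8 << i)) != 0).then(|| p.payloads[i]))
}
```

Round-tripping through `pack` and `unpack` returns the original arguments, and the callee only has to test one bit per argument to recover each tag.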

saghm · on April 19, 2024

> Reaching 64 bools might be a bit crazy, but 4-8 seems reasonably attainable from each of many arguments being an Option<T>, where the packing would reduce needed register/stack slot count by ~2

I don't have a strong sense of how much more common owned `Option` types are than references, but it's worth noting that if `T` is a reference, `Option<T>` will just use a pointer and treat the null value as `None` under the hood to avoid needing any tag. There are probably other types where this is done as well (maybe `NonZero` integer types?)
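A quick way to check this is to compare sizes directly. The helper name `option_is_free` is made up for this sketch; `std::mem::size_of` is the only library call involved.

```rust
use std::mem::size_of;

/// Returns true if wrapping `T` in `Option` does not make it any larger,
/// i.e. `None` is encoded in a bit pattern that `T` itself never uses.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}
```

With this, `option_is_free::<&u8>()` and `option_is_free::<std::num::NonZeroU32>()` are true, while `option_is_free::<u32>()` is false because every bit pattern of a `u32` is already a valid value.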

tialaramex · on April 20, 2024

Rust has a thing called the Guaranteed Niche Optimisation, which says if you make a Sum type, and the Sum type has exactly one variant which is just itself, plus exactly one variant which has a niche (a bit pattern which isn't used by any valid representation of that type) then it promises that your type is the same size as the type with the niche in it.

That is, if you made your own Maybe type which works like Option, it's also guaranteed to get this optimisation, and the optimisation works for any type which the compiler knows has a "niche", not just obvious things like references or small enumerations, the NonZero type, but also e.g. OwnedFd, a type which is a Unix file descriptor - Unix file descriptors cannot be -1, and so logically the bit pattern for -1 serves as a niche for this type.
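A minimal sketch of the "your own Maybe type" case. `Maybe`, its variant names, and the `same_size_as_payload` helper are hypothetical; the point is only that nothing here refers to `Option`.

```rust
use std::mem::size_of;

/// A home-grown `Option` lookalike: one variant without data and one
/// variant holding a `T`.
#[allow(dead_code)]
enum Maybe<T> {
    Nothing,
    Just(T),
}

/// Returns true if `Maybe<T>` is no larger than `T`, which happens when
/// `Nothing` can be stored in a niche of `T`.
fn same_size_as_payload<T>() -> bool {
    size_of::<Maybe<T>>() == size_of::<T>()
}
```

For payloads with a niche, such as `&u8` or `std::num::NonZeroU8`, the helper returns true; for a plain `u8` it returns false because a separate tag is needed.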

I really like this feature, and I want to use it more. There's good news and bad news. The good news is that although the Guaranteed Niche Optimisation is the only such guarantee, in practice the Rust compiler will do much more with a niche.

The bad news is that we're not allowed to make new types with their own niches (other than enumerations which automatically get an appropriately sized niche) in stable Rust today. In fact the ability to mark a niche is not only permanently unstable (thus usable in practice only from the Rust stdlib) but it's a compiler internal feature; they're pretty much telling you not to touch this, and it can't and won't get stabilized in this form.

But, we do have a good number of useful niches in the standard library, all references, the NonNull pointers (if you use pointers for something), the NonZero types, the booleans, small C-style enumerations, OwnedFd, that's quite a lot of possibilities.

The main thing I want, and the reason I tried to make more movement happen (but I have done very little for about a year), is BalancedIx, a suite of types like NonZero, but missing the most negative values of the signed integers. You very rarely need -128 on an 8-bit signed integer, and it's kind of a nuisance, so BalancedI8 would be the same size, it loses -128 and in exchange Option<BalancedI8> is the same size and now abs does what you expected, two for the price of one!
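Until something like that exists, the idea can be approximated in stable Rust by borrowing the zero niche of `NonZeroI8`. This is only a sketch under that assumption: the `BalancedI8` below is a hypothetical user-level type, not the proposed one, and it pays for the trick with an XOR on every access.

```rust
use std::num::NonZeroI8;

/// Hypothetical `BalancedI8`: an `i8` that cannot hold -128.
/// The value is stored as `value ^ i8::MIN`, so the excluded value -128
/// is exactly the one that would be stored as 0, which `NonZeroI8` forbids.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct BalancedI8(NonZeroI8);

impl BalancedI8 {
    /// Returns `None` only for -128.
    fn new(value: i8) -> Option<Self> {
        NonZeroI8::new(value ^ i8::MIN).map(BalancedI8)
    }

    fn get(self) -> i8 {
        self.0.get() ^ i8::MIN
    }

    /// Cannot overflow: -128 is the only `i8` without a representable
    /// absolute value, and this type excludes it.
    fn abs(self) -> Self {
        // `get()` is in -127..=127, so `abs()` is in 0..=127 and `new` succeeds.
        BalancedI8::new(self.get().abs()).unwrap()
    }
}
```

Because the wrapper inherits the niche of `NonZeroI8`, `Option<BalancedI8>` still fits in one byte.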

ratmice · on April 19, 2024

Yeah, `NonZero*` but also a type like `#[repr(u8)] enum Foo { X }`. According to `assert_eq!(std::mem::size_of::<Option<Foo>>(), std::mem::size_of::<Foo>())`, you need an enum which fully saturates the repr, e.g. `#[repr(u8)] enum Bar { X0, ..., X255 }` (pseudo code), before niche optimization fails to kick in.

saghm · on April 19, 2024

Oh, good to know!

t0b1 · on April 8, 2024

Maybe people don't because they see, for example, Valve (a billion dollar company) struggling to get GNOME to implement drm-leasing for VR headsets. IIRC they've been at it for multiple years, too.

Or maybe it's because the compositor developers are not exactly concerned about ease of development. To quote a GNOME dev[0] about support for the aforementioned drm-leasing protocol:

> I honestly don't have a problem with forcing clients to implement the portal if they want to work on mutter.

I wouldn't blame the people who choose to simply not engage with that process, especially those who work on these things in their free time.

[0]: https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2759

t0b1 · on Nov 28, 2023

> exotic things like take a screenshot

I'm not sure but is this meant ironically? Because taking a screenshot is a thing people do very often. Capturing your screen is, too. Even by third-party programs. And yet they never put it in the protocol but just said to go to dbus.

> GNOME, KDE, and wlroots have each implemented a different protocol

Is that true? My understanding was that GNOME never put forth a protocol but always did their dbus thing. I know that wlroots proposed a bunch of protocol extensions but those never went anywhere since, well, GNOME didn't want to implement them[1]. Nowadays I think every compositor(?) just implements the xdg-portal stuff.

[1]: https://gitlab.freedesktop.org/wayland/wayland/-/issues/32#n...

eadmund · on Nov 28, 2023

I think to the Wayland folks, anything other than using Windows is an exotic use case. They’re not entirely wrong, but if we wanted to use Windows, we … would be using Windows. The great thing about Linux was that we could do more than Windows or macOS permitted.

jdiff · on Nov 28, 2023

Weird take. Windows and macOS both let you do things Wayland doesn't currently have widely adopted protocols for. Only thing Xorg can do that any of the others can't is let a client application manage windows, and it's only macOS that really doesn't like that.

yjftsjthsd-h · on Nov 29, 2023

> I'm not sure but is this meant ironically? Because taking a screenshot is a thing people do very often.

Yes, I was intending to mock Wayland for failing, after 15 years of development, to have a single way of taking screenshots that works in all environments. I understand wanting to make things modular and flexible and make as much optional as possible[0], and I understand wanting to ship the core protocol first and then follow up with drafting other things and building consensus, but the fact that this thing was allowed to hit end-users with screenshots this broken - and that it still hasn't been fixed, and AIUI never will be fixed because GNOME seems to think that the situation doesn't need fixing - is a fairly damning indictment of Wayland as a whole.

> Is that true? My understanding was that GNOME never put forth a protocol but always did their dbus thing. I know that wlroots proposed a bunch of protocol extensions but those never went anywhere since, well, GNOME didn't want to implement them[1]. Nowadays I think every compositor(?) just implements the xdg-portal stuff.

I think it's true, though it's honestly hard to tell. Most of the screenshot tools I could find are using the wlroots protocol, KDE's official tool notes,

> Spectacle is a screenshot taking utility for the KDE desktop. Spectacle can also be used in non-KDE X11 desktop environments.

which doesn't really explicitly say much, and in fact the only tool I could find that claimed to be able to support everything was ksnip, which seems to work fine with wlroots but beyond that https://github.com/ksnip/ksnip#known-issues outlines the situation well enough; KDE is at least only temporarily broken, but GNOME isn't going to improve because GNOME did that on purpose. Now, that readme says you can use xdg-desktop-portal, but I have a GNOME+Wayland machine on hand, and I couldn't get it to actually work. I think what's supposed to happen is that every time I do a screenshot it prompts for permission, which I wanted to verify so I could complain that that was totally unreasonable, but what actually happens is that it just fails, which is... not better. Oh, and while searching for solutions to that I found flameshot, but that just refuses to even run. So... maybe someday the portal solution will work; in the meantime, I feel comfortable describing the situation as Wayland not having a uniform working way of taking screenshots.

[0] In particular, so we can avoid the situation from X11 where a load of drawing primitives are baked in that nobody has any use for anymore.

t0b1 · on Aug 9, 2023

That's a very nice writeup, I like the style. The mix of text to illustrations/memes was really pleasant. I have my reservations about the RISC/CISC nomenclature but I guess that's "each to their own" >.>

As someone who has spent some time figuring out how parts of the kernels work I can sympathize with the pain it probably was (but well worth it given the article imo).

For NT, I think that Windows Internals covers a lot about the stuff one wants to know and Microsoft's documentation is also not bad (certainly better than Linux's kernel docs imo); it's a really good starting point.

For more info about Windows I can recommend gamehacking forums/resources. There's a lot of filtering needed but they are a pretty good source of info for niche things sometimes.

As a last note, I noticed that the font of some code blocks is pretty large when viewed on my smartphone, making them hard to read (e.g. Ch. 6/main.c).

P.S.: > If you are a teenager and you like computers and you are not already in the Hack Club Slack, you should join right now

Way to remind me that I'm getting old lol