> The values in tuples cannot change. The values that keys point to in a frozen dict can?
The entries of a tuple cannot be assigned to, but the values can be mutated. The same is true for a `frozendict` (according to the PEP they don't support `__setitem__`, but "values can be mutable").
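A quick illustration of that distinction with a tuple (the same shallow immutability would apply to the proposed `frozendict`):

```python
t = ([1, 2], "x")
# t[0] = [9]        # TypeError: 'tuple' object does not support item assignment
t[0].append(3)      # fine: the tuple still holds the same list object, now mutated
print(t)            # ([1, 2, 3], 'x')
```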
I wonder whether Raymond Hettinger has an opinion on this PEP. A long time ago, he wrote: "freezing dicts is a can of worms and not especially useful".
This is why I love how Rust approached this; almost by accident to make borrow checking work. Every reference is either mutable or not, and (with safe code), you can't use an immutable reference to get a mutable reference anywhere down the chain. So you can slowly construct a map through a mutable reference, but then return it out of a function as immutable, and that's the end of it. It's no longer ever mutable, and no key or value is either. There's no need to make a whole new object called FrozenHashMap, and then FrozenList, and FrozenSet, etc. You don't need a StringBuilder because String is mutable, unless you don't want it to be. It's all just part of the language.
Kotlin _kinda_ does this as well, but if you have a reference to an immutable map in Kotlin, you are still free to mutate the values (and even keys!) as much as you like.
Only if you're returning a reference or wrapping it in something that will only ever return a reference. If you return an object by value ('owned'), then you can do what you like with it and 'mut' is just a light guardrail on that particular name for it.
You cannot return an immutable version. You can return it owned (in which case you can assign/reassign it to a mut variable at any point) or you can take a mut reference and return an immutable reference - but whoever is the owner can almost always access it mutably.
Arg, you’re right. Not sure what I was thinking there. I still think my point stands, because you get the benefits of immutability, but yeah, I didn’t explain it well.
It explains a lot about the design of Python container classes, and the boundaries of polymorphism / duck typing with them, and mutation between them.
I don't always agree with the choices made in Python's container APIs...but I always want to understand them as well as possible.
Also worth noting that understanding changes over time. Remember when GvR and the rest of the core developers argued adamantly against ordered dictionaries? Haha! Good times! Thank goodness their first wave of understanding wasn't their last. Concurrency and parallelism in Python was a TINY issue in 2006, but at the forefront of Python evolution these days. And immutability has come a long way as a design theme, even for languages that fully embrace stateful change.
> Also worth noting that understanding changes over time. Remember when GvR and the rest of the core developers argued adamantly against ordered dictionaries? Haha! Good times!
The new implementation has saved space, but there are opportunities to save more space (specifically after deleting keys) that they've now denied themselves by offering the ordering guarantee.
Ordering, like stability in sorting, is an incredibly useful property. If it costs a little, then so be it.
This is optimizing for the common case, where memory is generally plentiful and dicts grow more than they shrink. Python has so many memory inefficiencies that occasional tombstones in the dict internal structure is unlikely to be a major effect. If you're really concerned, do `d = dict(d)` after aggressive deletion.
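To make that concrete, here's a small sketch; the exact numbers are a CPython implementation detail, but the shape of the result is the point: deleting keys doesn't shrink the table, rebuilding does.

```python
import sys

d = {i: i for i in range(10_000)}
for i in range(9_990):
    del d[i]

print(len(d), sys.getsizeof(d))   # 10 entries left, but the big table is kept
compacted = dict(d)               # rebuild to reclaim the space
print(len(compacted), sys.getsizeof(compacted))
```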
> Ordering, like stability in sorting, is an incredibly useful property.
I can't say I've noticed any good reasons to rely on it. Didn't reach for `OrderedDict` often back in the day either. I've had more use for actual sorting than for preserving the insertion order.
Personally, I find lots of reasons to prefer an ordered Dict to an unordered one. Even small effects like "the debugging output will appear in a consistent order, making it easier to compare" can be motivation enough in many use cases.
Thinking about this off the top of my head, I'm actually wondering why this is useful outside of equality comparisons.
Granted, I live and work in TypeScript, where I can't `===` two objects but I could see this deterministic behavior making it easier for a language to compare two objects, especially if equality comparison is dependent on a generated hash.
The other is guaranteed iteration order, if you rely on the index-contents relationship of an iterable. We're talking about Dicts, which are keyed, but extending this idea to Lists, I can see this being useful in some scenarios.
Beyond that, I'm not sure it matters, but I also realize I may simply not have enough imagination at the moment to think of other benefits.
Same. Recently I saw interview feedback where someone complained that the candidate used OrderedDict instead of the built-in dict that is now ordered, but they'll let it slide... As if writing code that will silently do different things depending on minor Python version is a good idea.
Well it's been guaranteed since 3.7 which came out in 2018, and 3.6 reached end-of-life in 2021, so it's been a while. I could see the advantage if you're writing code for the public (libraries, applications), but for example I know at my job my code is never going to be run with Python 3.6 or older.
Honestly, if I was writing some code that depended on dicts being ordered I think I'd still use OrderedDict in modern Python. It gives the reader more information that I'm doing something slightly unusual.
It seems like opinions really differ on this item then. I love insertion ordering in mappings, and Python adopting it was a big revelation. The main reason is that keys need some order, and insertion order -> iteration order is a lot better than a pseudorandom (hash-based) order.
For me, it creates more reproducible programs and scripts, even simple ones.
This morning, for example, I was testing an object serialized through a JSON API, and my test data would never match from one run to the next.
After a while, I realized one of the objects was using a set of objects, which the API turned into a JSON array, but the order of that array would change depending on the initial Python VM state.
3 days ago, I used itertools.groupby to group a bunch of things. But itertools.groupby only works on iterables that are already sorted by the grouping key.
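For anyone who hasn't been bitten by it, a minimal illustration of that `groupby` pitfall (toy data, not the actual case above):

```python
from itertools import groupby
from operator import itemgetter

rows = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]
key = itemgetter(0)

# Without sorting, equal keys that aren't adjacent land in separate groups:
print([(k, list(g)) for k, g in groupby(rows, key)])
# [('b', [('b', 2)]), ('a', [('a', 1)]), ('b', [('b', 3)]), ('a', [('a', 4)])]

# Sorting by the grouping key first gives the intended result:
print([(k, list(g)) for k, g in groupby(sorted(rows, key=key), key)])
# [('a', [('a', 1), ('a', 4)]), ('b', [('b', 2), ('b', 3)])]
```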
Now granted, none of those recent examples are related to dicts, but dict is not a special case. And it's iterated over regularly.
This was 19 (almost) 20 years ago.
As stated in the lwn.net article, a lot of concurrency has been added to python, and it might now be time for something like a frozendict.
Things that were not useful in 2006 might be totally useful in 2026 ;P
Still, like you, I'm curious whether he has anything to say about it.
I think Raymond Hettinger is called out specially here because he did a well known talk called [Modern Dictionaries](https://youtu.be/p33CVV29OG8) where around 32:00 to 35:00 in he makes the quip about how younger developers think they need new data structures to handle new problems, but eventually just end up recreating / rediscovering solutions from the 1960s.
“What has been is what will be, and what has been done is what will be done; there is nothing new under the sun.”
It's interesting that he concludes that freezing dicts is "not especially useful" after addressing only a single motivation: the use of a dictionary as a key.
He doesn't address the reason that most of us in 2025 immediately think of, which is that it's easier to reason about code if you know that certain values can't change after they're created.
You can't really tell though. Maybe the dict is frozen but the values inside aren't. C++ tried to handle this with constness, but that has its own caveats that make some people argue against using it.
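The closest stand-in in today's stdlib, `types.MappingProxyType` (a read-only view rather than the PEP's copying `frozendict`, but the point is the same), shows exactly this:

```python
from types import MappingProxyType

inner = {"hits": []}
view = MappingProxyType(inner)

# view["hits"] = []     # TypeError: 'mappingproxy' object does not support item assignment
view["hits"].append(1)  # ...but the values it points to are still freely mutable
print(view)             # mappingproxy({'hits': [1]})
```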
Indeed. So I don't really understand what this proposal tries to achieve. It even explicitly says that dict → frozendict will be O(n) shallow-copy, and the contention is only about O(n) part. So… yeah, I'm sure they are useful for some cases, but as Raymond has said — it doesn't seem to be especially useful, and I don't understand what people ITT are getting excited about.
> Another PEP 351 world view is that tuples can serve as frozenlists; however, that view represents a Liskov violation (tuples don't support the same methods). This idea resurfaces and has been shot down again every few months.
... Well, yes; it doesn't support the methods for mutation. Thinking of ImmutableFoo as a subclass of Foo is never going to work. And, indeed, `set` and `frozenset` don't have an inheritance relationship.
I normally find Hettinger very insightful so this one is disappointing. But nobody's perfect, and we change over time (and so do the underlying conditions). I've felt like frozendict was missing for a long time, though. And really I think the language would have been better with a more formal concept of immutability (e.g. linking it more explicitly to hashability; having explicit recognition of "cache" attributes, ...), even if it didn't go the immutable-by-default route.
Apple (or perhaps NeXT) has solved this problem already in Objective-C. Look at NSArray and NSMutableArray, or NSData and NSMutableData. It’s intuitive and Liskov-correct to make the mutable version a subclass of the immutable version. And it’s clearly wrong to have the subclass relationship the other way around.
Given how dynamic Python is, such a subclass relationship need not be evident at the C level. You can totally make one class whose implementation is independent of another class a subclass of the other, using PEP 3119. This gives implementations complete flexibility in how to implement the class while retaining the ontological subclass relationship.
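For instance, with a hypothetical `FrozenMapping` ABC (names made up purely for illustration), the subclass relationship can be declared without any shared implementation:

```python
from abc import ABC

class FrozenMapping(ABC):        # hypothetical ABC, just for illustration
    pass

class MyDict:                    # implementation completely independent of FrozenMapping
    pass

FrozenMapping.register(MyDict)   # PEP 3119 "virtual" subclassing: no inheritance needed

print(issubclass(MyDict, FrozenMapping))    # True
print(isinstance(MyDict(), FrozenMapping))  # True
```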
> ImmutableFoo as a subclass of Foo is never going to work. And, indeed, `set` and `frozenset` don't have an inheritance relationship.
Theoretically, could `set` be a subclass of `frozenset` (and `dict` of `frozendict`)? Do other languages take that approach?
> linking [immutability] more explicitly to hashability
AFAIK immutability and hashability are equivalent for the language's "core" types. Would it be possible to enforce that equivalence for user-defined types, given that mutability and the implementation of `__hash__` are entirely controlled by the programmer?
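Today the language nudges user-defined types toward that equivalence but doesn't enforce it: defining `__eq__` without `__hash__` makes a class unhashable, yet nothing stops you from writing a class that is both hashable and mutable.

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

print(Point.__hash__)        # None: defining __eq__ alone made the class unhashable

class Sneaky:
    def __init__(self, x):
        self.x = x
    def __eq__(self, other):
        return self.x == other.x
    def __hash__(self):
        return hash(self.x)  # legal, but the hash changes when x is mutated -- a bug magnet
```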
> Theoretically, could `set` be a subclass of `frozenset` (and `dict` of `frozendict`)?
At one extreme: sure, anything can be made a subclass of anything else, if we wanted to.
At the other extreme: no, since Liskov substitution is an impossibly high bar to reach, especially in a language that's as dynamic/loose as Python. For example, consider an expression like `"pop" in dir(mySet)`.
> Barbara Liskov and Jeannette Wing described the principle succinctly in a 1994 paper as follows:[1]
> > Subtype Requirement: Let ϕ(x) be a property provable about objects x of type T. Then ϕ(y) should be true for objects y of type S where S is a subtype of T.
My expression `"pop" in dir(mySet)` gives an explicit example of how `set` and `frozenset` are not subtypes of each other (regardless of how they're encoded in the language, with "subclasses" or whatever). In this case `ϕ(x)` would be the property `("pop" in dir(x)) == False`, which holds for objects x of type frozenset. Yet it does not hold for objects y of type set.
Your example of `hasattr(mySet, 'pop')` gives another property that would be violated.
My point is that avoiding "Liskov violations" is ("theoretically") impossible, especially in Python (which allows programs to introspect/reflect on values, using facilities like 'dir', 'hasattr', etc.).
The root of the issue here is that the Liskov substitution principle simply references ϕ(x) as some property satisfied by objects of a class. It does not distinguish between properties the author of the class designed to be satisfied and properties that merely happen to be satisfied in this particular implementation. But Hyrum’s Law also states that properties which are accidentally true can become relied upon and, as time passes, become intrinsic. This suggests to me that the crux of the problem is that people don’t communicate sufficiently about the invariants and non-invariants of their code.
> > Subtype Requirement: Let ϕ(x) be a property provable about objects x of type T. Then ϕ(y) should be true for objects y of type S where S is a subtype of T.
This says "if hasattr(parent, 'pop') == True then hasattr(child, 'pop') must be True". This is not violated in this case, since hasattr(parent, 'pop') is False. If you want to extend the above definition so that negative proofs concerning the parent should also hold true for the child, then subtyping becomes impossible since all parent and child types must be identical, by definition.
The property in question is `hasattr(x, "pop") is False`.
> If you want to extend the above definition so that negative proofs concerning the parent should also hold true for the child, then subtyping becomes impossible since all parent and child types must be identical, by definition.
The distinction isn’t “negative proofs”, but yes, that’s their point. In Python, you have to draw a line as to which observable properties are eligible.
This is because the ABC system is defined such that MutableMapping is a subtype of Mapping. Which mostly makes sense, except that if we suppose there exist Mappings that aren't MutableMappings (such that it makes sense to recognize two separate concepts in the first place), then Mapping should be hashable, because immutable things generally should be hashable. Conceptually, making something mutable adds a bunch of mutation methods, but it also ought to take away hashing. So Liskov frowns regardless.
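You can see that asymmetry in the ABCs and built-ins as they exist today:

```python
from collections.abc import Hashable, Mapping, MutableMapping

print(issubclass(MutableMapping, Mapping))  # True: the mutable ABC is the subtype
print(isinstance({}, Hashable))             # False: dict sets __hash__ = None
print(isinstance(frozenset(), Hashable))    # True
print(isinstance(set(), Hashable))          # False: adding mutation took hashing away
```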
ImmutableFoo can't be a subclass of Foo, since it loses the mutator methods. But nor can Foo be a subclass of ImmutableFoo, since it loses the axiom of immutability (e.g. thread-safety) that ImmutableFoo has.
When you interpret Liskov substitution properly, it's very rare that anything Liskov-substitutes anything, making the entire property meaningless. So just do things based on what works best in the real world and aim for as much Liskov-substitution as is reasonable. Python is duck-typed anyway.
It's a decent guiding principle - Set and ImmutableSet are more substitutable than Set and Map, so Set deriving from ImmutableSet makes more sense than Set deriving from Map. It's just not something you can ever actually achieve.
I agree, same with frozenset. If you really want to use one of those as a key, convert to a tuple. There might be niche use cases for all this, but it's not something that the language or even the standard lib need to support.
Problem being that sets aren't consistently ordered and conversion to a tuple can result in an exponential (specifically, factorial) explosion in the number of possible keys associated with a single set. Nor can you sort all objects. Safe conversion of sets to tuples for use as keys is possible but the only technique I know requires an auxiliary store of objects (mapping objects to the order in which they were first observed), which doesn't parallelize well.
tuple(sorted(s)) and if you can't even sort the values, they're probably not hashable. I get that this involves a copy, but so does frozenset, and you can cross that bridge in various ways if it's ever a problem.
I'm not saying the problem with tuple doesn't exist, but that there doesn't need to be a built-in way to deal with it. If for some unfortunate reason you've got a mixed-type set that you also want to use as a dict key, you can write a helper.
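A hypothetical helper along those lines (nothing from the thread; it assumes the elements have stable, order-independent reprs):

```python
def set_key(s):
    """Deterministic, hashable key for a possibly mixed-type set."""
    return tuple(sorted(s, key=lambda x: (type(x).__name__, repr(x))))

cache = {}
cache[set_key({1, "a", 2.5})] = "value"
print(cache[set_key({2.5, "a", 1})])   # same contents -> same key -> 'value'
```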
> Jack Crenshaw's tutorial takes the syntax-directed translation approach, where code is emitted while parsing, without having to divide the compiler into explicit phases with IRs.
Is "syntax-directed translation" just another term for a single-pass compiler, e.g. as used by Lua (albeit to generate bytecode instead of assembly / machine code)? Or is it something more specific?
> in the latter parts of the tutorial it starts showing its limitations. Especially once we get to types [...] it's easy to generate working code; it's just not easy to generate optimal code
So, using a single-pass compiler for a statically-typed language makes it difficult to apply type-based optimizations. (Of course, Lua sidesteps this problem because the language is dynamically typed.)
Are there any other downsides? Does single-pass compilation also restrict the level of type checking that can be performed?
As long as the language you're compiling has a strict define-before-use rule and no advanced inference is required, you will know the types of expressions and can perform type-based optimizations. You can also do constant folding and (very rudimentary) inlining. But the best optimizations are done on IRs, which you don't have access to in an old-school single-pass design. LICM, CSE, GVN, DCE, and all the countless loop opts are not available to you. You'll also spill to memory a lot, because you can't run a decent regalloc in a single pass.
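To make that concrete, here's a minimal Python sketch of a syntax-directed emitter for a tiny expression language (it has nothing to do with Crenshaw's actual tutorial, which emits assembly from a Pascal-like front end): constants are folded as the parse proceeds, but since no AST or IR survives, there is nothing for later passes like CSE or LICM to run over.

```python
import re

def compile_expr(src):
    toks = re.findall(r"\d+|[a-z]+|[+*]", src)
    pos = 0
    code = []                       # stack-machine instructions emitted so far

    def peek():
        return toks[pos] if pos < len(toks) else None

    def take():
        nonlocal pos
        pos += 1
        return toks[pos - 1]

    def flush(val):                 # materialize a folded constant, if we have one
        if val is not None:
            code.append(("PUSH", val))

    def primary():                  # returns an int for a constant, or None after emitting code
        tok = take()
        if tok.isdigit():
            return int(tok)
        code.append(("LOAD", tok))  # a variable: must emit, can't fold any further
        return None

    def binop(sub, op_tok, op_name, fold):
        left = sub()
        while peek() == op_tok:
            take()
            right = sub()
            if left is not None and right is not None:
                left = fold(left, right)   # constant folding, still a single pass
            else:
                # + and * are commutative, so it's fine that an already-emitted
                # operand sits below a later PUSH on the evaluation stack
                flush(left); flush(right)
                code.append((op_name,))
                left = None
        return left

    term = lambda: binop(primary, "*", "MUL", lambda a, b: a * b)
    expr = lambda: binop(term, "+", "ADD", lambda a, b: a + b)
    flush(expr())
    return code

print(compile_expr("2*3 + x*4 + 5*6"))
# [('LOAD', 'x'), ('PUSH', 4), ('MUL',), ('PUSH', 6), ('ADD',), ('PUSH', 30), ('ADD',)]
```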
I'm actually a big fan of function-by-function dual-pass compilation. You generate IR from the parser in one pass, and do codegen right after. Most intermediate state is thrown out (including the AST, for non-polymorphic functions) and you move on to the next function. This gives you an extremely fast data-oriented baseline compiler with reasonable codegen (much better than something like tcc).
There is a TV movie, In Pursuit of Honor (1995), claiming to be based on true events. A quick search online suggests that such things were never really documented, but it's plausible that similar things happened.
> In Pursuit of Honor is a 1995 American made-for-cable Western film directed by Ken Olin. Don Johnson stars as a member of a United States Cavalry detachment refusing to slaughter its horses after being ordered to do so by General Douglas MacArthur. The movie follows the plight of the officers as they attempt to save the animals that the Army no longer needs as it modernizes toward a mechanized military.
It's not "common". You only have to deal with StopIteration when you write an iterator with the low-level API, which is maybe once in a career for most developers.
The point is that the use of exceptions is built into the language, so, for example, if you write "for something in somegeneratorfunction():" then somegeneratorfunction will signal to the for loop that it is finished by raising this exception.
I’d say it’s more common for iterator-based loops to run to completion than to hit a `break` statement. The `StopIteration` exception is how the iterator signals that completion.
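Roughly, a `for` loop desugars into calling `next()` until that exception appears:

```python
def countdown(n):
    while n > 0:
        yield n
        n -= 1

# approximately what `for item in countdown(2): print(item)` does:
it = iter(countdown(2))
while True:
    try:
        item = next(it)
    except StopIteration:   # the iterator's way of saying it ran to completion
        break
    print(item)
```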
> Using `lea` […] is useful if both of the operands are still needed later on in other calculations (as it leaves them unchanged)
As well as making it possible to preserve the values of both operands, it’s also occasionally useful to use `lea` instead of `add` because it preserves the CPU flags.
Funny to see a comment on HN raising this exact point, when just ~2 hours ago I was writing inline asm that used `lea` precisely to preserve the carry flag before a jump table! :)
I'm not them but whenever I've used it it's been for arch specific features like adding a debug breakpoint, synchronization, using system registers, etc.
Never for performance. If I wanted to hand-optimise code I'd be more likely to use SIMD intrinsics, play with C until the compiler does the right thing, or write the entire function in a separate asm file for better highlighting and easier handling of state at the ABI boundary rather than mid-function like the carry flags mentioned above.
Generally inline assembly is much easier these days as a) the compiler can see into it and make optimizations b) you don’t have to worry about calling conventions
> the compiler can see into it and make optimizations
Those writing assembler typically/often think/know they can do better than the compiler. That means that isn’t necessarily a good thing.
(Similarly, veltas' comment above about “play with C until the compiler does the right thing” is brittle. You don’t even need to change compiler flags to make it suddenly not do the right thing anymore. On the other hand, when compiling for a different version of the CPU architecture, the compiler can fix things, too.)
It's rare that I see compiler-generated assembly without obvious drawbacks in it. You don't have to be an expert to spot them. But frequently the compiler also finds improvements I wouldn't have thought of. We're in the centaur-chess moment of compilers.
Generally playing with the C until the compiler does the right thing is slightly brittle in terms of performance but not in terms of functionality. Different compiler flags or a different architecture may give you worse performance, but the code will still work.
“Advanced chess is a form of chess in which each human player uses a computer chess engine to explore the possible results of candidate moves. With this computer assistance, the human player controls and decides the game.
Also called cyborg chess or centaur chess, advanced chess was introduced for the first time by grandmaster Garry Kasparov, with the aim of bringing together human and computer skills to achieve the following results:
- increasing the level of play to heights never before seen in chess;
- producing blunder-free games with the qualities and the beauty of both perfect tactical play and highly meaningful strategic plans;
- offering the public an overview of the mental processes of strong human chess players and powerful chess computers, and the combination of their forces.”
Well I have benchmarks where my hand-written asm (on a fundamental inner function) beat the compiler-generated code by 3× :) Without SIMD (not applicable to what I was trying to solve).
And that was already after copious `assert_unchecked`s to have the compiler assume as many invariants as it could!
> “play with C until the compiler does the right thing” is brittle
It's brittle depending on your methods. If you understand a little about optimizers and give the compiler the hints it needs to do the right things, then that should work with any modern compiler, and is more portable (and easier) than hand-optimizing in assembly straight away.
Well in my case I had to file an issue with the compiler (llvm) to fix the bad codegen. Credit to them, it was lightning fast and they merged a fix within days.
Of course you can often beat the compiler, humans still vectorize code better. And that interpreter/emulator switch-statement issue I mentioned in the other comment. There are probably a lot of other small niches.
In general case you're right. Modern compilers are beasts.
Might be an interpreter or an emulator. That’s where you often want to preserve registers or flags and have jump tables.
This is one of the remaining cases where the current compilers optimize rather poorly: when you have a tight loop around a huge switch-statement, with each case-statement performing a very small operation on common data.
In that case, a human writing assembler can often beat a compiler with a huge margin.
I'm curious if that's still the case generally after things like musttail attributes to help the compiler emit good assembly for well structured interpreter loops:
LLVM codegen has been almost always sufficient, but for a routine that essentially amounts to adding two fixed-size bigints (e.g. 1024-bit ints represented as `[u64; 16]`), codegen was very very bad.
Writing a jump table by hand literally made the code 3× faster :)
> Unlike other partial register writes, when writing to an e register like eax, the architecture zeros the top 32 bits for free.
I’m familiar with 32-bit x86 assembly from writing it 10-20 years ago. So I was aware of the benefit of xor in general, but the above quote was new to me.
I don’t have any experience with 64-bit assembly - is there a guide anywhere that teaches 64-bit specifics like the above? Something like “x64 for those who know x86”?
It's not only xor that does this, but most 32-bit operations zero-extend the result into the full 64-bit register. AMD did this for backward compatibility, so existing programs would mostly continue working, unlike Intel's earlier attempt at 64 bits, which was an entirely new design.
The reason `xor eax,eax` is preferred to `xor rax,rax` is due to how the instructions are encoded - it saves one byte which in turn reduces instruction cache usage.
When using 64-bit operations, a REX prefix is required on the instruction (byte 0x40..0x4F), which serves two purposes - the MSB of the low nybble (W) being set (i.e., REX prefixes 0x48..0x4F) indicates a 64-bit operation, and the low 3 bits of the low nybble allow using registers r8-r15 by providing an extra bit for the ModRM register field and the base and index fields in the SIB byte, as only 3 bits (8 registers) are provided by x86.
A recent addition, APX, adds an additional 16 registers (r16-r31), which need 2 additional bits. There's a REX2 prefix for this (0xD5 ...), which is a two byte prefix to the instruction. REX2 replaces the REX prefix when accessing r16-r31, still contains the W bit, but it also includes an `M0` bit, which says which of the two main opcode maps to use, which replaces the 0x0F prefix, so it has no additional cost over the REX prefix when accessing the second opcode map.
> It's not only xor that does this, but most 32-bit operations zero-extend the result into the full 64-bit register. AMD did this for backward compatibility.
It's not just that, zero-extending or sign-extending the result is also better for out-of-order implementations. If parts of the output register are preserved, the instruction needs an extra dependency on the original value.
Except for `xchg eax, eax`, which was the canonical nop on x86. Because it was supposed to do nothing, having it zero out the top 32-bits of rax would be quite surprising. So it doesn't.
Instead you need to use the multi-byte, general purpose encoding of `xchg` for `xchg eax, eax` to get the expected behavior.
https://peps.python.org/pep-0416/
Also one in 2019 for a "frozenmap":
https://peps.python.org/pep-0603/