> -ftrivial-auto-var-init=pattern Would be nice if this was zero instead of patt...

formerly_proven · on Sept 19, 2023

From the compiler's PoV this is buggy code so it's better to make it predictably wrong rather than unboundedly incorrect (=security issues) or predictably correct (=people rely on UB).

woodruffw · on Sept 19, 2023

On top of your reasons (which are good ones!), there’s another good reason to avoid default zero initialization in languages like C: zero is a special value for all kinds of sensitive operations (like UID 0 for root).

In other words: a mitigation that initializes all values to 0 may make some uses of uninitialized variables worse than they were before.

google234123 · on Sept 19, 2023

0 is also the most common variable value probably :p hard to tell a valid state from an invalid one

woodruffw · on Sept 19, 2023

Yes, that's why the "uninitialized" part is important; we're talking about a mitigation that would make UB potentially easier rather than harder to exploit.

Having 0 as a default initialization value in a language where doing so is well defined makes perfect sense; this is primarily an issue for C and C++ (to a lesser extent).

twic · on Sept 19, 2023

There is a proposal for C++ to zero-initialise automatic (ie local variables):

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p27...

If that goes through, zeroing automatics would just be doing the same thing.

(FYI the feedback section of that paper is quite funny)

tialaramex · on Sept 19, 2023

Ideally C++ would tell people you can't do that and make it a compiler error ("Ill-formed") but my guess is that too many people insist they ought to be able to take arbitrary C++ 23 code, recompile it with C++ 26 and have that just always work even though the standard doesn't deliver that and so it won't happen.

P2723 is unlikely to happen. The "Erroneous behaviour" P2795 might have a better chance. This would say it's wrong to do uninitialized reads (whereas P2723 says they are initialized to zero and thus it's not wrong) but you always get zero anyway.

I think there's a fair chance WG21 manages to make everybody unhappy by kicking this can into the long grass as they have on many other controversial issues.

Zero is the wrong default, it's better than UB, but it's not good. This is actually a problem in languages like Go where zero defaults are core to the language design. The correct thing is that "I didn't initialize it" won't compile. Force the programmer to write what they meant, sometimes they meant zero, or None, or 0.0 or whatever, but surprisingly often when confronted with the question the programmer realises their design is wrong and needs a design level change.

tsimionescu · on Sept 19, 2023

> Force the programmer to write what they meant, sometimes they meant zero, or None, or 0.0 or whatever, but surprisingly often when confronted with the question the programmer realises their design is wrong and needs a design level change.

I almost sympathize with your point, except we are taking about a language where `Type var;` is the explicit way to initialize many variables to a perfectly well defined value: it is the only way to call the no-arguments constructor for a variable on the stack. It's only for non-class types that this has the bizarre behavior of allocating but not setting any value.

It's even worse in a language with templates:

  template <class T>
  void foo() T {
    T local;
    return local;
  }

Can be perfectly correct OR it can be UB based on the type of T.

gpderetta · on Sept 19, 2023

> [...] it is the only way to call the no-arguments constructor for a variable on the stack

The syntax:

  T var{};

always value-initializes a stack allocated variable (or a member variable or a global).

tsimionescu · on Sept 19, 2023

I didn't know about the empty initializer list syntax.

Still, reading about it, there are cases where `T var{} ;` will do something different from `T var;`: if T is an aggregate object type, then it will invoke aggregate initialization instead of calling the no-args constructor.

gpderetta · on Sept 19, 2023

If T is an aggregate, there is, by definition, no constructor (no-argument or otherwise) to call. The aggregate initialization will then recursively value initialize each member, which is what you want.

The only catch is, as usual, list-initialiation. You have to hope that T is sane and any list initialization constructor with an empty list is equivalent to the nullary constructor.

tsimionescu · on Sept 19, 2023

There was one case I found on SO and later reproduced, where a class B derives publically from another class A which has a protected no-args constructor. In that case, B b; is valid, but B b{}; is not, since it tries to construct an instance of A from the calling code itself using the protected constructor, which it's obviously not allowed to access.

Overall I think it's safe to say that the two syntaxes have different semantics, even if they overlap in most cases.

gpderetta · on Sept 19, 2023

I have to try that. There are defect reports for corner cases.

In any case this is also allowed:

  T val = T();

And copy elision is now guaranteed.

tsimionescu · on Sept 20, 2023

The example looks like this:

  class A {
  protected:
    A() {}
  }

  class B : public A {}

  B x; //ok
  B y{}; //not ok, can't access A::A()

Would the T val = T(); example work if you don't have a copy constructor at all, or no move constructor, or custom ones which do weird things?

Edit: I checked, and you're right - the syntax seems to be fully equivalent in C++17 or later. Great to hear!

gpderetta · on Sept 20, 2023

Played it a little bit: this seems to be a regression and breaking change from C++14 where B would not be an aggregates, so B{} would just invoke the default constructor. This is probably an oversight, I wonder if there is a Defect Report.

edit: it works by making the inheritance protected, as B is no longer an aggregate. The right fix would be to also disqualify B from aggregate status if the base class constructor is unreachable.

Also making both A and B non-empty removes aggregate status, so it is really a dark corner of the language.

tialaramex · on Sept 19, 2023

> it is the only way to call the no-arguments constructor for a variable on the stack.

Is that really true? Ouch. In many languages that wouldn't feel crazy, but in a language where there's a whole book about initialization https://leanpub.com/cppinitbook that feels kinda silly.

tsimionescu · on Sept 19, 2023

A sibling response pointed out that adding an empty pair of braces (an ampty initializer list) after the var name can also invoke the no-args constructor, but it can also do other things depending on the class. So yes, I believe this is the only way of explicitly calling the no-args constructor whole in-place on a stack variable.

Ideally the syntax `T var();` would have worked as well, but it turns out that it would be ambiguous with declaring a local function named var that takes no arguments and returns a T...

xamuel · on Sept 19, 2023

>The correct thing is that "I didn't initialize it" won't compile

Flawless detection of uninitialized reads would require solving the halting problem, which is impossible. So requiring initialization does prevent optimal efficiency of some theoretical programs. Of course, this would only matter in cases where performance was extremely critical (and the whole point becomes moot if the alternative is to automatically zero the memory, which is even worse in this pedantic optimal-performance sense).

tialaramex · on Sept 19, 2023

Having some means (as these C++ proposals all do) to explicitly say "I understand that you can't see why this is correct but it assure you it is" would be fine, and needn't be introduced to beginners at all. The problem as usual in C++ is that All The Defaults Are Wrong and because they're defaults we need to warn beginners about them.

You won't write Rust's MaybeUninit<T>::assume_init() in your first program by mistake, whereas the equivalent mistake in C++ happens easily because it's the default.

tsimionescu · on Sept 20, 2023

The question essentially is what the statement `T x;` should mean. Today, if T is class, it means "allocate space for a value of type T and construct it using T::T()". However, if T is a built-in type, it means "allocate space for a value of type T with no defined value", which has proven to be highly problematic in practice.

The situation could be improved in two simple ways. One, you could unify the two meanings, and say that `T x;` allocates space and calls T::T() to initialize the value. The no-args constructor for built-in types already exists and initializes them to 0.

Or, you could also say `T x;` is illegal syntax, one must write `T x = val;` always (or at least when T is a built-in type).

In either case, an escape hatch is needed for allocating uninitialized space on the stack, since there are valid performance reasons for wanting that, in rare cases. But that should be new syntax, it really really shouldn't be the default. So you can still do something like `T x = std::uninitialized();` or whatever the syntax would be to get the current behavior in performance-critical cases, where the tradeoff makes sense.

Personally, especially given C++'s use of templates that don't distinguish between built-in types and classes, I believe the first option makes the most sense, and in fact removes am ugly inconsistency from the language.

saagarjha · on Sept 19, 2023

The proposal discusses the above concern (as it should, since the author has gotten almost every version of possible concerns). Perhaps one of them will win out and alter the proposal appropriately.

vbezhenar · on Sept 19, 2023

UB is a property of standard. GCC implements plenty of deviations from standard. Nothing wrong with that, as long as it's explicitly documented.

I'd even argue that defined behaviour is a subset of undefined behaviour. So I'd value compiler options to force well defined and "expected" behaviour instead of the current insanity.

Clang "optimized" away empty loop. My MCU gets locked because of it. I have to write `b .` with assembly, because C can't cut it. It is insanity.

saagarjha · on Sept 19, 2023

Optimizing out an empty loop in C is illegal.

vbezhenar · on Sept 19, 2023

https://godbolt.org/z/Ke7MvoEb9 clang does that

dzaima · on Sept 19, 2023

That's C++, which is not C. Granted, the C++ behavior is weird and annoying. (the C behavior, while better for truly-infinite loops, is still "broken" for potentially-not-but-still-possibly-infinite loops, though such should be less common)

vbezhenar · on Sept 19, 2023

Huh, didn't think about it, thanks.

Karellen · on Sept 19, 2023

That doesn't fit with my understanding of the C abstract machine. Can you give any links that explain this further? (Or to the relevant part of the standard itself?)

dzaima · on Sept 19, 2023

N1570, 6.8.5, point 6 under "Semantics":

    An iteration statement whose controlling expression is not a constant expression,156) that
    performs no input/output operations, does not access volatile objects, and performs no
    synchronization or atomic operations in its body, controlling expression, or (in the case of
    a for statement) its expression-3, may be assumed by the implementation to terminate.

Namely, the "not a constant expression" restriction is important here. So an empty loop with a non-constant end test can be assumed to terminate, but a constant one (e.g. while(1){} or for(;;); ) cannot.

Note that the rules in C++ on this are different, and do allow even a constant-end-condition empty loop to be assumed to terminate.

tialaramex · on Sept 19, 2023

Further bonus notes, the C++ behaviour is sufficiently controversial and disliked that there is a C++ 26 proposal to "fix" it: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p28...

And, Rust's only actual loop is an infinite loop. Rust's "loop" syntax is an infinite loop, and both "for" and "while" in Rust are just syntax sugar which the documentation explains how to transform your "for" or "while" into the exact same "loop" that it's going to emit when you do that - they're not merely "equivalent" that's how it really works via a process called "de-sugaring".

Interestingly "loop" is categorically more powerful than "for" or "while" because it has a type, the type of a "for" or "while" is always the unit type, but the type of a "loop" can be anything, for example maybe the loop finds a Goose, and the value of your loop is a Goose, this means to exit the loop we need a Goose and we can't leave the loop without one.

Because of the C++ misfeature, Rust has sometimes run into problems where LLVM is like "Oh, that's an infinite loop, I'll just ignore it" but LLVM is not a C++ compiler. Clang is a C++ compiler so Clang is allowed to obey C++ rules, but LLVM is not, it's supposed to provide an actual infinite loop, for both C and Rust to use.

gpderetta · on Sept 19, 2023

I understand thal LLVM implements the C++11 memory model, which specifies the termination requirements for non-side effects loops.

tialaramex · on Sept 19, 2023

It's true that C++ specifies this as part of its forward progress guarantees and that's likely how it infected LLVM, but I'd deny that LLVM's rather sparse documentation of their IR lowering says basically "Ooops, we actually are only suitable as the core of a C++ compiler" was part of their intent, especially since LLVM substantially pre-dates Clang...

Lattner started work on Clang in 2006, but LLVM is from 2000..

And sure enough when the Rust project finds bugs in LLVM related to this, there is no "Oh you can't have the semantics we documented, we actually provide exactly whatever C++ says instead for some reason". Sometimes it's a doc bug but most often the problem is that as usual the optimisation passes assumed something that's just not true outside of C++.

gpderetta · on Sept 19, 2023

I have no idea where's the formal spec for rust, but there's this: https://doc.rust-lang.org/nomicon/atomics.html

Edit: LLVM predates clang, but dragon egg was a thing.

In any case, before C++11 there was no memory model suitable for a system language[1], so it was the obvious solution.

[1] POSIX, OpenMP and the linux kernel all had memory models, but they were either underspecified, not sufficient or both.

tialaramex · on Sept 19, 2023

That's telling you that Rust has the C++ Memory Ordering rules, not that it has the C++ Forward Progress guarantee.

C likewise has the C++ Memory Ordering, but not its Forward Progress guarantee. As I wrote earlier, C has infinite loops, they're spelled the way you'd obviously write them in C or C++, but in C they're supposed to actually work (whereas in C++ they are UB). Rust is only different here syntactically, the semantic feature is identical to C's choice.

javier_e06 · on Sept 19, 2023

In my field zeroes are a problem when there is byte shift (not aligned). Specially data transfers. Alignment corruption cannot be detected when the memory area is all zeroes. We use things like 0xaaaa will have you.

_yvc3 · on Sept 19, 2023

I guess the reasoning behind this is that using a pattern (0xFE on GCC 12.2.0) is easier to recognise in a crash dump.