[PATCH] D68115: Zero initialize padding in unions

Fri Mar 27 13:10:11 PDT 2020

rsmith added a comment.

In D68115#1946668 <https://reviews.llvm.org/D68115#1946668>, @hubert.reinterpretcast wrote:

> It sounds like we are looking for `-fzero-union-padding`. That's been where the discussion has left off twice for months.

I believe the state of Clang prior to this patch is actually wrong. We reach this code regardless of the kind of initialization used for the union, and some forms of initialization require zeroing of padding whereas some do not. If we want to model this precisely, we'll need to track on `APValue` whether the padding bits of the union constant are zeroed or not. For example, given:

  union U { char c; int n; };
  U u = U(); // value-initialization
  U v = U{}; // aggregate initialization

Clang emits `{ i8 0, [3 x i8] undef }` for both `u` and `v`. That's correct for `v`, but incorrect for `u`: it should emit `{ i8 0, [3 x i8] zeroinitializer }` or perhaps simply `zeroinitializer`, because value-initialization invokes zero-initialization, which zeroes the padding.

Fixing this properly is likely not very hard, but it would require more changes than are present in this patch. (This patch is conservatively correct, but initializes more than we need to initialize.)

We also need to track in `APValue` whether padding bits are zeroed in order to correctly support `bit_cast` from structs with padding. Per discussion in committee, the intended behavior for `bit_cast` is:

> A bit in the value representation of the result is indeterminate if does not correspond to a bit in the value representation of from or corresponds to a bit of an object that is not within its lifetime or has an indeterminate value ([basic.indet]). For each bit in the value representation of the result that is indeterminate, the smallest object containing that bit has an indeterminate value; the behavior is undefined unless that object is of unsigned ordinary character type or std::byte type. The result does not otherwise contain any indeterminate values.

So in particular:

  struct A { char c; int n; };
  constexpr long n = bit_cast<long>(A()); // ok, 0
  constexpr long m = bit_cast<long>(A{}); // ill-formed, indeterminate value due to uninitialized padding between c and n

So I would propose we take the following path:

1. Extend Clang's constant evaluator and `APValue` to track, for a struct or union value, whether all padding bits are zeroed. (Should always be `true` for a value with no padding bits.)
2. Land this patch behind a flag to zero all padding bits for unions, ideally extended to cover struct padding as well as union padding.
3. After doing (1), extend `__builtin_bit_cast` support to properly handle padding bits.
4. After doing (1) and (2), extend constant aggregate emission to always zero padding when required by the language standard. (If you want, make the flag be three-way: never zero, zero as required by language standard, always zero, maybe: `-fzero-padding=never` / `-fzero-padding=std`, `-fzero-padding=always`.)

Note that (1) and (2) are independent, so I don't think we need to block this patch (2) on the implementation of (1), but we should be aware that we're not done here until we do steps (3) and (4).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68115/new/

https://reviews.llvm.org/D68115