[clang] [clang][CodeGen] Zero init unspecified fields in initializers in C (PR #97121)

Eli Friedman via cfe-commits cfe-commits at lists.llvm.org
Tue Aug 27 13:43:06 PDT 2024


================
@@ -42,6 +42,66 @@ using namespace CodeGen;
 namespace {
 class ConstExprEmitter;
 
+llvm::Constant *getPadding(const CodeGenModule &CGM, CharUnits PadSize) {
+  if (!CGM.getLangOpts().CPlusPlus) {
+    // In C23 (N3096) $6.7.10:
+    // """
+    // If any object is initialized with an empty initializer, then it is
+    // subject to default initialization:
+    //  - if it is an aggregate, every member is initialized (recursively)
+    //  according to these rules, and any padding is initialized to zero bits;
+    //  - if it is a union, the first named member is initialized (recursively)
+    //  according to these rules, and any padding is initialized to zero bits.
+    //
+    // If the aggregate or union contains elements or members that are
+    // aggregates or unions, these rules apply recursively to the subaggregates
+    // or contained unions.
+    //
+    // If there are fewer initializers in a brace-enclosed list than there are
+    // elements or members of an aggregate, or fewer characters in a string
+    // literal used to initialize an array of known size than there are elements
+    // in the array, the remainder of the aggregate is subject to default
+    // initialization.
+    // """
+    //
+    // From my understanding, the standard is ambiguous in the following two
+    // areas:
+    // 1. For a union type with an empty initializer, if the first named member
+    // is not the largest member, then the bytes that come after the first
+    // named member but before the padding are left unspecified. An example is:
+    //    union U { int a; long long b;};
+    //    union U u = {};  // The first 4 bytes are 0, but bytes 4-8 are left
+    //                     // unspecified.
+    //
+    // 2. It only mentions padding for an empty initializer, but doesn't mention
+    // padding for a non-empty initializer list. And if the aggregate or union
+    // contains elements or members that are aggregates or unions, and some
+    // have non-empty initializers while others have empty initializers, the
+    // padding initialization is unclear. An example is:
+    //    struct S1 { int a; long long b; };
+    //    struct S2 { char c; struct S1 s1; };
+    //    // The values of the padding between s2.c and s2.s1.a, and between
+    //    // s2.s1.a and s2.s1.b, are unclear.
+    //    struct S2 s2 = { 'c' };
+    //
+    // Here we choose to zero initialize the remaining bytes of a union type,
+    // because projects like the Linux kernel rely on this behavior. If we
+    // don't explicitly zero initialize them, the undef values can be optimized
+    // to return garbage data. We also choose to zero initialize padding for
+    // aggregates and unions, regardless of whether they are initialized by
+    // empty or non-empty initializers. This provides consistent behavior that
+    // projects like the Linux kernel can rely on.
+    if (PadSize == CharUnits::One())
+      return llvm::ConstantInt::get(CGM.CharTy, 0);
+    llvm::Type *Ty = llvm::ArrayType::get(CGM.CharTy, PadSize.getQuantity());
+    return llvm::ConstantAggregateZero::get(Ty);
----------------
efriedma-quic wrote:

Consider using `llvm::Constant::getNullValue()` here to unify the codepaths a bit.

https://github.com/llvm/llvm-project/pull/97121
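
For readers following the two cases described in the quoted comment, here is a minimal C sketch that measures where the affected bytes sit. It reuses the `union U`, `struct S1`, and `struct S2` definitions from the comment; the helper names (`gap_after_c`, `gap_after_a`, `union_tail_bytes`, `print_layout_gaps`) are hypothetical and not part of the patch. The sketch only reports layout and makes no claim about what values a given compiler stores in those bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Types copied from the examples in the quoted comment. */
union U { int a; long long b; };
struct S1 { int a; long long b; };
struct S2 { char c; struct S1 s1; };

/* Padding bytes between s2.c and s2.s1.a (hypothetical helper). */
static size_t gap_after_c(void) {
  return offsetof(struct S2, s1) - sizeof(char);
}

/* Padding bytes between s1.a and s1.b (hypothetical helper). */
static size_t gap_after_a(void) {
  /* The first member of a struct is always at offset 0. */
  assert(offsetof(struct S1, a) == 0);
  return offsetof(struct S1, b) - sizeof(int);
}

/* Bytes of union U that lie beyond its first named member. */
static size_t union_tail_bytes(void) {
  return sizeof(union U) - sizeof(int);
}

/* Print the three counts; the values depend on the target ABI. */
static void print_layout_gaps(void) {
  printf("padding after S2.c:     %zu\n", gap_after_c());
  printf("padding after S1.a:     %zu\n", gap_after_a());
  printf("union U bytes past 'a': %zu\n", union_tail_bytes());
}
```

Calling `print_layout_gaps()` from `main` prints the three counts. On a typical x86-64 Linux ABI one would expect 7, 4, and 4, but the exact numbers are ABI-dependent. These are the bytes the patch chooses to zero initialize in C mode.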

