[clang] [clang][CodeGen] Zero init unspecified fields in initializers in C (PR #97121)

Thu Aug 1 16:50:09 PDT 2024

================
@@ -361,6 +368,13 @@ CodeGenFunction::AddInitializerToStaticVarDecl(const VarDecl &D,
     }
     return GV;
   }
+  if (!getLangOpts().CPlusPlus) {
+    // In C, when an initializer is given, the Linux kernel relies on clang to
+    // zero-initialize all members not explicitly initialized, including padding
+    // bits.
----------------
yabinc wrote:

> But I assume that similar code is required in order to explicitly zero out any padding when dynamically initializing a local variable.

I guess you mean variables assigned by Compound literals after variable definition. An example is in https://godbolt.org/z/caaEKnjWe. From the ast dump, it is treated as an assignment from InitListExpr. The source code seems to be in AggExprEmitter::VisitCXXParenListOrInitListExpr() in https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CGExprAgg.cpp#L1757. Each field of the InitListExpr is stored to the variable separately. One solution is to always do memset in CheckAggExprForMemSetUse() in https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CGExprAgg.cpp#L1974. The memset can be optimized away later if not needed. 
Another place is compound literals for global values. It seems handled in CGM.GetAddrOfConstantCompoundLiteral().
To not introduce a big change at once, Shall I handle them in a follow up patch?

> Also, is adding constant padding fields retroactively really the best way to handle this instead of just expecting the constant emitter to fill them in itself?

Currently I limit the change to C and reuse code for trivial auto var zero init. So it's easier to do it outside the constant emitter. If the change can be extended to C++, I feel it's better to always do it in the constant emitter. 

> Comments in the compiler generally shouldn't refer to specific projects. We are making a guarantee based on our interpretation of the standard.

The standard is ambiguous on this, so can't be used as a reason for the change. Please let me know if you have better idea about how to reword the comment.

> Similarly, the PR title is also inappropriate. It is fine to note in the PR description that Linux relies on this for correctness, but the PR title should describe the semantic change.

Changed the PR title back to the commit title, which describes the semantic change.

> Also, I think we should do this unconditionally in all language modes. It's not like this is forbidden in C++.

I agree. But when I was trying to do it before, I need to fix a lot more tests. And the Linux kernel only has C code. I don't have strong reason to push this to C++. So I want to limit the change to C at first. If reviewers think we should also do it in C++, shall I do it in a follow up change?


https://github.com/llvm/llvm-project/pull/97121