[PATCH] D57898: CodeGen: Fix PR40605: split constant structures generated by -ftrivial-auto-var-init when emitting initializators

JF Bastien via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Feb 8 10:04:54 PST 2019

jfb added a comment.

Can you add a link to bug 40605 in the commit message?

> I'm not quite sure how to show the resulting difference in code. Do you mean we need Clang to support both modes and to compare the resulting assembly?

I only meant tests that show codegen, as you've now added auto-var-init.cpp. There might be interesting cases which I wasn't testing when I wrote it, so if you think of any please do make sure you add some. You mentioned a Linux struct that used to crash? That would be a useful test (I imagine it's one with the `Loc` adjustment).

Comment at: tools/clang/lib/CodeGen/CGDecl.cpp:1143
+    const llvm::StructLayout *Layout =
+        CGM.getDataLayout().getStructLayout(cast<llvm::StructType>(Ty));
+    for (unsigned i = 0; i != constant->getNumOperands(); i++) {
glider wrote:
> jfb wrote:
> > I think you need a heuristic here, where you don't emit multiple stores if `getNumOperands` is greater than some number. In that case you'd keep doing memcpy instead. Note that a struct containing sub-structs (or arrays) should also count the number of sub-structs (so just checking `getNumOperands` isn't sufficient).
> > 
> > Other parts of this code chose 6 stores...
> Do we ever need to not split these stores? Can't the  MemCpyOpt pass take care of them later on?
You're assuming the optimizer is running at all :-)
In general your approach seems to care about `-O2`, and I agree that's usually what we want to tune. However, clang also support `-O0` which won't touch these stores at all, and in `-O0` it's usually nice to just see a copy from a global while debugging. Further, configurations such as `-Os` and `-Oz` would be hampered by excessive codegen.

You also need to consider the compile-time hit the expansion you're making would add. More code means more time to compile, and I'd expect little to no win from emitting more than a handful of stores. If someone has a large thing on the stack, and it can be optimized, then we should teach the optimizer to deal with memcpy better. I don't think e.g. a 400+ byte struct makes sense to initialize with a bunch of store.

So I think you want to check what the size of the struct is (check out when x86-64 and ARM64 inline memcpy / memset, I assume those are decent guesses for size to inline).

Further, your current code sill do something silly for code like this:
struct S { int j; char arr[8]; float f; };
Before you'd have a single `memcpy`, but now it'll be a store, a `memset`, and a store. I think you really want to have the same heuristic for arrays (so you'd have just stores), and you want to make sure that if you're going to split up a struct you do so for all the fields inside that struct.

Comment at: tools/clang/lib/CodeGen/CGDecl.cpp:1765
+  emitStoresForConstant(CGM, D, Loc, isVolatile, Builder, constant,
+                        /*forInit*/ false);
glider wrote:
> jfb wrote:
> > Can you explain why this case needs to be different? IIUC it's because the types can differ which makes the struct's elements not match up the ones from the constant in the loop above? I think the code above should automatically handle that case instead of passing in a flag here.
> > 
> > You also definitely need to test the case where that happens :)
> I just thought we shouldn't break the constants that we didn't create with -ftrivial-var-auto-init.
> I'll take a closer look though (indeed, this crashes on one of the structs from the Linux kernel)
Yeah I think it's from the `Loc` adjustment above. I think your change is worth doing for all cases, not just trivial var auto-init. It'll lead us to optimize all of it much better IMO (and support things like `-O0` `-Os` and `-Oz` uniformly well too).

Comment at: tools/clang/test/CodeGenCXX/auto-var-init.cpp:37
 // PATTERN: @__const.test_small_custom.custom = private unnamed_addr constant %struct.small { i8 42 }, align 1
 // ZERO: @__const.test_small_custom.custom = private unnamed_addr constant %struct.small { i8 42 }, align 1
 struct small { char c; };
How come `@__const.test_small_custom.custom` still gets emitted? I guess custom initialization goes through a different code path?

Comment at: tools/clang/test/CodeGenCXX/auto-var-init.cpp:496
 // ZERO-LABEL: @test_smallpartinit_uninit()
 // ZERO: call void @llvm.memset{{.*}}, i8 0,
I wonder if (maybe in a separate patch) you also want to avoid `memset` when something is pretty small.

Comment at: tools/clang/test/CodeGenCXX/auto-var-init.cpp:664
+// PATTERN: store i32 -1431655766, {{.*}} align 4
+// PATTERN: call void @llvm.memset.{{.*}}(i8* align 4 {{.*}}, i8 0, i64 0, i1 false)
 // ZERO-LABEL: @test_arraytail_uninit()
This one is particularly silly since it's a zero-sized `memset` :)



More information about the cfe-commits mailing list