[llvm-dev] Dead store elimination in the backend for -ftrivial-auto-var-init

Thu Aug 1 09:37:54 PDT 2019

> On Aug 1, 2019, at 9:20 AM, Alexander Potapenko <glider at google.com> wrote:
> 
> On Thu, Aug 1, 2019 at 6:09 PM JF Bastien <jfbastien at apple.com> wrote:
>> 
>> Hi Alexander,
>> 
>> The code doesn’t compile. Could you send a godbolt.org link that shows the issue?
> Sorry about that, here's the link: https://godbolt.org/z/-PinQP
> 
> Lines 4 to 8 are initializing |acpar|.
> If I'm understanding correctly, the store to 8(%rsp) at line 7 can be
> removed because of the store at line 35.

These are in different basic blocks. IIRC there are a bunch of cases where DCE just fails when there’s any control flow. I think this is what you’re seeing here.

> Same for store to 24(%rsp), which is later overwritten at lines 11 and 16.

I’m not sure what we should do here… If it were a constant store we’d fold it in, but here both stores are variables so it’s not clear that we really should be doing much in MemCpyOptimizer.
The problem I see here is: there’s a small memset, followed by variable stores to the same location. DCE’ing requires realizing that the memset is small enough that the backend will expand it, at which point DCE will kick in. MemCpyOptimizer doesn’t know what “small” is, partly because it’s based on alignment and optimization level. Maybe it can be a target hook (“what would you do with this memset?”), that MemCpyOptimizer uses when it sees a memset followed by variable stores to the same location? We already fold in constant stores (replace the m() call by one)… but again this isn’t clearly a win: if you enable SSE you’ll get wide stores, followed by stores of smaller size. With SSE I’m not sure I’d want us to remove the MOVAPS instructions.

>> Thanks,
>> 
>> JF
>> 
>>> On Aug 1, 2019, at 7:21 AM, Alexander Potapenko <glider at google.com> wrote:
>>> 
>>> Hi folks,
>>> 
>>> When compiling the attached example with -ftrivial-auto-var-init=zero:
>>> 
>>> $ clang   -no-integrated-as  -mno-sse -m64 -mstack-alignment=8    -O2
>>> -ftrivial-auto-var-init=zero
>>> -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang
>>> -g  -o ipt.ll -c ipt.i -w -S -emit-llvm
>>> 
>>> , Clang generates an initialization memset() call for |acpar| in the IR:
>>> 
>>>   %0 = bitcast %struct.xt_action_param* %acpar to i8*, !dbg !27
>>>   call void @llvm.memset.p0i8.i64(i8* nonnull align 8 %0, i8 0, i64
>>> 40, i1 false), !dbg !28
>>> 
>>> Clang only splits memsets into series of stores under certain
>>> conditions, so it's hard to remove redundant stores on the IR level
>>> (even with the recent DSE patches: https://reviews.llvm.org/D61879).
>>> In the resulting assembly, however, the memset is lowered into a
>>> series of MOVQ instructions (see the attached ipt.s file), of which at
>>> least the stores to 24(%rsp) and 16(%rsp) are redundant.
>>> 
>>> Given that at MCInst level we already know if a memset is split into a
>>> sequence of stores, can we do a better job by making the backend
>>> delete these redundant stores for us?
>>> 
>>> Alex
>>> 
>>> --
>>> Alexander Potapenko
>>> Software Engineer
>>> 
>>> Google Germany GmbH
>>> Erika-Mann-Straße, 33
>>> 80636 München
>>> 
>>> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
>>> Registergericht und -nummer: Hamburg, HRB 86891
>>> Sitz der Gesellschaft: Hamburg
>>> <ipt.i><ipt.s>
>> 
> 
> 
> -- 
> Alexander Potapenko
> Software Engineer
> 
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
> 
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg