[llvm-dev] Dead store elimination in the backend for -ftrivial-auto-var-init

Vitaly Buka via llvm-dev llvm-dev at lists.llvm.org
Tue Aug 6 18:42:56 PDT 2019


There are two problems:
1. padding after union and call to q(), without LTO we can't remove that
store.
2. shortcut which I have which ignores all instructions q() . this assume
that memset to acpar.match, acpar.matchinfo also useful which is not true. I
should be able to improve this case.

On Thu, Aug 1, 2019 at 11:29 PM Vitaly Buka <vitalybuka at google.com> wrote:

> On a first look case like this should nor be a problem. The tail of the
> memset here is unused because it's replaced immediately without reads . So
> it can be trimmed. I will try the patch and let you know.
>
> On Thu, Aug 1, 2019 at 9:51 AM Alexander Potapenko <glider at google.com>
> wrote:
>
>> On Thu, Aug 1, 2019 at 6:38 PM JF Bastien <jfbastien at apple.com> wrote:
>> >
>> >
>> >
>> > > On Aug 1, 2019, at 9:20 AM, Alexander Potapenko <glider at google.com>
>> wrote:
>> > >
>> > > On Thu, Aug 1, 2019 at 6:09 PM JF Bastien <jfbastien at apple.com>
>> wrote:
>> > >>
>> > >> Hi Alexander,
>> > >>
>> > >> The code doesn’t compile. Could you send a godbolt.org link that
>> shows the issue?
>> > > Sorry about that, here's the link: https://godbolt.org/z/-PinQP
>> > >
>> > > Lines 4 to 8 are initializing |acpar|.
>> > > If I'm understanding correctly, the store to 8(%rsp) at line 7 can be
>> > > removed because of the store at line 35.
>>
> >
>> > These are in different basic blocks. IIRC there are a bunch of cases
>> where DCE just fails when there’s any control flow. I think this is what
>> you’re seeing here.
>> I think Vitaly's patches (https://reviews.llvm.org/D61879) should
>> allow DSE to work across basic blocks. But, as you mentioned, the
>> problem is that we only have a memset at IR level.
>> >
>> > > Same for store to 24(%rsp), which is later overwritten at lines 11
>> and 16.
>> >
>> > I’m not sure what we should do here… If it were a constant store we’d
>> fold it in, but here both stores are variables so it’s not clear that we
>> really should be doing much in MemCpyOptimizer.
>> > The problem I see here is: there’s a small memset, followed by variable
>> stores to the same location. DCE’ing requires realizing that the memset is
>> small enough that the backend will expand it, at which point DCE will kick
>> in. MemCpyOptimizer doesn’t know what “small” is, partly because it’s based
>> on alignment and optimization level. Maybe it can be a target hook (“what
>> would you do with this memset?”), that MemCpyOptimizer uses when it sees a
>> memset followed by variable stores to the same location? We already fold in
>> constant stores (replace the m() call by one)… but again this isn’t clearly
>> a win: if you enable SSE you’ll get wide stores, followed by stores of
>> smaller size. With SSE I’m not sure I’d want us to remove the MOVAPS
>> instructions.
>> Yes, this depends very much on the backend flags, therefore it might
>> be too early to lower that memset before DSE.
>> I was thinking about some kind of a peephole optimization, that
>> deletes stores to the same memory location (at least for %rsp-based
>> addresses) in the backend.
>> I suspect AArch64 has something like this already, at least we don't
>> see these redundant stores there.
>>
>>
>> >
>> >
>> > >> Thanks,
>> > >>
>> > >> JF
>> > >>
>> > >>> On Aug 1, 2019, at 7:21 AM, Alexander Potapenko <glider at google.com>
>> wrote:
>> > >>>
>> > >>> Hi folks,
>> > >>>
>> > >>> When compiling the attached example with
>> -ftrivial-auto-var-init=zero:
>> > >>>
>> > >>> $ clang   -no-integrated-as  -mno-sse -m64 -mstack-alignment=8
>> -O2
>> > >>> -ftrivial-auto-var-init=zero
>> > >>>
>> -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang
>> > >>> -g  -o ipt.ll -c ipt.i -w -S -emit-llvm
>> > >>>
>> > >>> , Clang generates an initialization memset() call for |acpar| in
>> the IR:
>> > >>>
>> > >>>   %0 = bitcast %struct.xt_action_param* %acpar to i8*, !dbg !27
>> > >>>   call void @llvm.memset.p0i8.i64(i8* nonnull align 8 %0, i8 0, i64
>> > >>> 40, i1 false), !dbg !28
>> > >>>
>> > >>> Clang only splits memsets into series of stores under certain
>> > >>> conditions, so it's hard to remove redundant stores on the IR level
>> > >>> (even with the recent DSE patches: https://reviews.llvm.org/D61879
>> ).
>> > >>> In the resulting assembly, however, the memset is lowered into a
>> > >>> series of MOVQ instructions (see the attached ipt.s file), of which
>> at
>> > >>> least the stores to 24(%rsp) and 16(%rsp) are redundant.
>> > >>>
>> > >>> Given that at MCInst level we already know if a memset is split
>> into a
>> > >>> sequence of stores, can we do a better job by making the backend
>> > >>> delete these redundant stores for us?
>> > >>>
>> > >>> Alex
>> > >>>
>> > >>> --
>> > >>> Alexander Potapenko
>> > >>> Software Engineer
>> > >>>
>> > >>> Google Germany GmbH
>> > >>> Erika-Mann-Straße, 33
>> > >>> 80636 München
>> > >>>
>> > >>> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
>> > >>> Registergericht und -nummer: Hamburg, HRB 86891
>> > >>> Sitz der Gesellschaft: Hamburg
>> > >>> <ipt.i><ipt.s>
>> > >>
>> > >
>> > >
>> > > --
>> > > Alexander Potapenko
>> > > Software Engineer
>> > >
>> > > Google Germany GmbH
>> > > Erika-Mann-Straße, 33
>> > > 80636 München
>> > >
>> > > Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
>> > > Registergericht und -nummer: Hamburg, HRB 86891
>> > > Sitz der Gesellschaft: Hamburg
>> >
>>
>>
>> --
>> Alexander Potapenko
>> Software Engineer
>>
>> Google Germany GmbH
>> Erika-Mann-Straße, 33
>> 80636 München
>>
>> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
>> Registergericht und -nummer: Hamburg, HRB 86891
>> Sitz der Gesellschaft: Hamburg
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190806/50b20285/attachment.html>


More information about the llvm-dev mailing list