[llvm-dev] Dead store elimination in the backend for -ftrivial-auto-var-init

Vitaly Buka via llvm-dev llvm-dev at lists.llvm.org
Thu Aug 1 23:29:32 PDT 2019


On a first look case like this should nor be a problem. The tail of the
memset here is unused because it's replaced immediately without reads . So
it can be trimmed. I will try the patch and let you know.

On Thu, Aug 1, 2019 at 9:51 AM Alexander Potapenko <glider at google.com>
wrote:

> On Thu, Aug 1, 2019 at 6:38 PM JF Bastien <jfbastien at apple.com> wrote:
> >
> >
> >
> > > On Aug 1, 2019, at 9:20 AM, Alexander Potapenko <glider at google.com>
> wrote:
> > >
> > > On Thu, Aug 1, 2019 at 6:09 PM JF Bastien <jfbastien at apple.com> wrote:
> > >>
> > >> Hi Alexander,
> > >>
> > >> The code doesn’t compile. Could you send a godbolt.org link that
> shows the issue?
> > > Sorry about that, here's the link: https://godbolt.org/z/-PinQP
> > >
> > > Lines 4 to 8 are initializing |acpar|.
> > > If I'm understanding correctly, the store to 8(%rsp) at line 7 can be
> > > removed because of the store at line 35.
>
>
> > These are in different basic blocks. IIRC there are a bunch of cases
> where DCE just fails when there’s any control flow. I think this is what
> you’re seeing here.
> I think Vitaly's patches (https://reviews.llvm.org/D61879) should
> allow DSE to work across basic blocks. But, as you mentioned, the
> problem is that we only have a memset at IR level.
> >
> > > Same for store to 24(%rsp), which is later overwritten at lines 11 and
> 16.
> >
> > I’m not sure what we should do here… If it were a constant store we’d
> fold it in, but here both stores are variables so it’s not clear that we
> really should be doing much in MemCpyOptimizer.
> > The problem I see here is: there’s a small memset, followed by variable
> stores to the same location. DCE’ing requires realizing that the memset is
> small enough that the backend will expand it, at which point DCE will kick
> in. MemCpyOptimizer doesn’t know what “small” is, partly because it’s based
> on alignment and optimization level. Maybe it can be a target hook (“what
> would you do with this memset?”), that MemCpyOptimizer uses when it sees a
> memset followed by variable stores to the same location? We already fold in
> constant stores (replace the m() call by one)… but again this isn’t clearly
> a win: if you enable SSE you’ll get wide stores, followed by stores of
> smaller size. With SSE I’m not sure I’d want us to remove the MOVAPS
> instructions.
> Yes, this depends very much on the backend flags, therefore it might
> be too early to lower that memset before DSE.
> I was thinking about some kind of a peephole optimization, that
> deletes stores to the same memory location (at least for %rsp-based
> addresses) in the backend.
> I suspect AArch64 has something like this already, at least we don't
> see these redundant stores there.
>
>
> >
> >
> > >> Thanks,
> > >>
> > >> JF
> > >>
> > >>> On Aug 1, 2019, at 7:21 AM, Alexander Potapenko <glider at google.com>
> wrote:
> > >>>
> > >>> Hi folks,
> > >>>
> > >>> When compiling the attached example with
> -ftrivial-auto-var-init=zero:
> > >>>
> > >>> $ clang   -no-integrated-as  -mno-sse -m64 -mstack-alignment=8    -O2
> > >>> -ftrivial-auto-var-init=zero
> > >>>
> -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang
> > >>> -g  -o ipt.ll -c ipt.i -w -S -emit-llvm
> > >>>
> > >>> , Clang generates an initialization memset() call for |acpar| in the
> IR:
> > >>>
> > >>>   %0 = bitcast %struct.xt_action_param* %acpar to i8*, !dbg !27
> > >>>   call void @llvm.memset.p0i8.i64(i8* nonnull align 8 %0, i8 0, i64
> > >>> 40, i1 false), !dbg !28
> > >>>
> > >>> Clang only splits memsets into series of stores under certain
> > >>> conditions, so it's hard to remove redundant stores on the IR level
> > >>> (even with the recent DSE patches: https://reviews.llvm.org/D61879).
> > >>> In the resulting assembly, however, the memset is lowered into a
> > >>> series of MOVQ instructions (see the attached ipt.s file), of which
> at
> > >>> least the stores to 24(%rsp) and 16(%rsp) are redundant.
> > >>>
> > >>> Given that at MCInst level we already know if a memset is split into
> a
> > >>> sequence of stores, can we do a better job by making the backend
> > >>> delete these redundant stores for us?
> > >>>
> > >>> Alex
> > >>>
> > >>> --
> > >>> Alexander Potapenko
> > >>> Software Engineer
> > >>>
> > >>> Google Germany GmbH
> > >>> Erika-Mann-Straße, 33
> > >>> 80636 München
> > >>>
> > >>> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> > >>> Registergericht und -nummer: Hamburg, HRB 86891
> > >>> Sitz der Gesellschaft: Hamburg
> > >>> <ipt.i><ipt.s>
> > >>
> > >
> > >
> > > --
> > > Alexander Potapenko
> > > Software Engineer
> > >
> > > Google Germany GmbH
> > > Erika-Mann-Straße, 33
> > > 80636 München
> > >
> > > Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> > > Registergericht und -nummer: Hamburg, HRB 86891
> > > Sitz der Gesellschaft: Hamburg
> >
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190801/fb1332f2/attachment.html>


More information about the llvm-dev mailing list