[llvm-dev] Dead store elimination in the backend for -ftrivial-auto-var-init

Vitaly Buka via llvm-dev llvm-dev at lists.llvm.org
Thu Aug 8 11:43:13 PDT 2019


I've updated the patch to produce this code:

  .text
  .file  "auto-var-init-dse.c"
  .globl  ipt_do_table            # -- Begin function ipt_do_table
  .p2align  4, 0x90
  .type  ipt_do_table, at function
ipt_do_table:                           # @ipt_do_table
  .cfi_startproc
# %bb.0:                                # %entry
  pushq  %rbx
  .cfi_def_cfa_offset 16
  subq  $40, %rsp
  .cfi_def_cfa_offset 56
  .cfi_offset %rbx, -16


*  movl  $0, 4(%rsp)  movl  $0, 33(%rsp)  movl  $0, 36(%rsp)*
  xorl  %eax, %eax
  callq  m
....

These movl can't be removed at it's paddings never initialized here, and we
don't know if q() will read them.
However these cases relatively rare, on Chromium I see less than 0.01%
binary size improvement.


On Tue, Aug 6, 2019 at 6:42 PM Vitaly Buka <vitalybuka at google.com> wrote:

> There are two problems:
> 1. padding after union and call to q(), without LTO we can't remove that
> store.
> 2. shortcut which I have which ignores all instructions q() . this assume
> that memset to acpar.match, acpar.matchinfo also useful which is not
> true. I should be able to improve this case.
>
> On Thu, Aug 1, 2019 at 11:29 PM Vitaly Buka <vitalybuka at google.com> wrote:
>
>> On a first look case like this should nor be a problem. The tail of the
>> memset here is unused because it's replaced immediately without reads . So
>> it can be trimmed. I will try the patch and let you know.
>>
>> On Thu, Aug 1, 2019 at 9:51 AM Alexander Potapenko <glider at google.com>
>> wrote:
>>
>>> On Thu, Aug 1, 2019 at 6:38 PM JF Bastien <jfbastien at apple.com> wrote:
>>> >
>>> >
>>> >
>>> > > On Aug 1, 2019, at 9:20 AM, Alexander Potapenko <glider at google.com>
>>> wrote:
>>> > >
>>> > > On Thu, Aug 1, 2019 at 6:09 PM JF Bastien <jfbastien at apple.com>
>>> wrote:
>>> > >>
>>> > >> Hi Alexander,
>>> > >>
>>> > >> The code doesn’t compile. Could you send a godbolt.org link that
>>> shows the issue?
>>> > > Sorry about that, here's the link: https://godbolt.org/z/-PinQP
>>> > >
>>> > > Lines 4 to 8 are initializing |acpar|.
>>> > > If I'm understanding correctly, the store to 8(%rsp) at line 7 can be
>>> > > removed because of the store at line 35.
>>>
>> >
>>> > These are in different basic blocks. IIRC there are a bunch of cases
>>> where DCE just fails when there’s any control flow. I think this is what
>>> you’re seeing here.
>>> I think Vitaly's patches (https://reviews.llvm.org/D61879) should
>>> allow DSE to work across basic blocks. But, as you mentioned, the
>>> problem is that we only have a memset at IR level.
>>> >
>>> > > Same for store to 24(%rsp), which is later overwritten at lines 11
>>> and 16.
>>> >
>>> > I’m not sure what we should do here… If it were a constant store we’d
>>> fold it in, but here both stores are variables so it’s not clear that we
>>> really should be doing much in MemCpyOptimizer.
>>> > The problem I see here is: there’s a small memset, followed by
>>> variable stores to the same location. DCE’ing requires realizing that the
>>> memset is small enough that the backend will expand it, at which point DCE
>>> will kick in. MemCpyOptimizer doesn’t know what “small” is, partly because
>>> it’s based on alignment and optimization level. Maybe it can be a target
>>> hook (“what would you do with this memset?”), that MemCpyOptimizer uses
>>> when it sees a memset followed by variable stores to the same location? We
>>> already fold in constant stores (replace the m() call by one)… but again
>>> this isn’t clearly a win: if you enable SSE you’ll get wide stores,
>>> followed by stores of smaller size. With SSE I’m not sure I’d want us to
>>> remove the MOVAPS instructions.
>>> Yes, this depends very much on the backend flags, therefore it might
>>> be too early to lower that memset before DSE.
>>> I was thinking about some kind of a peephole optimization, that
>>> deletes stores to the same memory location (at least for %rsp-based
>>> addresses) in the backend.
>>> I suspect AArch64 has something like this already, at least we don't
>>> see these redundant stores there.
>>>
>>>
>>> >
>>> >
>>> > >> Thanks,
>>> > >>
>>> > >> JF
>>> > >>
>>> > >>> On Aug 1, 2019, at 7:21 AM, Alexander Potapenko <glider at google.com>
>>> wrote:
>>> > >>>
>>> > >>> Hi folks,
>>> > >>>
>>> > >>> When compiling the attached example with
>>> -ftrivial-auto-var-init=zero:
>>> > >>>
>>> > >>> $ clang   -no-integrated-as  -mno-sse -m64 -mstack-alignment=8
>>> -O2
>>> > >>> -ftrivial-auto-var-init=zero
>>> > >>>
>>> -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang
>>> > >>> -g  -o ipt.ll -c ipt.i -w -S -emit-llvm
>>> > >>>
>>> > >>> , Clang generates an initialization memset() call for |acpar| in
>>> the IR:
>>> > >>>
>>> > >>>   %0 = bitcast %struct.xt_action_param* %acpar to i8*, !dbg !27
>>> > >>>   call void @llvm.memset.p0i8.i64(i8* nonnull align 8 %0, i8 0, i64
>>> > >>> 40, i1 false), !dbg !28
>>> > >>>
>>> > >>> Clang only splits memsets into series of stores under certain
>>> > >>> conditions, so it's hard to remove redundant stores on the IR level
>>> > >>> (even with the recent DSE patches: https://reviews.llvm.org/D61879
>>> ).
>>> > >>> In the resulting assembly, however, the memset is lowered into a
>>> > >>> series of MOVQ instructions (see the attached ipt.s file), of
>>> which at
>>> > >>> least the stores to 24(%rsp) and 16(%rsp) are redundant.
>>> > >>>
>>> > >>> Given that at MCInst level we already know if a memset is split
>>> into a
>>> > >>> sequence of stores, can we do a better job by making the backend
>>> > >>> delete these redundant stores for us?
>>> > >>>
>>> > >>> Alex
>>> > >>>
>>> > >>> --
>>> > >>> Alexander Potapenko
>>> > >>> Software Engineer
>>> > >>>
>>> > >>> Google Germany GmbH
>>> > >>> Erika-Mann-Straße, 33
>>> > >>> 80636 München
>>> > >>>
>>> > >>> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
>>> > >>> Registergericht und -nummer: Hamburg, HRB 86891
>>> > >>> Sitz der Gesellschaft: Hamburg
>>> > >>> <ipt.i><ipt.s>
>>> > >>
>>> > >
>>> > >
>>> > > --
>>> > > Alexander Potapenko
>>> > > Software Engineer
>>> > >
>>> > > Google Germany GmbH
>>> > > Erika-Mann-Straße, 33
>>> > > 80636 München
>>> > >
>>> > > Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
>>> > > Registergericht und -nummer: Hamburg, HRB 86891
>>> > > Sitz der Gesellschaft: Hamburg
>>> >
>>>
>>>
>>> --
>>> Alexander Potapenko
>>> Software Engineer
>>>
>>> Google Germany GmbH
>>> Erika-Mann-Straße, 33
>>> 80636 München
>>>
>>> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
>>> Registergericht und -nummer: Hamburg, HRB 86891
>>> Sitz der Gesellschaft: Hamburg
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190808/ac863242/attachment.html>


More information about the llvm-dev mailing list