[PATCH] Add basic support for removal of load that are fed by a store of an aggregate

Björn Steinbrink bsteinbr at gmail.com
Fri Jan 2 15:49:21 PST 2015


On 2015.01.02 15:43:11 -0800, Chandler Carruth wrote:
> Ok, how do i reproduce the problem you're seeing? For me:

Sorry again, after I had sent out the original email that was missing
the attachment, I had changed the store to a memcpy to see if that
triggers the correct optimization, and indeed it does, as you can see
below because I sent the version with the memcpy.

Third time's a charm, right?

Björn
> 
> % ./bin/opt -S -O2 < test-fca.ll
> ; ModuleID = '<stdin>'
> target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"
> target triple = "x86_64-unknown-linux-gnu"
> 
> declare void @foo()
> 
> define void @test({ i8, i8 }* nocapture) {
> brancher.exit:
>   %v.sroa.0.0.dst.sroa_idx = getelementptr inbounds { i8, i8 }* %0, i64 0,
> i32 0
>   store i8 1, i8* %v.sroa.0.0.dst.sroa_idx, align 1
>   %v.sroa.2.0.dst.sroa_idx = getelementptr inbounds { i8, i8 }* %0, i64 0,
> i32 1
>   store i8 2, i8* %v.sroa.2.0.dst.sroa_idx, align 1
>   tail call void @foo()
>   ret void
> }
> 
> On Fri, Jan 2, 2015 at 3:40 PM, Björn Steinbrink <bsteinbr at gmail.com> wrote:
> 
> > On 2015.01.02 23:29:00 +0100, Björn Steinbrink wrote:
> > > [Added llvm-commit back to Cc, sorry, removing it wasn't intentional]
> > >
> > > On 2015.01.02 14:05:28 -0800, Chandler Carruth wrote:
> > > > On Fri, Jan 2, 2015 at 2:02 PM, Björn Steinbrink <bsteinbr at gmail.com>
> > wrote:
> > > >
> > > > > 2015-01-02 22:36 GMT+01:00 Chandler Carruth <chandlerc at google.com>:
> > > > > >
> > > > > > On Fri, Jan 2, 2015 at 1:27 PM, Björn Steinbrink <
> > bsteinbr at gmail.com> wrote:
> > > > > >>
> > > > > >> If we have a simple load through a GEP that is fed by a store of
> > an
> > > > > >> aggregate, we can use the GEP indices to walk the stored
> > aggregate and
> > > > > >> extract the appropriate value to replace the load.
> > > > > >
> > > > > >
> > > > > > What's the motivation for this change?
> > > > >
> > > > > The rust compiler hit a case where not having this optimization
> > caused
> > > > > a branch not to be removed. The corresponding issue for rustc is
> > > > > https://github.com/rust-lang/rust/issues/20149
> > > > >
> > > > > > Note that SROA replaces all stores of aggregates with scalar
> > stores of
> > > > > > the components specifically so that neither it nor GVN needs to
> > > > > > cope with aggregate loads or stores.
> > > > >
> > > > > SROA did the opposite thing in this case, replacing individual stores
> > > > > insertvalues and a store of an FCA. I'm attaching the full output of
> > > > > opt -print-after-all -O2 -S for the failing test case given in the
> > > > > rust issue. The first SROA creates the FCA store, and later when the
> > > > > call to `unwrap` is inlined the existing optimizations can't
> > eliminate
> > > > > the branch that comes with it, because of the FCA store.
> > > >
> > > >
> > > > SROA doesn't create FCA stores in any cases I'm aware of... Can you
> > provide
> > > > a somewhat reduced test case rather than a massive log?
> > >
> > > I got myself confused because it introduced the insertvalue
> > > instructions. The FCA store was already present and not constructed by
> > > SROA. I've attach a small test case this time.
> >
> > And... I forgot it again :-/ I'm really sorry, apparently this is not my
> > day.
> >
> > Björn
> >
-------------- next part --------------
target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

declare void @foo();
declare void @bar();

; Function Attrs: nounwind readnone uwtable
define void @test({ i8 , i8 }*) {
entry-block:
  %v = alloca { i8, i8 }
  %v.1 = getelementptr inbounds { i8, i8 }* %v, i64 0, i32 0
  store i8 1, i8 *%v.1
  %v.2 = getelementptr inbounds { i8, i8 }* %v, i64 0, i32 1
  store i8 2, i8 *%v.2
  %tmp = load { i8, i8 } *%v
  store { i8, i8 } %tmp, { i8, i8 }* %0
  call void @brancher({ i8, i8 }* %0)
  ret void
}

define internal void @brancher({ i8, i8 }*) {
  %v.1 = getelementptr inbounds { i8, i8 }* %0, i64 0, i32 0
  %v1 = load i8* %v.1
  %c = icmp eq i8 %v1, 1
  br i1 %c, label %if, label %else
if:
  call void @foo()
  br label %out

else:
  call void @bar()
  br label %out
out:
  ret void
}


More information about the llvm-commits mailing list