[PATCH] Add basic support for removal of load that are fed by a store of an aggregate
Björn Steinbrink
bsteinbr at gmail.com
Fri Jan 2 15:49:21 PST 2015
On 2015.01.02 15:43:11 -0800, Chandler Carruth wrote:
> Ok, how do i reproduce the problem you're seeing? For me:
Sorry again, after I had sent out the original email that was missing
the attachment, I had changed the store to a memcpy to see if that
triggers the correct optimization, and indeed it does, as you can see
below because I sent the version with the memcpy.
Third time's a charm, right?
Björn
>
> % ./bin/opt -S -O2 < test-fca.ll
> ; ModuleID = '<stdin>'
> target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"
> target triple = "x86_64-unknown-linux-gnu"
>
> declare void @foo()
>
> define void @test({ i8, i8 }* nocapture) {
> brancher.exit:
> %v.sroa.0.0.dst.sroa_idx = getelementptr inbounds { i8, i8 }* %0, i64 0,
> i32 0
> store i8 1, i8* %v.sroa.0.0.dst.sroa_idx, align 1
> %v.sroa.2.0.dst.sroa_idx = getelementptr inbounds { i8, i8 }* %0, i64 0,
> i32 1
> store i8 2, i8* %v.sroa.2.0.dst.sroa_idx, align 1
> tail call void @foo()
> ret void
> }
>
> On Fri, Jan 2, 2015 at 3:40 PM, Björn Steinbrink <bsteinbr at gmail.com> wrote:
>
> > On 2015.01.02 23:29:00 +0100, Björn Steinbrink wrote:
> > > [Added llvm-commit back to Cc, sorry, removing it wasn't intentional]
> > >
> > > On 2015.01.02 14:05:28 -0800, Chandler Carruth wrote:
> > > > On Fri, Jan 2, 2015 at 2:02 PM, Björn Steinbrink <bsteinbr at gmail.com>
> > wrote:
> > > >
> > > > > 2015-01-02 22:36 GMT+01:00 Chandler Carruth <chandlerc at google.com>:
> > > > > >
> > > > > > On Fri, Jan 2, 2015 at 1:27 PM, Björn Steinbrink <
> > bsteinbr at gmail.com> wrote:
> > > > > >>
> > > > > >> If we have a simple load through a GEP that is fed by a store of
> > an
> > > > > >> aggregate, we can use the GEP indices to walk the stored
> > aggregate and
> > > > > >> extract the appropriate value to replace the load.
> > > > > >
> > > > > >
> > > > > > What's the motivation for this change?
> > > > >
> > > > > The rust compiler hit a case where not having this optimization
> > caused
> > > > > a branch not to be removed. The corresponding issue for rustc is
> > > > > https://github.com/rust-lang/rust/issues/20149
> > > > >
> > > > > > Note that SROA replaces all stores of aggregates with scalar
> > stores of
> > > > > > the components specifically so that neither it nor GVN needs to
> > > > > > cope with aggregate loads or stores.
> > > > >
> > > > > SROA did the opposite thing in this case, replacing individual stores
> > > > > insertvalues and a store of an FCA. I'm attaching the full output of
> > > > > opt -print-after-all -O2 -S for the failing test case given in the
> > > > > rust issue. The first SROA creates the FCA store, and later when the
> > > > > call to `unwrap` is inlined the existing optimizations can't
> > eliminate
> > > > > the branch that comes with it, because of the FCA store.
> > > >
> > > >
> > > > SROA doesn't create FCA stores in any cases I'm aware of... Can you
> > provide
> > > > a somewhat reduced test case rather than a massive log?
> > >
> > > I got myself confused because it introduced the insertvalue
> > > instructions. The FCA store was already present and not constructed by
> > > SROA. I've attach a small test case this time.
> >
> > And... I forgot it again :-/ I'm really sorry, apparently this is not my
> > day.
> >
> > Björn
> >
-------------- next part --------------
target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
declare void @foo();
declare void @bar();
; Function Attrs: nounwind readnone uwtable
define void @test({ i8 , i8 }*) {
entry-block:
%v = alloca { i8, i8 }
%v.1 = getelementptr inbounds { i8, i8 }* %v, i64 0, i32 0
store i8 1, i8 *%v.1
%v.2 = getelementptr inbounds { i8, i8 }* %v, i64 0, i32 1
store i8 2, i8 *%v.2
%tmp = load { i8, i8 } *%v
store { i8, i8 } %tmp, { i8, i8 }* %0
call void @brancher({ i8, i8 }* %0)
ret void
}
define internal void @brancher({ i8, i8 }*) {
%v.1 = getelementptr inbounds { i8, i8 }* %0, i64 0, i32 0
%v1 = load i8* %v.1
%c = icmp eq i8 %v1, 1
br i1 %c, label %if, label %else
if:
call void @foo()
br label %out
else:
call void @bar()
br label %out
out:
ret void
}
More information about the llvm-commits
mailing list