[LLVMdev] Vectorizing alloca instructions
Tom Stellard
tom at stellard.net
Thu Oct 24 16:08:05 PDT 2013
On Thu, Oct 24, 2013 at 02:15:00PM -0700, Nadav Rotem wrote:
> Hi Tom,
>
> Thanks for working on this. The SLP-vectorizer thinks that %X %Y %Z and %W alias, so it tries to perform 4 scalar store operations (which is a bad idea). We need to figure out why AA thinks that X and Y may alias. Maybe there is a problem with the code that uses AA.
>
Thanks for the tip. opt was using NoAliasAnalysis by default, so passing
-basicaa to opt gives me this now:
SLP: Analyzing blocks in vector.
SLP: Found 5 stores to vectorize.
SLP: Analyzing a store chain of length 4.
SLP: Analyzing a store chain of length 4
SLP: Analyzing 4 stores at offset 0
SLP: Checking users of store i32 0, i32* %x.
SLP: Checking users of store i32 1, i32* %y.
SLP: Checking users of store i32 2, i32* %z.
SLP: Checking users of store i32 3, i32* %w.
SLP: We are able to schedule this bundle.
SLP: added a vector of stores.
SLP: Gathering due to C,S,B,O.
SLP: Calculating cost for tree of size 2.
SLP: Check whether the tree with height 2 is fully vectorizable .
SLP: Adding cost -3 for bundle that starts with store i32 0, i32* %x .
SLP: Adding cost 0 for bundle that starts with i32 0 .
SLP: Total Cost -3.
SLP: Found cost=-3 for VF=4
SLP: Decided to vectorize cost=-3
SLP: Extracting 0 values .
SLP: Erasing scalar: store i32 0, i32* %x.
SLP: Erasing scalar: store i32 1, i32* %y.
SLP: Erasing scalar: store i32 2, i32* %z.
SLP: Erasing scalar: store i32 3, i32* %w.
SLP: Optimizing 0 gather sequences instructions.
SLP: vectorized "vector"
; ModuleID = 'vector-alloca.ll'
target datalayout =
"e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-v2048:2048:2048-n32:64"
target triple = "r600--"
define void @vector(i32 addrspace(1)* %out, i32 %index) {
entry:
%0 = alloca [4 x i32]
%x = getelementptr [4 x i32]* %0, i32 0, i32 0
%y = getelementptr [4 x i32]* %0, i32 0, i32 1
%z = getelementptr [4 x i32]* %0, i32 0, i32 2
%w = getelementptr [4 x i32]* %0, i32 0, i32 3
%1 = bitcast i32* %x to <4 x i32>*
store <4 x i32> <i32 0, i32 1, i32 2, i32 3>, <4 x i32>* %1
%2 = getelementptr [4 x i32]* %0, i32 0, i32 %index
%3 = load i32* %2
store i32 %3, i32 addrspace(1)* %out
ret void
}
The next step for me is to figure out how to replace the GEP + load with an
extractelement.
-Tom
> Thanks,
> Nadav
>
>
> On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote:
>
> > Hi,
> >
> > I've been playing around with the SLPVectorizer trying to get it to
> > vectorize this simple program:
> >
> > define void @vector(i32 addrspace(1)* %out, i32 %index) {
> > entry:
> > %0 = alloca [4 x i32]
> > %x = getelementptr [4 x i32]* %0, i32 0, i32 0
> > %y = getelementptr [4 x i32]* %0, i32 0, i32 1
> > %z = getelementptr [4 x i32]* %0, i32 0, i32 2
> > %w = getelementptr [4 x i32]* %0, i32 0, i32 3
> > store i32 0, i32* %x
> > store i32 1, i32* %y
> > store i32 2, i32* %z
> > store i32 3, i32* %w
> > %1 = getelementptr [4 x i32]* %0, i32 0, i32 %index
> > %2 = load i32* %1
> > store i32 %2, i32 addrspace(1)* %out
> > ret void
> > }
> >
> > My goal is to have this program transformed to the following:
> >
> > define void @vector(i32 addrspace(1)* %out, i32 %index) {
> > entry:
> > %0 = extractelement <4 x i32> <i32 0, i32 1, i32 2, i32 3>, i32 %index
> > store i32 %0, i32 addrspace(1)* %out
> > }
> >
> > I've slightly modified the SLPVectorizer (see the attached patch) so
> > that it will vectorize small trees, and I've also fixed a crash in the
> > BoUpSLP::Gather() function when it is passed a list of store
> > instructions. With this patch, the command:
> >
> > opt -slp-vectorizer -debug -march=r600 -mcpu=redwood -o - vector-alloca.ll -S -slp-threshold=-20
> >
> > Produces the following output and the program remains unchanged:
> >
> > ====
> >
> > SLP: Analyzing blocks in vector.
> > SLP: Found 5 stores to vectorize.
> > SLP: Analyzing a store chain of length 4.
> > SLP: Analyzing a store chain of length 4
> > SLP: Analyzing 4 stores at offset 0
> > SLP: Checking users of store i32 0, i32* %x.
> > SLP: Checking users of store i32 1, i32* %y.
> > SLP: Checking users of store i32 2, i32* %z.
> > SLP: Checking users of store i32 3, i32* %w.
> > SLP: We are able to schedule this bundle.
> > SLP: Can't sink store i32 0, i32* %x
> > down to store i32 3, i32* %w
> > because of store i32 1, i32* %y. Gathering.
> > SLP: Calculating cost for tree of size 1.
> > SLP: Check whether the tree with height 1 is fully vectorizable .
> > SLP: Adding cost 4 for bundle that starts with store i32 0, i32* %x .
> > SLP: Total Cost 4.
> > SLP: Found cost=4 for VF=4
> > SLP: Decided to vectorize cost=4
> > SLP: Extracting 0 values .
> > SLP: Optimizing 0 gather sequences instructions.
> > SLP: vectorized "vector"
> >
> > ====
> >
> > I'm having a little trouble figuring out why the stores do not end up
> > being vectorized. Does anyone have any insight into this? Should this
> > pass be able to perform the desired transformation?
> >
> > Thanks,
> > Tom
> >
> > <slp-vectorize-alloc.patch>_______________________________________________
> > LLVM Developers mailing list
> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
More information about the llvm-dev
mailing list