[LLVMdev] Vectorizing alloca instructions

Thu Oct 24 14:15:00 PDT 2013

Hi Tom, 

Thanks for working on this.  The SLP-vectorizer thinks that %x, %y, %z and %w may alias, so it falls back to performing 4 scalar store operations (which is a bad idea).  We need to figure out why AA thinks that %x and %y may alias.  Maybe there is a problem with the code that uses AA.
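
To make the sinking problem concrete, here is a toy Python model of the check. It is a sketch only, not the SLPVectorizer's code: `Store`, `can_sink` and the two alias oracles are hypothetical names made up for illustration. It assumes each store is described by a base pointer, a constant byte offset and a size, as with the four GEPs into the alloca in the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Store:
    """A scalar store to `base + offset` covering `size` bytes (toy model)."""
    base: str
    offset: int
    size: int


def precise_may_alias(a: Store, b: Store) -> bool:
    """Same base with constant offsets: alias only if the byte ranges overlap."""
    if a.base != b.base:
        return True  # unknown relationship, stay conservative
    return a.offset < b.offset + b.size and b.offset < a.offset + a.size


def conservative_may_alias(a: Store, b: Store) -> bool:
    """An oracle that never proves two accesses independent."""
    return True


def can_sink(block: List[Store], src: int, dst: int,
             may_alias: Callable[[Store, Store], bool]) -> bool:
    """True if block[src] can move down to block[dst] without crossing
    an instruction that may touch the same memory."""
    return not any(may_alias(block[src], block[i]) for i in range(src + 1, dst))


# The four stores from the example: i32 elements 0..3 of one alloca.
stores = [Store("%0", 4 * i, 4) for i in range(4)]

print(can_sink(stores, 0, 3, precise_may_alias))       # True
print(can_sink(stores, 0, 3, conservative_may_alias))  # False
```

With the offset-aware oracle the first store can be sunk past the stores to %y and %z, because the byte ranges [0,4), [4,8) and [8,12) do not overlap. With the oracle that always answers "may alias", the store to %y already blocks it, which matches the "Can't sink" message in the debug output.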

Thanks,
Nadav

On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote:

> Hi,
> 
> I've been playing around with the SLPVectorizer trying to get it to
> vectorize this simple program:
> 
> define void @vector(i32 addrspace(1)* %out, i32 %index) {
> entry:
>  %0 = alloca [4 x i32]
>  %x = getelementptr [4 x i32]* %0, i32 0, i32 0
>  %y = getelementptr [4 x i32]* %0, i32 0, i32 1
>  %z = getelementptr [4 x i32]* %0, i32 0, i32 2
>  %w = getelementptr [4 x i32]* %0, i32 0, i32 3
>  store i32 0, i32* %x
>  store i32 1, i32* %y
>  store i32 2, i32* %z
>  store i32 3, i32* %w
>  %1 = getelementptr [4 x i32]* %0, i32 0, i32 %index
>  %2 = load i32* %1
>  store i32 %2, i32 addrspace(1)* %out
>  ret void
> }
> 
> My goal is to have this program transformed to the following:
> 
> define void @vector(i32 addrspace(1)* %out, i32 %index) {
> entry:
>  %0 = extractelement <4 x i32> <i32 0, i32 1, i32 2, i32 3>, i32 %index
>  store i32 %0, i32 addrspace(1)* %out
>  ret void
> }
> 
> I've slightly modified the SLPVectorizer (see the attached patch) so
> that it will vectorize small trees, and I've also fixed a crash in the
> BoUpSLP::Gather() function when it is passed a list of store
> instructions.  With this patch, the command:
> 
> opt -slp-vectorizer -debug -march=r600 -mcpu=redwood -o - vector-alloca.ll -S -slp-threshold=-20
> 
> Produces the following output and the program remains unchanged:
> 
> ====
> 
> SLP: Analyzing blocks in vector.
> SLP: Found 5 stores to vectorize.
> SLP: Analyzing a store chain of length 4.
> SLP: Analyzing a store chain of length 4
> SLP: Analyzing 4 stores at offset 0
> SLP: Checking users of    store i32 0, i32* %x. 
> SLP: Checking users of    store i32 1, i32* %y. 
> SLP: Checking users of    store i32 2, i32* %z. 
> SLP: Checking users of    store i32 3, i32* %w. 
> SLP: We are able to schedule this bundle.
> SLP: Can't sink   store i32 0, i32* %x
> down to   store i32 3, i32* %w
> because of   store i32 1, i32* %y.  Gathering.
> SLP: Calculating cost for tree of size 1.
> SLP: Check whether the tree with height 1 is fully vectorizable .
> SLP: Adding cost 4 for bundle that starts with   store i32 0, i32* %x .
> SLP: Total Cost 4.
> SLP: Found cost=4 for VF=4
> SLP: Decided to vectorize cost=4
> SLP: Extracting 0 values .
> SLP: Optimizing 0 gather sequences instructions.
> SLP: vectorized "vector"
> 
> ====
> 
> I'm having a little trouble figuring out why the stores do not end up
> being vectorized.  Does anyone have any insight into this?  Should this
> pass be able to perform the desired transformation?
> 
> Thanks,
> Tom
> 
> <slp-vectorize-alloc.patch>