[LLVMdev] Vectorizing alloca instructions
Nadav Rotem
nrotem at apple.com
Thu Oct 24 14:15:00 PDT 2013
Hi Tom,
Thanks for working on this. The SLP vectorizer thinks that %x, %y, %z, and %w may alias, so it ends up performing four scalar store operations (which is a bad idea). We need to figure out why AA (alias analysis) thinks that %x and %y may alias. Maybe there is a problem in the code that queries AA.
Thanks,
Nadav
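
For reference, here is a minimal sketch (in Python, not LLVM code) of the reasoning one would expect alias analysis to apply to the four stores in the quoted program: each pointer is a constant-offset GEP off the same alloca, so the stored byte ranges are disjoint. The `Access` tuple and the `may_alias` helper are hypothetical names used only for illustration, and the model assumes that an i32 occupies 4 bytes. It does not reproduce LLVM's actual AliasAnalysis implementation.

```python
from itertools import combinations
from typing import NamedTuple, Optional

I32_BYTES = 4  # assumption: an i32 store touches 4 bytes


class Access(NamedTuple):
    name: str               # IR value name of the pointer
    base: str               # underlying object (the alloca)
    offset: Optional[int]   # byte offset from base; None if not a constant
    size: int               # number of bytes accessed


def may_alias(a: Access, b: Access) -> bool:
    """Toy check: do two accesses possibly touch the same bytes?"""
    if a.base != b.base:
        return True  # this toy model stays conservative about distinct bases
    if a.offset is None or b.offset is None:
        return True  # variable index: cannot rule out an overlap
    # Same base, constant offsets: alias only if the byte ranges overlap.
    return a.offset < b.offset + b.size and b.offset < a.offset + a.size


# The four stores from the quoted program: elements 0..3 of [4 x i32] %0.
stores = [
    Access("%x", "%0", 0 * I32_BYTES, I32_BYTES),
    Access("%y", "%0", 1 * I32_BYTES, I32_BYTES),
    Access("%z", "%0", 2 * I32_BYTES, I32_BYTES),
    Access("%w", "%0", 3 * I32_BYTES, I32_BYTES),
]
# The load through %1 uses the runtime value %index, so its offset is unknown.
load = Access("%1", "%0", None, I32_BYTES)

aliasing_pairs = [
    (a.name, b.name) for a, b in combinations(stores, 2) if may_alias(a, b)
]
print("store pairs that may alias:", aliasing_pairs)
print("load may alias each store:", [may_alias(load, s) for s in stores])
```

Under these assumptions no pair of stores overlaps, while the load through the variable index may alias all of them. That matches the expectation that the stores themselves should be safe to bundle into one vector store.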
On Oct 24, 2013, at 2:04 PM, Tom Stellard <tom at stellard.net> wrote:
> Hi,
>
> I've been playing around with the SLPVectorizer trying to get it to
> vectorize this simple program:
>
> define void @vector(i32 addrspace(1)* %out, i32 %index) {
> entry:
>   %0 = alloca [4 x i32]
>   %x = getelementptr [4 x i32]* %0, i32 0, i32 0
>   %y = getelementptr [4 x i32]* %0, i32 0, i32 1
>   %z = getelementptr [4 x i32]* %0, i32 0, i32 2
>   %w = getelementptr [4 x i32]* %0, i32 0, i32 3
>   store i32 0, i32* %x
>   store i32 1, i32* %y
>   store i32 2, i32* %z
>   store i32 3, i32* %w
>   %1 = getelementptr [4 x i32]* %0, i32 0, i32 %index
>   %2 = load i32* %1
>   store i32 %2, i32 addrspace(1)* %out
>   ret void
> }
>
> My goal is to have this program transformed to the following:
>
> define void @vector(i32 addrspace(1)* %out, i32 %index) {
> entry:
>   %0 = extractelement <4 x i32> <i32 0, i32 1, i32 2, i32 3>, i32 %index
>   store i32 %0, i32 addrspace(1)* %out
>   ret void
> }
>
> I've slightly modified the SLPVectorizer (see the attached patch) so
> that it will vectorize small trees, and I've also fixed a crash in the
> BoUpSLP::Gather() function when it is passed a list of store
> instructions. With this patch, the command:
>
> opt -slp-vectorizer -debug -march=r600 -mcpu=redwood -o - vector-alloca.ll -S -slp-threshold=-20
>
> produces the following output, and the program remains unchanged:
>
> ====
>
> SLP: Analyzing blocks in vector.
> SLP: Found 5 stores to vectorize.
> SLP: Analyzing a store chain of length 4.
> SLP: Analyzing a store chain of length 4
> SLP: Analyzing 4 stores at offset 0
> SLP: Checking users of store i32 0, i32* %x.
> SLP: Checking users of store i32 1, i32* %y.
> SLP: Checking users of store i32 2, i32* %z.
> SLP: Checking users of store i32 3, i32* %w.
> SLP: We are able to schedule this bundle.
> SLP: Can't sink store i32 0, i32* %x
> down to store i32 3, i32* %w
> because of store i32 1, i32* %y. Gathering.
> SLP: Calculating cost for tree of size 1.
> SLP: Check whether the tree with height 1 is fully vectorizable .
> SLP: Adding cost 4 for bundle that starts with store i32 0, i32* %x .
> SLP: Total Cost 4.
> SLP: Found cost=4 for VF=4
> SLP: Decided to vectorize cost=4
> SLP: Extracting 0 values .
> SLP: Optimizing 0 gather sequences instructions.
> SLP: vectorized "vector"
>
> ====
>
> I'm having a little trouble figuring out why the stores do not end up
> being vectorized. Does anyone have any insight into this? Should this
> pass be able to perform the desired transformation?
>
> Thanks,
> Tom
>
> <slp-vectorize-alloc.patch>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev