[LLVMdev] Vectorizing alloca instructions

Thu Oct 24 15:00:31 PDT 2013

On Thu, Oct 24, 2013 at 2:04 PM, Tom Stellard <tom at stellard.net> wrote:

> Hi,
>
> I've been playing around with the SLPVectorizer trying to get it to
> vectorize this simple program:
>
> define void @vector(i32 addrspace(1)* %out, i32 %index) {
> entry:
>   %0 = alloca [4 x i32]
>   %x = getelementptr [4 x i32]* %0, i32 0, i32 0
>   %y = getelementptr [4 x i32]* %0, i32 0, i32 1
>   %z = getelementptr [4 x i32]* %0, i32 0, i32 2
>   %w = getelementptr [4 x i32]* %0, i32 0, i32 3
>   store i32 0, i32* %x
>   store i32 1, i32* %y
>   store i32 2, i32* %z
>   store i32 3, i32* %w
>   %1 = getelementptr [4 x i32]* %0, i32 0, i32 %index
>   %2 = load i32* %1
>   store i32 %2, i32 addrspace(1)* %out
>   ret void
> }
>
> My goal is to have this program transformed to the following:
>
> define void @vector(i32 addrspace(1)* %out, i32 %index) {
> entry:
>   %0 = extractelement <4 x i32> <i32 0, i32 1, i32 2, i32 3>, i32 %index
>   store i32 %0, i32 addrspace(1)* %out
>   ret void
> }
>
> I've slightly modified the SLPVectorizer

Just a note: I don't think you should, or need to, vectorize the alloca
itself. If you can simply turn the dynamically indexed load into a vector
load plus an extractelement, like this:

define void @vector(i32 addrspace(1)* %out, i32 %index) {
entry:
  %0 = alloca [4 x i32]
  %x = getelementptr [4 x i32]* %0, i32 0, i32 0
  %y = getelementptr [4 x i32]* %0, i32 0, i32 1
  %z = getelementptr [4 x i32]* %0, i32 0, i32 2
  %w = getelementptr [4 x i32]* %0, i32 0, i32 3
  store i32 0, i32* %x
  store i32 1, i32* %y
  store i32 2, i32* %z
  store i32 3, i32* %w
  %1 = bitcast [4 x i32]* %0 to <4 x i32>*
  %2 = load <4 x i32>* %1
  %3 = extractelement <4 x i32> %2, i32 %index
  store i32 %3, i32 addrspace(1)* %out
  ret void
}

Then running SROA and InstCombine will mop up the rest. So it's mostly about
getting the SLPVectorizer to handle the dynamic GEP. As soon as it does
that, everything else will fall away.
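
As a rough, source-level illustration of why the two forms are equivalent,
here is a minimal C sketch. It is only an analogue of the IR above, not
compiler output: the names v4i32, load_scalar and load_vector are made up
for this example, and it assumes the GCC/Clang vector_size extension, which
allows a vector to be subscripted with a runtime index.

```c
#include <stdint.h>

/* GCC/Clang vector extension: four 32-bit ints, roughly <4 x i32>.
 * The type name is hypothetical, chosen for this sketch. */
typedef int32_t v4i32 __attribute__((vector_size(16)));

/* Scalar form: four stores into a stack array (the alloca), then a
 * dynamically indexed load. index must be in the range 0..3. */
static int32_t load_scalar(unsigned index) {
    int32_t a[4];
    a[0] = 0;
    a[1] = 1;
    a[2] = 2;
    a[3] = 3;
    return a[index];
}

/* Vector form: the same four values held as one vector value, with the
 * element selected by a runtime index (the extractelement analogue).
 * index must be in the range 0..3. */
static int32_t load_vector(unsigned index) {
    v4i32 v = {0, 1, 2, 3};
    return v[index];
}
```

Both functions return the same value for every in-range index, which is the
property the load rewrite relies on.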

Not sure how much this helps; I just wanted to point it out.

> (see the attached patch) so
> that it will vectorize small trees, and I've also fixed a crash in the
> BoUpSLP::Gather() function when it is passed a list of store
> instructions.  With this patch, the command:
>
> opt -slp-vectorizer -debug -march=r600 -mcpu=redwood -o - vector-alloca.ll
> -S -slp-threshold=-20
>
> Produces the following output and the program remains unchanged:
>
> ====
>
> SLP: Analyzing blocks in vector.
> SLP: Found 5 stores to vectorize.
> SLP: Analyzing a store chain of length 4.
> SLP: Analyzing a store chain of length 4
> SLP: Analyzing 4 stores at offset 0
> SLP: Checking users of    store i32 0, i32* %x.
> SLP: Checking users of    store i32 1, i32* %y.
> SLP: Checking users of    store i32 2, i32* %z.
> SLP: Checking users of    store i32 3, i32* %w.
> SLP: We are able to schedule this bundle.
> SLP: Can't sink   store i32 0, i32* %x
>  down to   store i32 3, i32* %w
>  because of   store i32 1, i32* %y.  Gathering.
> SLP: Calculating cost for tree of size 1.
> SLP: Check whether the tree with height 1 is fully vectorizable .
> SLP: Adding cost 4 for bundle that starts with   store i32 0, i32* %x .
> SLP: Total Cost 4.
> SLP: Found cost=4 for VF=4
> SLP: Decided to vectorize cost=4
> SLP: Extracting 0 values .
> SLP: Optimizing 0 gather sequences instructions.
> SLP: vectorized "vector"
>
> ====
>
> I'm having a little trouble figuring out why the stores do not end up
> being vectorized.  Does anyone have any insight into this?  Should this
> pass be able to perform the desired transformation?
>
> Thanks,
> Tom