[LLVMdev] Vectorizing alloca instructions

Thu Oct 24 14:04:11 PDT 2013

Hi,

I've been playing around with the SLPVectorizer trying to get it to
vectorize this simple program:

define void @vector(i32 addrspace(1)* %out, i32 %index) {
entry:
  %0 = alloca [4 x i32]
  %x = getelementptr [4 x i32]* %0, i32 0, i32 0
  %y = getelementptr [4 x i32]* %0, i32 0, i32 1
  %z = getelementptr [4 x i32]* %0, i32 0, i32 2
  %w = getelementptr [4 x i32]* %0, i32 0, i32 3
  store i32 0, i32* %x
  store i32 1, i32* %y
  store i32 2, i32* %z
  store i32 3, i32* %w
  %1 = getelementptr [4 x i32]* %0, i32 0, i32 %index
  %2 = load i32* %1
  store i32 %2, i32 addrspace(1)* %out
  ret void
}

My goal is to have this program transformed to the following:

define void @vector(i32 addrspace(1)* %out, i32 %index) {
entry:
  %0 = extractelement <4 x i32> <i32 0, i32 1, i32 2, i32 3>, i32 %index
  store i32 %0, i32 addrspace(1)* %out
  ret void
}
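
To make the intended equivalence concrete, here is a minimal Python sketch
that models both versions of the function. It is not LLVM code: the names
before() and after() are made up for illustration, and it only covers
in-range values of %index (0 through 3).

```python
# Illustrative model only; these helper names are hypothetical, not LLVM APIs.

def before(index):
    """Model of the original IR: four scalar stores to a stack array, then an indexed load."""
    stack = [None] * 4      # %0 = alloca [4 x i32]
    stack[0] = 0            # store i32 0, i32* %x
    stack[1] = 1            # store i32 1, i32* %y
    stack[2] = 2            # store i32 2, i32* %z
    stack[3] = 3            # store i32 3, i32* %w
    return stack[index]     # load through the GEP on %index


def after(index):
    """Model of the desired IR: extractelement from a constant vector."""
    vec = (0, 1, 2, 3)      # <4 x i32> <i32 0, i32 1, i32 2, i32 3>
    return vec[index]       # extractelement ..., i32 %index


# Compare the two models for every in-range index; out-of-range indices are not modeled.
results = [(before(i), after(i)) for i in range(4)]
```

Each pair in results holds the value the original and the transformed
function would write to %out for that index, so the two entries of every
pair should match.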

I've slightly modified the SLPVectorizer (see the attached patch) so
that it will vectorize small trees, and I've also fixed a crash in the
BoUpSLP::Gather() function when it is passed a list of store
instructions.  With this patch, the command:

opt -slp-vectorizer -debug -march=r600 -mcpu=redwood -o - vector-alloca.ll -S -slp-threshold=-20

produces the following output, but the program remains unchanged:

====

SLP: Analyzing blocks in vector.
SLP: Found 5 stores to vectorize.
SLP: Analyzing a store chain of length 4.
SLP: Analyzing a store chain of length 4
SLP: Analyzing 4 stores at offset 0
SLP: Checking users of    store i32 0, i32* %x. 
SLP: Checking users of    store i32 1, i32* %y. 
SLP: Checking users of    store i32 2, i32* %z. 
SLP: Checking users of    store i32 3, i32* %w. 
SLP: We are able to schedule this bundle.
SLP: Can't sink   store i32 0, i32* %x
 down to   store i32 3, i32* %w
 because of   store i32 1, i32* %y.  Gathering.
SLP: Calculating cost for tree of size 1.
SLP: Check whether the tree with height 1 is fully vectorizable .
SLP: Adding cost 4 for bundle that starts with   store i32 0, i32* %x .
SLP: Total Cost 4.
SLP: Found cost=4 for VF=4
SLP: Decided to vectorize cost=4
SLP: Extracting 0 values .
SLP: Optimizing 0 gather sequences instructions.
SLP: vectorized "vector"

====

I'm having a little trouble figuring out why the stores do not end up
being vectorized.  Does anyone have any insight into this?  Should this
pass be able to perform the desired transformation?

Thanks,
Tom

-------------- next part --------------
A non-text attachment was scrubbed...
Name: slp-vectorize-alloc.patch
Type: text/x-diff
Size: 1121 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131024/8accc7cc/attachment.patch>