[LLVMdev] Vectorizing alloca instructions

Tom Stellard tom at stellard.net
Thu Oct 24 14:04:11 PDT 2013


I've been playing around with the SLPVectorizer trying to get it to
vectorize this simple program:

define void @vector(i32 addrspace(1)* %out, i32 %index) {
  %0 = alloca [4 x i32]
  %x = getelementptr [4 x i32]* %0, i32 0, i32 0
  %y = getelementptr [4 x i32]* %0, i32 0, i32 1
  %z = getelementptr [4 x i32]* %0, i32 0, i32 2
  %w = getelementptr [4 x i32]* %0, i32 0, i32 3
  store i32 0, i32* %x
  store i32 1, i32* %y
  store i32 2, i32* %z
  store i32 3, i32* %w
  %1 = getelementptr [4 x i32]* %0, i32 0, i32 %index
  %2 = load i32* %1
  store i32 %2, i32 addrspace(1)* %out
  ret void
}

My goal is to have this program transformed to the following:

define void @vector(i32 addrspace(1)* %out, i32 %index) {
  %0 = extractelement <4 x i32> <i32 0, i32 1, i32 2, i32 3>, i32 %index
  store i32 %0, i32 addrspace(1)* %out
  ret void
}
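
To make the intended equivalence concrete, here is a rough Python model of the two forms. This is only an illustration, not LLVM code: the names scalar_version and vector_version are made up for this sketch, and it only considers in-range indices (0 through 3).

```python
def scalar_version(index):
    # Models the alloca'd [4 x i32]: four constant stores to the stack
    # array, followed by a single dynamically indexed load.
    stack_array = [None] * 4
    stack_array[0] = 0
    stack_array[1] = 1
    stack_array[2] = 2
    stack_array[3] = 3
    return stack_array[index]


def vector_version(index):
    # Models extractelement on the constant vector <0, 1, 2, 3>:
    # no stack array and no stores, just a lane selection.
    constant_vector = (0, 1, 2, 3)
    return constant_vector[index]


# Compare both forms for every in-range index.
results = [(scalar_version(i), vector_version(i)) for i in range(4)]
```

Both functions return the same value for each in-range index, which is the property the transformation would rely on.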

I've slightly modified the SLPVectorizer (see the attached patch) so
that it will vectorize small trees, and I've also fixed a crash in the
BoUpSLP::Gather() function when it is passed a list of store
instructions.  With this patch, the command:

opt -slp-vectorizer -debug -march=r600 -mcpu=redwood -o - vector-alloca.ll -S -slp-threshold=-20

produces the following output, but the program remains unchanged:


SLP: Analyzing blocks in vector.
SLP: Found 5 stores to vectorize.
SLP: Analyzing a store chain of length 4.
SLP: Analyzing a store chain of length 4
SLP: Analyzing 4 stores at offset 0
SLP: Checking users of    store i32 0, i32* %x. 
SLP: Checking users of    store i32 1, i32* %y. 
SLP: Checking users of    store i32 2, i32* %z. 
SLP: Checking users of    store i32 3, i32* %w. 
SLP: We are able to schedule this bundle.
SLP: Can't sink   store i32 0, i32* %x
 down to   store i32 3, i32* %w
 because of   store i32 1, i32* %y.  Gathering.
SLP: Calculating cost for tree of size 1.
SLP: Check whether the tree with height 1 is fully vectorizable .
SLP: Adding cost 4 for bundle that starts with   store i32 0, i32* %x .
SLP: Total Cost 4.
SLP: Found cost=4 for VF=4
SLP: Decided to vectorize cost=4
SLP: Extracting 0 values .
SLP: Optimizing 0 gather sequences instructions.
SLP: vectorized "vector"


I'm having a little trouble figuring out why the stores do not end up
being vectorized.  Does anyone have any insight into this?  Should this
pass be able to perform the desired transformation?


-------------- next part --------------
A non-text attachment was scrubbed...
Name: slp-vectorize-alloc.patch
Type: text/x-diff
Size: 1121 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20131024/8accc7cc/attachment.patch>