[LLVMdev] [BBVectorizer] Obvious vectorization benefit, but req-chain is too short

Tobias Grosser tobias at grosser.es
Fri Feb 3 01:28:26 PST 2012


Hi Hal,

this is one of the first test cases for which I would love to see 
improved vectorizer support. I sent it out earlier, but now that the 
vectorizer has been committed, I think it is a good time to look into it 
again.

The basic example is a set of scalar loads that read four consecutive 
elements, followed by stores that write them straight back. For me this 
is an obvious case where vectorization is beneficial (scalar.ll):

define i32 @main() nounwind {
  %V1 = load float* getelementptr ([1024 x float]* @A, i64 0, i64 0),
        align 16
  %V2 = load float* getelementptr ([1024 x float]* @A, i64 0, i64 1),
        align 4
  %V3 = load float* getelementptr ([1024 x float]* @A, i64 0, i64 2),
        align 8
  %V4 = load float* getelementptr ([1024 x float]* @A, i64 0, i64 3),
        align 4
  store float %V1,
        float* getelementptr ([1024 x float]* @B, i64 0, i64 0), align 16
  store float %V2,
        float* getelementptr ([1024 x float]* @B, i64 0, i64 1), align 4
  store float %V3,
        float* getelementptr ([1024 x float]* @B, i64 0, i64 2), align 8
  store float %V4,
        float* getelementptr ([1024 x float]* @B, i64 0, i64 3), align 4
  ret i32 0
}
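
For reference, at the source level this corresponds roughly to the 
following C sketch. It assumes that @A and @B are global float arrays of 
1024 elements (their declarations are not shown above), and copy4 is a 
hypothetical name; in the test case the body sits directly in main:

```c
/* Assumed declarations: the IR refers to @A and @B as [1024 x float]. */
float A[1024];
float B[1024];

/* Hypothetical helper: four scalar loads from consecutive elements of A,
 * followed by four scalar stores to the same indices of B. */
void copy4(void) {
    float v1 = A[0];
    float v2 = A[1];
    float v3 = A[2];
    float v4 = A[3];

    B[0] = v1;
    B[1] = v2;
    B[2] = v3;
    B[3] = v4;
}
```

All four loads and all four stores are stride one, so each group could 
in principle be replaced by a single four-element vector operation.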

opt -O3 -vectorize cannot vectorize this out of the box, because the 
instruction chain is shorter than the required chain depth (req-chain).

Adding -bb-vectorize-req-chain-depth=2 allows us to vectorize the code:

define i32 @main() nounwind {
  %V1 = load <4 x float>* bitcast ([1024 x float]* @A to <4 x float>*),
        align 16
  store <4 x float> %V1,
        <4 x float>* bitcast ([1024 x float]* @B to <4 x float>*), align 16
  ret i32 0
}

Is there any way we can make this case work by default? Maybe we could 
decrease the required chain depth to 2 and increase the cost of 
non-stride-one loads and stores?

Another, probably unrelated, point: I also tried a run with 
-bb-vectorize-req-chain-depth=1. The generated code is full of 
shufflevector instructions and eight-element vectors, which I did not 
expect at all. Do you have any idea what is going on here?

Tobi
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: scalar.ll
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120203/b797055f/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: vector.ll
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120203/b797055f/attachment-0001.ksh>

