[LLVMdev] [BBVectorizer] Obvious vectorization benefit, but req-chain is too short
Hal Finkel
hfinkel at anl.gov
Fri Feb 3 20:21:49 PST 2012
On Fri, 2012-02-03 at 10:28 +0100, Tobias Grosser wrote:
> Hi Hal,
>
> this is one of the first test cases for which I would love to have
> improved vectorizer support. I sent it out earlier, but I think now is a
> good time to look into it again, since the vectorizer has been committed.
>
> The basic example is a set of scalar loads that load four consecutive
> elements and store them straight back. For me this is an obvious case
> where vectorization is beneficial (scalar.ll):
>
> define i32 @main() nounwind {
> %V1 = load float* getelementptr ([1024 x float]* @A, i64 0, i64 0), align 16
> %V2 = load float* getelementptr ([1024 x float]* @A, i64 0, i64 1), align 4
> %V3 = load float* getelementptr ([1024 x float]* @A, i64 0, i64 2), align 8
> %V4 = load float* getelementptr ([1024 x float]* @A, i64 0, i64 3), align 4
> store float %V1, float* getelementptr ([1024 x float]* @B, i64 0, i64 0), align 16
> store float %V2, float* getelementptr ([1024 x float]* @B, i64 0, i64 1), align 4
> store float %V3, float* getelementptr ([1024 x float]* @B, i64 0, i64 2), align 8
> store float %V4, float* getelementptr ([1024 x float]* @B, i64 0, i64 3), align 4
> ret i32 0
> }
>
> opt -O3 -vectorize cannot vectorize this out of the box, as the
> req-chain is too short.
>
> Adding -bb-vectorize-req-chain-depth=2 allows us to vectorize the code:
>
> define i32 @main() nounwind {
> %V1 = load <4 x float>* bitcast ([1024 x float]* @A to <4 x float>*), align 16
> store <4 x float> %V1, <4 x float>* bitcast ([1024 x float]* @B to <4 x float>*), align 16
> ret i32 0
> }
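
For illustration only, here is a minimal C sketch of what the two IR versions above compute. The arrays `A` and `B` stand in for the `@A` and `@B` globals, and the function names `copy4_scalar` and `copy4_fused` are hypothetical. The `memcpy` call is only a C-level analogue of the single `<4 x float>` load/store pair, not what the vectorizer itself emits.

```c
#include <string.h>

/* Hypothetical stand-ins for the @A and @B globals referenced in the IR. */
float A[1024];
float B[1024];

/* Scalar form: four independent loads followed by four stores,
 * mirroring the structure of scalar.ll. */
void copy4_scalar(void) {
    float v1 = A[0];
    float v2 = A[1];
    float v3 = A[2];
    float v4 = A[3];
    B[0] = v1;
    B[1] = v2;
    B[2] = v3;
    B[3] = v4;
}

/* Fused form: one copy of four consecutive floats, the C analogue
 * of one <4 x float> load followed by one <4 x float> store. */
void copy4_fused(void) {
    memcpy(&B[0], &A[0], 4 * sizeof(float));
}
```

Both functions leave `B[0..3]` equal to `A[0..3]`; the difference is only in how many memory operations are used to express the copy.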
>
> Is there any way we can make this case work by default? Maybe we can
> decrease the req-chain to 2, and increase the cost for non-stride-one
> loads or stores?
Try it now (after r149761). If this "solution" causes other problems,
then we may need to think of something more sophisticated.
-Hal
>
> Another, probably unrelated, point: I also tried a run with
> -bb-vectorize-req-chain-depth=1. The generated code is full of
> shufflevector instructions and eight-element vectors. For me this is
> entirely unexpected. Do you have any idea what is going on here?
>
> Tobi
--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory