[PATCH] Add support to vectorize ctlz, cttz and powi intrinsics in SLPVectorizer
Karthik Bhat
kv.bhat at samsung.com
Wed May 21 00:36:36 PDT 2014
Hi Nadav,
Thanks for the review. We need to use SCEV because it detects cases where the Value* operands differ but the underlying values are the same.
For example, I tried the following:
declare float @llvm.powi.f32(float, i32)
define void @vec_powi_f32(float* %a, float* %b, float* %c, i32 %A, i32 %B) {
entry:
%0 = alloca i32, align 4
%1 = alloca i32, align 4
%C = alloca i32, align 4
%D = alloca i32, align 4
store i32 %A, i32* %0, align 4
store i32 %B, i32* %1, align 4
%2 = load i32* %0, align 4
%3 = load i32* %1, align 4
%4 = add nsw i32 %2, %3
%5 = add nsw i32 %2, %3
store i32 %4, i32* %C, align 4
store i32 %5, i32* %D, align 4
%i0 = load float* %a, align 4
%i1 = load float* %b, align 4
%add1 = fadd float %i0, %i1
%call1 = tail call float @llvm.powi.f32(float %add1,i32 %4) nounwind readnone
%arrayidx2 = getelementptr inbounds float* %a, i32 1
%i2 = load float* %arrayidx2, align 4
%arrayidx3 = getelementptr inbounds float* %b, i32 1
%i3 = load float* %arrayidx3, align 4
%add2 = fadd float %i2, %i3
%call2 = tail call float @llvm.powi.f32(float %add2,i32 %5) nounwind readnone
%arrayidx4 = getelementptr inbounds float* %a, i32 2
%i4 = load float* %arrayidx4, align 4
%arrayidx5 = getelementptr inbounds float* %b, i32 2
%i5 = load float* %arrayidx5, align 4
%add3 = fadd float %i4, %i5
%call3 = tail call float @llvm.powi.f32(float %add3,i32 %5) nounwind readnone
%arrayidx6 = getelementptr inbounds float* %a, i32 3
%i6 = load float* %arrayidx6, align 4
%arrayidx7 = getelementptr inbounds float* %b, i32 3
%i7 = load float* %arrayidx7, align 4
%add4 = fadd float %i6, %i7
%call4 = tail call float @llvm.powi.f32(float %add4,i32 %4) nounwind readnone
store float %call1, float* %c, align 4
%arrayidx8 = getelementptr inbounds float* %c, i32 1
store float %call2, float* %arrayidx8, align 4
%arrayidx9 = getelementptr inbounds float* %c, i32 2
store float %call3, float* %arrayidx9, align 4
%arrayidx10 = getelementptr inbounds float* %c, i32 3
store float %call4, float* %arrayidx10, align 4
ret void
}
Here %4 and %5 refer to the same value. If we only compare the Value* pointers for equality, the powi calls in the code above cannot be vectorized. With a SCEV comparison, however, the vectorizer can conclude that %4 is the same as %5, and it therefore vectorizes the powi intrinsic.
BBVectorizer uses the same approach to detect whether the arguments of these intrinsics are equal.
Thanks
http://reviews.llvm.org/D3851