[PATCH] Add support to vectorize ctlz, cttz and powi intrinsics in SLPVectorizer
Nick Lewycky
nicholas at mxc.ca
Wed May 21 01:59:01 PDT 2014
Karthik Bhat wrote:
> Hi Nadav,
> Thanks for the review. We need to use SCEV because it detects cases where the Value* differs but the underlying value is the same.
>
> For example, I tried the following:
> declare float @llvm.powi.f32(float, i32)
> define void @vec_powi_f32(float* %a, float* %b, float* %c, i32 %A, i32 %B) {
> entry:
>   %0 = alloca i32, align 4
>   %1 = alloca i32, align 4
>   %C = alloca i32, align 4
>   %D = alloca i32, align 4
>   store i32 %A, i32* %0, align 4
>   store i32 %B, i32* %1, align 4
>   %2 = load i32* %0, align 4
>   %3 = load i32* %1, align 4
>   %4 = add nsw i32 %2, %3
>   %5 = add nsw i32 %2, %3
>   store i32 %4, i32* %C, align 4
>   store i32 %5, i32* %D, align 4
>
>   %i0 = load float* %a, align 4
>   %i1 = load float* %b, align 4
>   %add1 = fadd float %i0, %i1
>   %call1 = tail call float @llvm.powi.f32(float %add1, i32 %4) nounwind readnone
>
>   %arrayidx2 = getelementptr inbounds float* %a, i32 1
>   %i2 = load float* %arrayidx2, align 4
>   %arrayidx3 = getelementptr inbounds float* %b, i32 1
>   %i3 = load float* %arrayidx3, align 4
>   %add2 = fadd float %i2, %i3
>   %call2 = tail call float @llvm.powi.f32(float %add2, i32 %5) nounwind readnone
>
>   %arrayidx4 = getelementptr inbounds float* %a, i32 2
>   %i4 = load float* %arrayidx4, align 4
>   %arrayidx5 = getelementptr inbounds float* %b, i32 2
>   %i5 = load float* %arrayidx5, align 4
>   %add3 = fadd float %i4, %i5
>   %call3 = tail call float @llvm.powi.f32(float %add3, i32 %5) nounwind readnone
>
>   %arrayidx6 = getelementptr inbounds float* %a, i32 3
>   %i6 = load float* %arrayidx6, align 4
>   %arrayidx7 = getelementptr inbounds float* %b, i32 3
>   %i7 = load float* %arrayidx7, align 4
>   %add4 = fadd float %i6, %i7
>   %call4 = tail call float @llvm.powi.f32(float %add4, i32 %4) nounwind readnone
>
>   store float %call1, float* %c, align 4
>   %arrayidx8 = getelementptr inbounds float* %c, i32 1
>   store float %call2, float* %arrayidx8, align 4
>   %arrayidx9 = getelementptr inbounds float* %c, i32 2
>   store float %call3, float* %arrayidx9, align 4
>   %arrayidx10 = getelementptr inbounds float* %c, i32 3
>   store float %call4, float* %arrayidx10, align 4
>   ret void
> }
>
> Here %4 and %5 refer to the same value. If we just compare the Value* pointers for equality, the vectorizer cannot vectorize the powi calls in the above code. With a SCEV comparison, it can conclude that %4 is actually the same as %5, and hence it vectorizes the powi intrinsic.
Overkill? All you need is CSE. Either opt -early-cse or opt -basicaa
-gvn should clean that up.
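To make the CSE point concrete, here is a reduced, hand-written sketch (not the example from the patch and not captured opt output). The function name @before_cse and the value names %s1, %s2, %x and %y are made up for illustration. It has the same shape as the problem above: two textually identical adds feeding the exponent operand of two powi calls.

```llvm
; Hand-written illustration; names are hypothetical.
declare float @llvm.powi.f32(float, i32)

define float @before_cse(float %x, float %y, i32 %A, i32 %B) {
entry:
  %s1 = add nsw i32 %A, %B        ; first computation of A + B
  %s2 = add nsw i32 %A, %B        ; identical expression, distinct Value*
  %call1 = tail call float @llvm.powi.f32(float %x, i32 %s1)
  %call2 = tail call float @llvm.powi.f32(float %y, i32 %s2)
  ; After CSE, %s2 should be gone and %call2 should use %s1, so a plain
  ; pointer comparison of the two exponent operands succeeds.
  %r = fadd float %call1, %call2
  ret float %r
}
```

Running this through opt -early-cse -S should leave a single add, with both calls taking it as their second argument, which is all the vectorizer needs in order to see the exponents as equal.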
Which sequence of passes produced this IR? Is there a place where we
should have caught it earlier? Is there a good way to rearrange the
passes so that CSE runs before vectorizing?
Nick
> The same approach is used in BBVectorizer to detect if arguments are equal for these intrinsics.
>
> Thanks
>
> http://reviews.llvm.org/D3851