[PATCH] Add support to vectorize ctlz, cttz and powi intrinsics in SLPVectorizer

Wed May 21 01:59:01 PDT 2014

Karthik Bhat wrote:
> Hi Nadav,
> Thanks for the review. We need to use SCEV because it detects cases where the Value* differs but the underlying value is the same.
>
> For example, I tried out the following:
>    declare float @llvm.powi.f32(float, i32)
>    define void @vec_powi_f32(float* %a, float* %b, float* %c, i32 %A, i32 %B) {
>    entry:
>    %0 = alloca i32, align 4
>    %1 = alloca i32, align 4
>    %C = alloca i32, align 4
>    %D = alloca i32, align 4
>    store i32 %A, i32* %0, align 4
>    store i32 %B, i32* %1, align 4
>    %2 = load i32* %0, align 4
>    %3 = load i32* %1, align 4
>    %4 = add nsw i32 %2, %3
>    %5 = add nsw i32 %2, %3
>    store i32 %4, i32* %C, align 4
>    store i32 %5, i32* %D, align 4
>
>    %i0 = load float* %a, align 4
>    %i1 = load float* %b, align 4
>    %add1 = fadd float %i0, %i1
>    %call1 = tail call float @llvm.powi.f32(float %add1,i32 %4) nounwind readnone
>
>    %arrayidx2 = getelementptr inbounds float* %a, i32 1
>    %i2 = load float* %arrayidx2, align 4
>    %arrayidx3 = getelementptr inbounds float* %b, i32 1
>    %i3 = load float* %arrayidx3, align 4
>    %add2 = fadd float %i2, %i3
>    %call2 = tail call float @llvm.powi.f32(float %add2,i32 %5) nounwind readnone
>
>    %arrayidx4 = getelementptr inbounds float* %a, i32 2
>    %i4 = load float* %arrayidx4, align 4
>    %arrayidx5 = getelementptr inbounds float* %b, i32 2
>    %i5 = load float* %arrayidx5, align 4
>    %add3 = fadd float %i4, %i5
>    %call3 = tail call float @llvm.powi.f32(float %add3,i32 %5) nounwind readnone
>
>    %arrayidx6 = getelementptr inbounds float* %a, i32 3
>    %i6 = load float* %arrayidx6, align 4
>    %arrayidx7 = getelementptr inbounds float* %b, i32 3
>    %i7 = load float* %arrayidx7, align 4
>    %add4 = fadd float %i6, %i7
>    %call4 = tail call float @llvm.powi.f32(float %add4,i32 %4) nounwind readnone
>
>    store float %call1, float* %c, align 4
>    %arrayidx8 = getelementptr inbounds float* %c, i32 1
>    store float %call2, float* %arrayidx8, align 4
>    %arrayidx9 = getelementptr inbounds float* %c, i32 2
>    store float %call3, float* %arrayidx9, align 4
>    %arrayidx10 = getelementptr inbounds float* %c, i32 3
>    store float %call4, float* %arrayidx10, align 4
>    ret void
>    }
>
> Here %4 and %5 refer to the same value. If we just compare the Value* pointers for equality, the powi calls in the above code will not be vectorized. With a SCEV comparison, however, we can conclude that %4 is actually the same as %5, and so the powi intrinsic is vectorized.

Overkill? All you need is CSE. Either opt -early-cse or opt -basicaa 
-gvn should clean that up.
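
To make the CSE point concrete, here is a rough toy model in Python. It is only a sketch, not how EarlyCSE or GVN is actually implemented: the tuple encoding of instructions and the local_cse helper are hypothetical, and it assumes a single straight-line block of side-effect-free instructions.

```python
# Hypothetical toy model of local CSE; not LLVM's EarlyCSE or GVN.
# Each instruction is a tuple: (result, opcode, operands).
# Assumes every instruction is pure (no memory access, no side effects).

def local_cse(instructions):
    """Return (kept_instructions, replacements) for one straight-line block."""
    available = {}     # (opcode, operands) -> first result that computed it
    replacements = {}  # redundant result -> earlier equivalent result
    kept = []
    for result, opcode, operands in instructions:
        # Rewrite operands through the replacements found so far.
        operands = tuple(replacements.get(op, op) for op in operands)
        key = (opcode, operands)
        if key in available:
            # Same opcode and operands were already computed: drop this one.
            replacements[result] = available[key]
        else:
            available[key] = result
            kept.append((result, opcode, operands))
    return kept, replacements


# The two identical adds from the quoted example.
block = [
    ("%4", "add nsw", ("%2", "%3")),
    ("%5", "add nsw", ("%2", "%3")),
]
kept, replacements = local_cse(block)
print(kept)
print(replacements)
```

In this model %5 is recorded as a duplicate of %4, so once its uses are rewritten all four powi calls take the same Value* as their second argument and a plain pointer comparison is enough.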

What sequence of passes left you with this IR? Is there a place where 
we should have caught it earlier? Is there a good way to rearrange the 
passes so that CSE runs before vectorization?

Nick

> The same approach is used in BBVectorizer to detect if arguments are equal for these intrinsics.
>
> Thanks
>
> http://reviews.llvm.org/D3851