[PATCH] Add support to vectorize ctlz, cttz and powi intrinsics in SLPVectorizer

Karthik Bhat kv.bhat at samsung.com
Wed May 21 00:36:36 PDT 2014


Hi Nadav,
Thanks for the review. We need to use SCEV because it detects cases where the Value* pointers differ but the underlying value is the same.

For example, I tried the following:
  declare float @llvm.powi.f32(float, i32)
  define void @vec_powi_f32(float* %a, float* %b, float* %c, i32 %A, i32 %B) {
  entry:
  %0 = alloca i32, align 4
  %1 = alloca i32, align 4
  %C = alloca i32, align 4
  %D = alloca i32, align 4
  store i32 %A, i32* %0, align 4
  store i32 %B, i32* %1, align 4
  %2 = load i32* %0, align 4
  %3 = load i32* %1, align 4
  %4 = add nsw i32 %2, %3
  %5 = add nsw i32 %2, %3
  store i32 %4, i32* %C, align 4
  store i32 %5, i32* %D, align 4
  
  %i0 = load float* %a, align 4
  %i1 = load float* %b, align 4
  %add1 = fadd float %i0, %i1
  %call1 = tail call float @llvm.powi.f32(float %add1,i32 %4) nounwind readnone

  %arrayidx2 = getelementptr inbounds float* %a, i32 1
  %i2 = load float* %arrayidx2, align 4
  %arrayidx3 = getelementptr inbounds float* %b, i32 1
  %i3 = load float* %arrayidx3, align 4
  %add2 = fadd float %i2, %i3
  %call2 = tail call float @llvm.powi.f32(float %add2,i32 %5) nounwind readnone

  %arrayidx4 = getelementptr inbounds float* %a, i32 2
  %i4 = load float* %arrayidx4, align 4
  %arrayidx5 = getelementptr inbounds float* %b, i32 2
  %i5 = load float* %arrayidx5, align 4
  %add3 = fadd float %i4, %i5
  %call3 = tail call float @llvm.powi.f32(float %add3,i32 %5) nounwind readnone

  %arrayidx6 = getelementptr inbounds float* %a, i32 3
  %i6 = load float* %arrayidx6, align 4
  %arrayidx7 = getelementptr inbounds float* %b, i32 3
  %i7 = load float* %arrayidx7, align 4
  %add4 = fadd float %i6, %i7
  %call4 = tail call float @llvm.powi.f32(float %add4,i32 %4) nounwind readnone

  store float %call1, float* %c, align 4
  %arrayidx8 = getelementptr inbounds float* %c, i32 1
  store float %call2, float* %arrayidx8, align 4
  %arrayidx9 = getelementptr inbounds float* %c, i32 2
  store float %call3, float* %arrayidx9, align 4
  %arrayidx10 = getelementptr inbounds float* %c, i32 3
  store float %call4, float* %arrayidx10, align 4
  ret void
  }

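Here %4 and %5 refer to the same value. The exponent of the vector form of powi is a single scalar i32, so all four calls must use the same exponent before they can be combined. If we only compare the Value* pointers for equality, %4 and %5 look different and the powi calls above are not vectorized. If we compare their SCEVs instead, we can conclude that %4 is the same as %5, and the powi intrinsic is vectorized.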
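
To illustrate the difference between the two comparisons, here is a small self-contained toy model in C++. It is only a sketch and not the code from the patch: Inst, Expr, ToyAnalysis, sameByPointer and sameByExpr are hypothetical names made up for this example, and the toy handles only leaves and adds. It assumes, as ScalarEvolution does in LLVM, that structurally identical expressions are uniqued to one object, so comparing expression pointers answers "same underlying value?" even when the instructions are distinct.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for an IR instruction. Every object is a distinct "Value".
struct Inst {
  enum Kind { Leaf, Add } kind;
  const Inst *lhs;
  const Inst *rhs;
};

// Toy stand-in for a uniqued symbolic expression (the role SCEV plays).
struct Expr {
  int id;
};

// Hypothetical analysis: maps instructions to uniqued expressions.
class ToyAnalysis {
  std::map<const Inst *, const Expr *> leafExprs;
  std::map<std::pair<int, int>, const Expr *> addExprs;
  std::vector<std::unique_ptr<Expr>> storage;

  const Expr *make() {
    int id = static_cast<int>(storage.size());
    storage.push_back(std::unique_ptr<Expr>(new Expr{id}));
    return storage.back().get();
  }

public:
  const Expr *getExpr(const Inst *I) {
    if (I->kind == Inst::Leaf) {
      // A leaf (e.g. a load) is opaque: one expression per instruction.
      auto it = leafExprs.find(I);
      if (it != leafExprs.end())
        return it->second;
      const Expr *e = make();
      leafExprs[I] = e;
      return e;
    }
    // An add is keyed on its operand expressions, so two separate add
    // instructions over the same operands share one expression object.
    std::pair<int, int> key(getExpr(I->lhs)->id, getExpr(I->rhs)->id);
    auto it = addExprs.find(key);
    if (it != addExprs.end())
      return it->second;
    const Expr *e = make();
    addExprs[key] = e;
    return e;
  }
};

// Plain pointer comparison: what comparing Value* directly amounts to.
bool sameByPointer(const Inst *x, const Inst *y) { return x == y; }

// Expression comparison: what comparing the SCEVs amounts to in this toy.
bool sameByExpr(ToyAnalysis &TA, const Inst *x, const Inst *y) {
  return TA.getExpr(x) == TA.getExpr(y);
}

// Mirrors the exponents in the IR above.
struct Demo {
  Inst a{Inst::Leaf, nullptr, nullptr}; // plays the role of %2
  Inst b{Inst::Leaf, nullptr, nullptr}; // plays the role of %3
  Inst sum1{Inst::Add, &a, &b};         // plays the role of %4
  Inst sum2{Inst::Add, &a, &b};         // plays the role of %5
};
```

With a Demo d and a ToyAnalysis ta, sameByPointer(&d.sum1, &d.sum2) is false while sameByExpr(ta, &d.sum1, &d.sum2) is true, which is the situation with %4 and %5 above.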

The same approach is used in BBVectorizer to detect if arguments are equal for these intrinsics.

Thanks

http://reviews.llvm.org/D3851





