[llvm] r180875 - SROA: Generate selects instead of shuffles when blending values because this is the cannonical form.

Chandler Carruth chandlerc at google.com
Wed May 1 17:42:34 PDT 2013


On Wed, May 1, 2013 at 9:53 PM, Nadav Rotem <nrotem at apple.com> wrote:

> Author: nadav
> Date: Wed May  1 14:53:30 2013
> New Revision: 180875
>
> URL: http://llvm.org/viewvc/llvm-project?rev=180875&view=rev
> Log:
> SROA: Generate selects instead of shuffles when blending values because
> this is the cannonical form.
> Shuffles are more difficult to lower and we usually don't touch them,
> while we do optimize selects more often.
>

Why? (not that i disagree, I just don't think you've explained it fully...)

Also, why doesn't instcombine canonicalize shuffles to selects if they are
the canonical form? The whole way I picked a form was to look at what would
get canonicalized if the user wrote it, and match that. Until the rest of
the pipeline is canonicalizing in this way, I worry greatly about this
being a deeply fragile hack.

Also, as select has *numerous* other problems, it would seem really
important to teach the backend to optimize shuffles that are blends rather
than paper over this deficiency in SROA.

More detailed comments on the patch in line...


>
>
> Modified:
>     llvm/trunk/lib/Transforms/Scalar/SROA.cpp
>     llvm/trunk/test/Transforms/SROA/vector-promotion.ll
>
> Modified: llvm/trunk/lib/Transforms/Scalar/SROA.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/SROA.cpp?rev=180875&r1=180874&r2=180875&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Transforms/Scalar/SROA.cpp (original)
> +++ llvm/trunk/lib/Transforms/Scalar/SROA.cpp Wed May  1 14:53:30 2013
> @@ -2322,17 +2322,15 @@ static Value *insertVector(IRBuilderTy &
>    V = IRB.CreateShuffleVector(V, UndefValue::get(V->getType()),
>                                ConstantVector::get(Mask),
>                                Name + ".expand");
> -  DEBUG(dbgs() << "    shuffle1: " << *V << "\n");
> +  DEBUG(dbgs() << "    shuffle: " << *V << "\n");
>
>    Mask.clear();
>    for (unsigned i = 0; i != VecTy->getNumElements(); ++i)
> -    if (i >= BeginIndex && i < EndIndex)
> -      Mask.push_back(IRB.getInt32(i));
> -    else
> -      Mask.push_back(IRB.getInt32(i + VecTy->getNumElements()));
> -  V = IRB.CreateShuffleVector(V, Old, ConstantVector::get(Mask),
> -                              Name + "insert");
> -  DEBUG(dbgs() << "    shuffle2: " << *V << "\n");
> +    Mask.push_back(IRB.getInt1(i >= BeginIndex && i < EndIndex));


As this is no longer a shuffle mask, it might be appropriate to use a
different variable (and name). At the very least its confusing as this now
*must* be a vector of booleans where it previously was not.

+
> +  V = IRB.CreateSelect(ConstantVector::get(Mask), V, Old, Name + "blend");
> +
> +  DEBUG(dbgs() << "    blend: " << *V << "\n");
>

Why not call this a select? that's what it is.


>    return V;
>  }
>
>
> Modified: llvm/trunk/test/Transforms/SROA/vector-promotion.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SROA/vector-promotion.ll?rev=180875&r1=180874&r2=180875&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/Transforms/SROA/vector-promotion.ll (original)
> +++ llvm/trunk/test/Transforms/SROA/vector-promotion.ll Wed May  1
> 14:53:30 2013
> @@ -224,26 +224,26 @@ entry:
>    %a.cast0 = bitcast i32* %a.gep0 to <2 x i32>*
>    store <2 x i32> <i32 0, i32 0>, <2 x i32>* %a.cast0
>  ; CHECK-NOT: store
> -; CHECK:      %[[insert1:.*]] = shufflevector <4 x i32> <i32 0, i32 0,
> i32 undef, i32 undef>, <4 x i32> undef, <4 x i32> <i32 0, i32 1, {{.*}}>
> +; CHECK:     select <4 x i1> <i1 true, i1 true, i1 false, i1 false>
>

Here and below you have significantly decreased the specificity of the
check. Can you please preserve that? I really dislike checking instructions
occur but not checking that the values they produce are used in the correct
context. While sometimes it is impossible, here the code *already did
that*. =/


>
>    %a.gep1 = getelementptr <4 x i32>* %a, i32 0, i32 1
>    %a.cast1 = bitcast i32* %a.gep1 to <2 x i32>*
>    store <2 x i32> <i32 1, i32 1>, <2 x i32>* %a.cast1
> -; CHECK-NEXT: %[[insert2:.*]] = shufflevector <4 x i32> <i32 undef, i32
> 1, i32 1, i32 undef>, <4 x i32> %[[insert1]], <4 x i32> <i32 4, i32 1, i32
> 2, {{.*}}>
> +; CHECK-NEXT: select <4 x i1> <i1 false, i1 true, i1 true, i1 false>
>
>    %a.gep2 = getelementptr <4 x i32>* %a, i32 0, i32 2
>    %a.cast2 = bitcast i32* %a.gep2 to <2 x i32>*
>    store <2 x i32> <i32 2, i32 2>, <2 x i32>* %a.cast2
> -; CHECK-NEXT: %[[insert3:.*]] = shufflevector <4 x i32> <i32 undef, i32
> undef, i32 2, i32 2>, <4 x i32> %[[insert2]], <4 x i32> <i32 4, i32 5, i32
> 2, i32 3>
> +; CHECK-NEXT: select <4 x i1> <i1 false, i1 false, i1 true, i1 true>
>
>    %a.gep3 = getelementptr <4 x i32>* %a, i32 0, i32 3
>    store i32 3, i32* %a.gep3
> -; CHECK-NEXT: %[[insert4:.*]] = insertelement <4 x i32> %[[insert3]], i32
> 3, i32 3
> +; CHECK-NEXT: insertelement <4 x i32>
>
>    %ret = load <4 x i32>* %a
>
>    ret <4 x i32> %ret
> -; CHECK-NEXT: ret <4 x i32> %[[insert4]]
> +; CHECK-NEXT: ret <4 x i32>
>  }
>
>  define <4 x i32> @test_subvec_load() {
> @@ -291,27 +291,27 @@ entry:
>    %a.cast0 = bitcast float* %a.gep0 to i8*
>    call void @llvm.memset.p0i8.i32(i8* %a.cast0, i8 0, i32 8, i32 0, i1
> false)
>  ; CHECK-NOT: store
> -; CHECK:      %[[insert1:.*]] = shufflevector <4 x float> <float
> 0.000000e+00, float 0.000000e+00, float undef, float undef>, <4 x float>
> undef, <4 x i32> <i32 0, i32 1, {{.*}}>
> +; CHECK: select <4 x i1> <i1 true, i1 true, i1 false, i1 false>
>
>    %a.gep1 = getelementptr <4 x float>* %a, i32 0, i32 1
>    %a.cast1 = bitcast float* %a.gep1 to i8*
>    call void @llvm.memset.p0i8.i32(i8* %a.cast1, i8 1, i32 8, i32 0, i1
> false)
> -; CHECK-NEXT: %[[insert2:.*]] = shufflevector <4 x float> <float undef,
> float 0x3820202020000000, float 0x3820202020000000, float undef>, <4 x
> float> %[[insert1]], <4 x i32> <i32 4, i32 1, i32 2, {{.*}}>
> +; CHECK-NEXT: select <4 x i1> <i1 false, i1 true, i1 true, i1 false>
>
>    %a.gep2 = getelementptr <4 x float>* %a, i32 0, i32 2
>    %a.cast2 = bitcast float* %a.gep2 to i8*
>    call void @llvm.memset.p0i8.i32(i8* %a.cast2, i8 3, i32 8, i32 0, i1
> false)
> -; CHECK-NEXT: %[[insert3:.*]] = shufflevector <4 x float> <float undef,
> float undef, float 0x3860606060000000, float 0x3860606060000000>, <4 x
> float> %[[insert2]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
> +; CHECK-NEXT: select <4 x i1> <i1 false, i1 false, i1 true, i1 true>
>
>    %a.gep3 = getelementptr <4 x float>* %a, i32 0, i32 3
>    %a.cast3 = bitcast float* %a.gep3 to i8*
>    call void @llvm.memset.p0i8.i32(i8* %a.cast3, i8 7, i32 4, i32 0, i1
> false)
> -; CHECK-NEXT: %[[insert4:.*]] = insertelement <4 x float> %[[insert3]],
> float 0x38E0E0E0E0000000, i32 3
> +; CHECK-NEXT: insertelement <4 x float>
>
>    %ret = load <4 x float>* %a
>
>    ret <4 x float> %ret
> -; CHECK-NEXT: ret <4 x float> %[[insert4]]
> +; CHECK-NEXT: ret <4 x float>
>  }
>
>  define <4 x float> @test_subvec_memcpy(i8* %x, i8* %y, i8* %z, i8* %f,
> i8* %out) {
> @@ -326,7 +326,7 @@ entry:
>  ; CHECK:      %[[xptr:.*]] = bitcast i8* %x to <2 x float>*
>  ; CHECK-NEXT: %[[x:.*]] = load <2 x float>* %[[xptr]]
>  ; CHECK-NEXT: %[[expand_x:.*]] = shufflevector <2 x float> %[[x]], <2 x
> float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
> -; CHECK-NEXT: %[[insert_x:.*]] = shufflevector <4 x float> %[[expand_x]],
> <4 x float> undef, <4 x i32> <i32 0, i32 1, {{.*}}>
> +; CHECK-NEXT: select <4 x i1> <i1 true, i1 true, i1 false, i1 false>
>
>    %a.gep1 = getelementptr <4 x float>* %a, i32 0, i32 1
>    %a.cast1 = bitcast float* %a.gep1 to i8*
> @@ -334,7 +334,7 @@ entry:
>  ; CHECK-NEXT: %[[yptr:.*]] = bitcast i8* %y to <2 x float>*
>  ; CHECK-NEXT: %[[y:.*]] = load <2 x float>* %[[yptr]]
>  ; CHECK-NEXT: %[[expand_y:.*]] = shufflevector <2 x float> %[[y]], <2 x
> float> undef, <4 x i32> <i32 undef, i32 0, i32 1, i32 undef>
> -; CHECK-NEXT: %[[insert_y:.*]] = shufflevector <4 x float> %[[expand_y]],
> <4 x float> %[[insert_x]], <4 x i32> <i32 4, i32 1, i32 2, {{.*}}>
> +; CHECK-NEXT: select <4 x i1> <i1 false, i1 true, i1 true, i1 false>
>
>    %a.gep2 = getelementptr <4 x float>* %a, i32 0, i32 2
>    %a.cast2 = bitcast float* %a.gep2 to i8*
> @@ -342,14 +342,14 @@ entry:
>  ; CHECK-NEXT: %[[zptr:.*]] = bitcast i8* %z to <2 x float>*
>  ; CHECK-NEXT: %[[z:.*]] = load <2 x float>* %[[zptr]]
>  ; CHECK-NEXT: %[[expand_z:.*]] = shufflevector <2 x float> %[[z]], <2 x
> float> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>
> -; CHECK-NEXT: %[[insert_z:.*]] = shufflevector <4 x float> %[[expand_z]],
> <4 x float> %[[insert_y]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
> +; CHECK-NEXT: select <4 x i1> <i1 false, i1 false, i1 true, i1 true>
>
>    %a.gep3 = getelementptr <4 x float>* %a, i32 0, i32 3
>    %a.cast3 = bitcast float* %a.gep3 to i8*
>    call void @llvm.memcpy.p0i8.p0i8.i32(i8* %a.cast3, i8* %f, i32 4, i32
> 0, i1 false)
>  ; CHECK-NEXT: %[[fptr:.*]] = bitcast i8* %f to float*
>  ; CHECK-NEXT: %[[f:.*]] = load float* %[[fptr]]
> -; CHECK-NEXT: %[[insert_f:.*]] = insertelement <4 x float> %[[insert_z]],
> float %[[f]], i32 3
> +; CHECK-NEXT: %[[insert_f:.*]] = insertelement <4 x float>
>
>    call void @llvm.memcpy.p0i8.p0i8.i32(i8* %out, i8* %a.cast2, i32 8, i32
> 0, i1 false)
>  ; CHECK-NEXT: %[[outptr:.*]] = bitcast i8* %out to <2 x float>*
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130502/47319453/attachment.html>


More information about the llvm-commits mailing list