[llvm] r180802 - InstCombine: Fold more shuffles of shuffles.

Tue Apr 30 14:15:54 PDT 2013

Doing them in backend would be better as there're too many target
details such as SSE variants.

- Michael

On Tue, 2013-04-30 at 14:09 -0700, Jim Grosbach wrote:
> Suggestions for alternatives, then? As it was before, the code was
> massively pessimizing ARM codegen. Unless we’re willing to add target
> information here (see FIXME), IMO the code here should if anything be
> even more aggressive and it’s the target’s responsibility to split
> things back up.
> 
> 
> I did check X86 codegen for a variety of test cases and always got
> better results. Entirely possible, likely even, there’s cases I missed
> of course.
> 
> 
> -Jim
> 
> 
> 
> 
> On Apr 30, 2013, at 2:00 PM, Nadav Rotem <nrotem at apple.com> wrote:
> 
> > Hi Jim, 
> > 
> > 
> > AVX does not allow cross-lane shuffles within a single vector.  Your
> > change can potentially introduce regressions for hand-written code
> > that assumes the current LLVM behavior. 
> > 
> > 
> > Thanks,
> > NAdav
> > 
> > On Apr 30, 2013, at 1:43 PM, Jim Grosbach <grosbach at apple.com>
> > wrote:
> > 
> > > Author: grosbach
> > > Date: Tue Apr 30 15:43:52 2013
> > > New Revision: 180802
> > > 
> > > URL: http://llvm.org/viewvc/llvm-project?rev=180802&view=rev
> > > Log:
> > > InstCombine: Fold more shuffles of shuffles.
> > > 
> > > Always fold a shuffle-of-shuffle into a single shuffle when
> > > there's only one
> > > input vector in the first place. Continue to be more conservative
> > > when there's
> > > multiple inputs.
> > > 
> > > rdar://13402653
> > > PR15866
> > > 
> > > Modified:
> > >    llvm/trunk/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
> > >    llvm/trunk/test/Transforms/BBVectorize/simple.ll
> > >    llvm/trunk/test/Transforms/InstCombine/vec_shuffle.ll
> > > 
> > > Modified:
> > > llvm/trunk/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
> > > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/InstCombine/InstCombineVectorOps.cpp?rev=180802&r1=180801&r2=180802&view=diff
> > > ==============================================================================
> > > --- llvm/trunk/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
> > > (original)
> > > +++ llvm/trunk/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
> > > Tue Apr 30 15:43:52 2013
> > > @@ -614,11 +614,16 @@ Instruction *InstCombiner::visitShuffleV
> > >   // we are absolutely afraid of producing a shuffle mask not in
> > > the input
> > >   // program, because the code gen may not be smart enough to turn
> > > a merged
> > >   // shuffle into two specific shuffles: it may produce worse
> > > code.  As such,
> > > -  // we only merge two shuffles if the result is either a splat
> > > or one of the
> > > -  // input shuffle masks.  In this case, merging the shuffles
> > > just removes
> > > -  // one instruction, which we know is safe.  This is good for
> > > things like
> > > +  // we only merge two shuffles if the result is a splat, one of
> > > the input
> > > +  // input shuffle masks, or if there's only one input to the
> > > shuffle.
> > > +  // In this case, merging the shuffles just removes one
> > > instruction, which
> > > +  // we know is safe.  This is good for things like
> > >   // turning: (splat(splat)) -> splat, or
> > >   // merge(V[0..n], V[n+1..2n]) -> V[0..2n]
> > > +  //
> > > +  // FIXME: This is almost certainly far, far too conservative.
> > > We should
> > > +  // have a better model. Perhaps a TargetTransformInfo hook to
> > > ask whether
> > > +  // a shuffle is considered OK?
> > >   ShuffleVectorInst* LHSShuffle =
> > > dyn_cast<ShuffleVectorInst>(LHS);
> > >   ShuffleVectorInst* RHSShuffle =
> > > dyn_cast<ShuffleVectorInst>(RHS);
> > >   if (LHSShuffle)
> > > @@ -743,8 +748,10 @@ Instruction *InstCombiner::visitShuffleV
> > >   }
> > > 
> > >   // If the result mask is equal to one of the original shuffle
> > > masks,
> > > -  // or is a splat, do the replacement.
> > > -  if (isSplat || newMask == LHSMask || newMask == RHSMask ||
> > > newMask == Mask) {
> > > +  // or is a splat, do the replacement. Similarly, if there is
> > > only one
> > > +  // input vector, go ahead and do the folding.
> > > +  if (isSplat || newMask == LHSMask || newMask == RHSMask ||
> > > newMask == Mask ||
> > > +      isa<UndefValue>(RHS)) {
> > >     SmallVector<Constant*, 16> Elts;
> > >     Type *Int32Ty = Type::getInt32Ty(SVI.getContext());
> > >     for (unsigned i = 0, e = newMask.size(); i != e; ++i) {
> > > 
> > > Modified: llvm/trunk/test/Transforms/BBVectorize/simple.ll
> > > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/BBVectorize/simple.ll?rev=180802&r1=180801&r2=180802&view=diff
> > > ==============================================================================
> > > --- llvm/trunk/test/Transforms/BBVectorize/simple.ll (original)
> > > +++ llvm/trunk/test/Transforms/BBVectorize/simple.ll Tue Apr 30
> > > 15:43:52 2013
> > > @@ -139,11 +139,10 @@ define <8 x i8> @test6(<8 x i8> %A1, <8
> > > ; CHECK: %Z1 = add <16 x i8> %Y1, %X1.v.i1
> > >         %Q1 = shufflevector <8 x i8> %Z1, <8 x i8> %Z2, <8 x i32>
> > > <i32 15, i32 8, i32 6, i32 1, i32 13, i32 10, i32 4, i32 3>
> > >         %Q2 = shufflevector <8 x i8> %Z2, <8 x i8> %Z2, <8 x i32>
> > > <i32 6, i32 7, i32 0, i32 1, i32 2, i32 4, i32 4, i32 1>
> > > -; CHECK: %Q1.v.i1 = shufflevector <16 x i8> %Z1, <16 x i8> undef,
> > > <16 x i32> <i32 8, i32 undef, i32 10, i32 undef, i32 undef, i32
> > > 13, i32 undef, i32 15, i32 undef, i32 undef, i32 undef, i32 undef,
> > > i32 undef, i32 undef, i32 undef, i32 undef>
> > > -; CHECK: %Q1 = shufflevector <16 x i8> %Z1, <16 x i8> %Q1.v.i1,
> > > <16 x i32> <i32 23, i32 16, i32 6, i32 1, i32 21, i32 18, i32 4,
> > > i32 3, i32 14, i32 15, i32 8, i32 9, i32 10, i32 12, i32 12, i32
> > > 9>
> > > - %R  = mul <8 x i8> %Q1, %Q2
> > > -; CHECK: %Q1.v.r1 = shufflevector <16 x i8> %Q1, <16 x i8> undef,
> > > <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
> > > -; CHECK: %Q1.v.r2 = shufflevector <16 x i8> %Q1, <16 x i8> undef,
> > > <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14,
> > > i32 15>
> > > +        %R  = mul <8 x i8> %Q1, %Q2
> > > +; CHECK:  %Q1.v.i1 = shufflevector <16 x i8> %Z1, <16 x i8>
> > > undef, <16 x i32> <i32 8, i32 undef, i32 10, i32 undef, i32 undef,
> > > i32 13, i32 undef, i32 15, i32 undef, i32 undef, i32 undef, i32
> > > undef, i32 undef, i32 undef, i32 undef, i32 undef>
> > > +; CHECK:  %Q1.v.r1 = shufflevector <16 x i8> %Z1, <16 x i8> %
> > > Q1.v.i1, <8 x i32> <i32 23, i32 16, i32 6, i32 1, i32 21, i32 18,
> > > i32 4, i32 3>
> > > +; CHECK:  %Q1.v.r2 = shufflevector <16 x i8> %Z1, <16 x i8>
> > > undef, <8 x i32> <i32 14, i32 15, i32 8, i32 9, i32 10, i32 12,
> > > i32 12, i32 9>
> > > ; CHECK: %R = mul <8 x i8> %Q1.v.r1, %Q1.v.r2
> > > ret <8 x i8> %R
> > > ; CHECK: ret <8 x i8> %R
> > > 
> > > Modified: llvm/trunk/test/Transforms/InstCombine/vec_shuffle.ll
> > > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/InstCombine/vec_shuffle.ll?rev=180802&r1=180801&r2=180802&view=diff
> > > ==============================================================================
> > > --- llvm/trunk/test/Transforms/InstCombine/vec_shuffle.ll
> > > (original)
> > > +++ llvm/trunk/test/Transforms/InstCombine/vec_shuffle.ll Tue Apr
> > > 30 15:43:52 2013
> > > @@ -86,14 +86,14 @@ define <4 x i8> @test9(<16 x i8> %tmp6)
> > > }
> > > 
> > > ; Same as test9, but make sure that "undef" mask values are not
> > > confused with
> > > -; mask values of 2*N, where N is the mask length.  These shuffles
> > > should not
> > > -; be folded (because [8,9,4,8] may not be a mask supported by the
> > > target).
> > > -define <4 x i8> @test9a(<16 x i8> %tmp6) nounwind {
> > > +; mask values of 2*N, where N is the mask length of the result.
> > >  Make sure when
> > > +; folding these shuffles that 'undef' mask values stay that way
> > > in the result
> > > +; instead of getting mapped to the 2*N'th entry of the source.
> > > +define <4 x i8> @test9a(<16 x i8> %in, <16 x i8> %in2) nounwind {
> > > ; CHECK: @test9a
> > > -; CHECK-NEXT: shufflevector
> > > -; CHECK-NEXT: shufflevector
> > > +; CHECK-NEXT: shufflevector <16 x i8> %in, <16 x i8> %in2, <4 x
> > > i32> <i32 16, i32 9, i32 4, i32 undef>
> > > ; CHECK-NEXT: ret
> > > - %tmp7 = shufflevector <16 x i8> %tmp6, <16 x i8> undef, <4 x
> > > i32> < i32 undef, i32 9, i32 4, i32 8 > ; <<4 x i8>> [#uses=1]
> > > + %tmp7 = shufflevector <16 x i8> %in, <16 x i8> %in2, <4 x i32> <
> > > i32 undef, i32 9, i32 4, i32 16 > ; <<4 x i8>> [#uses=1]
> > > %tmp9 = shufflevector <4 x i8> %tmp7, <4 x i8> undef, <4 x i32> <
> > > i32 3, i32 1, i32 2, i32 0 > ; <<4 x i8>> [#uses=1]
> > > ret <4 x i8> %tmp9
> > > }
> > > 
> > > 
> > > _______________________________________________
> > > llvm-commits mailing list
> > > llvm-commits at cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits