[llvm] r214625 - [x86] Teach the target shuffle mask extraction to recognize unary forms

Sun Aug 3 14:43:46 PDT 2014

Thanks for the report and test case! Looking into it now.
On Aug 3, 2014 5:30 AM, "Kuperstein, Michael M" <
michael.m.kuperstein at intel.com> wrote:

> Hi Chandler,
>
> This broke quite a few internal tests for me, I'm attaching a bugpoint
> reduced test-case, run it with -mcpu=corei7-avx or above.
> It looks like a pshufb is indeed getting formed, but something goes wrong
> downstream, and we end up with:
>
> LLVM ERROR: Cannot select: 0x31d95d8: i64 = ConstantPool<<16 x i8> <i8 12,
> i8 13, i8 14, i8 15, i8 12, i8 13, i8 14, i815, i8 12, i8 13, i8 14, i8 15,
> i8 12, i8 13, i8 14, i8 15>> 0 [ID=3]
> In function: sample_test
> Stack dump:
> 0.      Program arguments: llc bugpoint-reduced-simplified.ll
> -mcpu=corei7-avx -debug
> 1.      Running pass 'Function Pass Manager' on module
> 'bugpoint-reduced-simplified.ll'.
> 2.      Running pass 'X86 DAG->DAG Instruction Selection' on function
> '@sample_test'
>
> Unfortunately, I have no idea how constant pool lowering is supposed to
> work, so can't say anything useful about how or why that happens.
>
> Thanks,
>    Michael
>
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:
> llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Chandler Carruth
> Sent: Saturday, August 02, 2014 13:28
> To: llvm-commits at cs.uiuc.edu
> Subject: [llvm] r214625 - [x86] Teach the target shuffle mask extraction
> to recognize unary forms
>
> Author: chandlerc
> Date: Sat Aug  2 05:27:38 2014
> New Revision: 214625
>
> URL: http://llvm.org/viewvc/llvm-project?rev=214625&view=rev
> Log:
> [x86] Teach the target shuffle mask extraction to recognize unary forms of
> normally binary shuffle instructions like PUNPCKL and MOVLHPS.
>
> This detects cases where a single register is used for both operands
> making the shuffle behave in a unary way. We detect this and adjust the
> mask to use the unary form which allows the existing DAG combine for
> shuffle instructions to actually work at all.
>
> As a consequence, this uncovered a number of obvious bugs in the existing
> DAG combine which are fixed. It also now canonicalizes several shuffles
> even with the existing lowering. These typically are trying to match the
> shuffle to the domain of the input where before we only really modeled them
> with the floating point variants. All of the cases which change to an
> integer shuffle here have something in the integer domain, so there are no
> more or fewer domain crosses here AFAICT. Technically, it might be better
> to go from a GPR directly to the floating point domain, but detecting
> floating point *outputs* despite integer inputs is a lot more code and
> seems unlikely to be worthwhile in practice. If folks are seeing
> domain-crossing regressions here though, let me know and I can hack
> something up to fix it.
>
> Also as a consequence, a bunch of missed opportunities to form pshufb now
> can be formed. Notably, splats of i8s now form pshufb.
> Interestingly, this improves the existing splat lowering too. We go from
> 3 instructions to 1. Yes, we may tie up a register, but it seems very
> likely to be worth it, especially if splatting the 0th byte (the common
> case) as then we can use a zeroed register as the mask.
>
> Modified:
>     llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>     llvm/trunk/test/CodeGen/X86/avx-basic.ll
>     llvm/trunk/test/CodeGen/X86/avx-splat.ll
>     llvm/trunk/test/CodeGen/X86/exedepsfix-broadcast.ll
>     llvm/trunk/test/CodeGen/X86/vec_splat-3.ll
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=214625&r1=214624&r2=214625&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Sat Aug  2 05:27:38
> +++ 2014
> @@ -5133,30 +5133,38 @@ static SDValue getShuffleVectorZeroOrUnd  }
>
>  /// getTargetShuffleMask - Calculates the shuffle mask corresponding to
> the -/// target specific opcode. Returns true if the Mask could be
> calculated.
> -/// Sets IsUnary to true if only uses one source.
> +/// target specific opcode. Returns true if the Mask could be
> +calculated. Sets /// IsUnary to true if only uses one source. Note that
> +this will set IsUnary for /// shuffles which use a single input
> +multiple times, and in those cases it will /// adjust the mask to only
> have indices within that single input.
>  static bool getTargetShuffleMask(SDNode *N, MVT VT,
>                                   SmallVectorImpl<int> &Mask, bool
> &IsUnary) {
>    unsigned NumElems = VT.getVectorNumElements();
>    SDValue ImmN;
>
>    IsUnary = false;
> +  bool IsFakeUnary = false;
>    switch(N->getOpcode()) {
>    case X86ISD::SHUFP:
>      ImmN = N->getOperand(N->getNumOperands()-1);
>      DecodeSHUFPMask(VT, cast<ConstantSDNode>(ImmN)->getZExtValue(), Mask);
> +    IsUnary = IsFakeUnary = N->getOperand(0) == N->getOperand(1);
>      break;
>    case X86ISD::UNPCKH:
>      DecodeUNPCKHMask(VT, Mask);
> +    IsUnary = IsFakeUnary = N->getOperand(0) == N->getOperand(1);
>      break;
>    case X86ISD::UNPCKL:
>      DecodeUNPCKLMask(VT, Mask);
> +    IsUnary = IsFakeUnary = N->getOperand(0) == N->getOperand(1);
>      break;
>    case X86ISD::MOVHLPS:
>      DecodeMOVHLPSMask(NumElems, Mask);
> +    IsUnary = IsFakeUnary = N->getOperand(0) == N->getOperand(1);
>      break;
>    case X86ISD::MOVLHPS:
>      DecodeMOVLHPSMask(NumElems, Mask);
> +    IsUnary = IsFakeUnary = N->getOperand(0) == N->getOperand(1);
>      break;
>    case X86ISD::PALIGNR:
>      ImmN = N->getOperand(N->getNumOperands()-1);
> @@ -5210,6 +5218,14 @@ static bool getTargetShuffleMask(SDNode
>    default: llvm_unreachable("unknown target shuffle node");
>    }
>
> +  // If we have a fake unary shuffle, the shuffle mask is spread across
> + two  // inputs that are actually the same node. Re-map the mask to
> + always point  // into the first input.
> +  if (IsFakeUnary)
> +    for (int &M : Mask)
> +      if (M >= (int)Mask.size())
> +        M -= Mask.size();
> +
>    return true;
>  }
>
> @@ -18735,6 +18751,8 @@ static bool combineX86ShuffleChain(SDVal
>        bool Lo = Mask.equals(0, 0);
>        unsigned Shuffle = FloatDomain ? (Lo ? X86ISD::MOVLHPS :
> X86ISD::MOVHLPS)
>                                       : (Lo ? X86ISD::UNPCKL :
> X86ISD::UNPCKH);
> +      if (Depth == 1 && Root->getOpcode() == Shuffle)
> +        return false; // Nothing to do!
>        MVT ShuffleVT = FloatDomain ? MVT::v4f32 : MVT::v2i64;
>        Op = DAG.getNode(ISD::BITCAST, DL, ShuffleVT, Input);
>        DCI.AddToWorklist(Op.getNode());
> @@ -18757,16 +18775,18 @@ static bool combineX86ShuffleChain(SDVal
>           Mask.equals(8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14,
> 15,
>                       15))) {
>        bool Lo = Mask[0] == 0;
> +      unsigned Shuffle = Lo ? X86ISD::UNPCKL : X86ISD::UNPCKH;
> +      if (Depth == 1 && Root->getOpcode() == Shuffle)
> +        return false; // Nothing to do!
>        MVT ShuffleVT;
>        switch (Mask.size()) {
>        case 4: ShuffleVT = MVT::v4i32; break;
> -      case 8: ShuffleVT = MVT::v8i32; break;
> -      case 16: ShuffleVT = MVT::v16i32; break;
> +      case 8: ShuffleVT = MVT::v8i16; break;
> +      case 16: ShuffleVT = MVT::v16i8; break;
>        };
>        Op = DAG.getNode(ISD::BITCAST, DL, ShuffleVT, Input);
>        DCI.AddToWorklist(Op.getNode());
> -      Op = DAG.getNode(Lo ? X86ISD::UNPCKL : X86ISD::UNPCKH, DL,
> ShuffleVT, Op,
> -                       Op);
> +      Op = DAG.getNode(Shuffle, DL, ShuffleVT, Op, Op);
>        DCI.AddToWorklist(Op.getNode());
>        DCI.CombineTo(Root.getNode(), DAG.getNode(ISD::BITCAST, DL, RootVT,
> Op),
>                      /*AddTo*/ true);
>
> Modified: llvm/trunk/test/CodeGen/X86/avx-basic.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-basic.ll?rev=214625&r1=214624&r2=214625&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/avx-basic.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/avx-basic.ll Sat Aug  2 05:27:38 2014
> @@ -72,9 +72,9 @@ entry:
>    ret <4 x i64> %shuffle
>  }
>
> -; CHECK: movlhps
> +; CHECK: vpunpcklqdq
>  ; CHECK-NEXT: vextractf128  $1
> -; CHECK-NEXT: movlhps
> +; CHECK-NEXT: vpunpcklqdq
>  ; CHECK-NEXT: vinsertf128 $1
>  define <4 x i64> @C(<4 x i64> %a, <4 x i64> %b) nounwind uwtable readnone
> ssp {
>  entry:
>
> Modified: llvm/trunk/test/CodeGen/X86/avx-splat.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-splat.ll?rev=214625&r1=214624&r2=214625&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/avx-splat.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/avx-splat.ll Sat Aug  2 05:27:38 2014
> @@ -1,9 +1,7 @@
>  ; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=corei7-avx -mattr=+avx
> | FileCheck %s
>
>
> -; CHECK: vpunpcklbw %xmm
> -; CHECK-NEXT: vpunpckhbw %xmm
> -; CHECK-NEXT: vpshufd $85
> +; CHECK: vpshufb {{.*}} ## xmm0 = xmm0[5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5]
>  ; CHECK-NEXT: vinsertf128 $1
>  define <32 x i8> @funcA(<32 x i8> %a) nounwind uwtable readnone ssp {
>  entry:
> @@ -21,7 +19,7 @@ entry:
>  }
>
>  ; CHECK: vmovq
> -; CHECK-NEXT: vmovlhps %xmm
> +; CHECK-NEXT: vpunpcklqdq %xmm
>  ; CHECK-NEXT: vinsertf128 $1
>  define <4 x i64> @funcC(i64 %q) nounwind uwtable readnone ssp {
>  entry:
>
> Modified: llvm/trunk/test/CodeGen/X86/exedepsfix-broadcast.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/exedepsfix-broadcast.ll?rev=214625&r1=214624&r2=214625&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/exedepsfix-broadcast.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/exedepsfix-broadcast.ll Sat Aug  2
> +++ 05:27:38 2014
> @@ -93,10 +93,10 @@ define <4 x double> @ExeDepsFix_broadcas
>
>
>  ; CHECK-LABEL: ExeDepsFix_broadcastsd_inreg -; ExeDepsFix works top down,
> thus it coalesces vmovlhps domain with -; vandps and there is nothing more
> you can do to match vmaxpd.
> -; CHECK: vmovlhps
> -; CHECK: vandps
> +; ExeDepsFix works top down, thus it coalesces vpunpcklqdq domain with
> +; vpand and there is nothing more you can do to match vmaxpd.
> +; CHECK: vpunpcklqdq
> +; CHECK: vpand
>  ; CHECK: vmaxpd
>  ; CHECK: ret
>  define <2 x double> @ExeDepsFix_broadcastsd_inreg(<2 x double> %arg, <2 x
> double> %arg2, i64 %broadcastvalue) {
>
> Modified: llvm/trunk/test/CodeGen/X86/vec_splat-3.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vec_splat-3.ll?rev=214625&r1=214624&r2=214625&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/vec_splat-3.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/vec_splat-3.ll Sat Aug  2 05:27:38 2014
> @@ -75,9 +75,8 @@ define <16 x i8> @shuf_16i8_8(<16 x i8>
>         ret <16 x i8> %tmp6
>
>  ; CHECK-LABEL: shuf_16i8_8:
> -; CHECK: punpcklbw
> -; CHECK-NEXT: punpcklbw
> -; CHECK-NEXT: pshufd $0
> +; CHECK: pxor %[[X:xmm[0-9]+]], %[[X]]
> +; CHECK-NEXT: pshufb %[[X]], %xmm0
>  }
>
>  define <16 x i8> @shuf_16i8_9(<16 x i8> %T0, <16 x i8> %T1) nounwind
> readnone { @@ -85,9 +84,7 @@ define <16 x i8> @shuf_16i8_9(<16 x i8>
>         ret <16 x i8> %tmp6
>
>  ; CHECK-LABEL: shuf_16i8_9:
> -; CHECK: punpcklbw
> -; CHECK-NEXT: punpcklbw
> -; CHECK-NEXT: pshufd $85
> +; CHECK: pshufb {{.*}} # xmm0 = xmm0[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
>  }
>
>  define <16 x i8> @shuf_16i8_10(<16 x i8> %T0, <16 x i8> %T1) nounwind
> readnone { @@ -95,9 +92,7 @@ define <16 x i8> @shuf_16i8_10(<16 x i8>
>         ret <16 x i8> %tmp6
>
>  ; CHECK-LABEL: shuf_16i8_10:
> -; CHECK: punpcklbw
> -; CHECK-NEXT: punpcklbw
> -; CHECK-NEXT: pshufd $-86
> +; CHECK: pshufb {{.*}} # xmm0 = xmm0[2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2]
>  }
>
>  define <16 x i8> @shuf_16i8_11(<16 x i8> %T0, <16 x i8> %T1) nounwind
> readnone { @@ -105,9 +100,7 @@ define <16 x i8> @shuf_16i8_11(<16 x i8>
>         ret <16 x i8> %tmp6
>
>  ; CHECK-LABEL: shuf_16i8_11:
> -; CHECK: punpcklbw
> -; CHECK-NEXT: punpcklbw
> -; CHECK-NEXT: pshufd $-1
> +; CHECK: pshufb {{.*}} # xmm0 = xmm0[3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3]
>  }
>
>
> @@ -124,9 +117,7 @@ define <16 x i8> @shuf_16i8_13(<16 x i8>
>         ret <16 x i8> %tmp6
>
>  ; CHECK-LABEL: shuf_16i8_13:
> -; CHECK: punpcklbw
> -; CHECK-NEXT: punpckhbw
> -; CHECK-NEXT: pshufd $85
> +; CHECK: pshufb {{.*}} # xmm0 = xmm0[5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5]
>  }
>
>  define <16 x i8> @shuf_16i8_14(<16 x i8> %T0, <16 x i8> %T1) nounwind
> readnone { @@ -134,9 +125,7 @@ define <16 x i8> @shuf_16i8_14(<16 x i8>
>         ret <16 x i8> %tmp6
>
>  ; CHECK-LABEL: shuf_16i8_14:
> -; CHECK: punpcklbw
> -; CHECK-NEXT: punpckhbw
> -; CHECK-NEXT: pshufd $-86
> +; CHECK: pshufb {{.*}} # xmm0 = xmm0[6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6]
>  }
>
>  define <16 x i8> @shuf_16i8_15(<16 x i8> %T0, <16 x i8> %T1) nounwind
> readnone { @@ -144,9 +133,7 @@ define <16 x i8> @shuf_16i8_15(<16 x i8>
>         ret <16 x i8> %tmp6
>
>  ; CHECK-LABEL: shuf_16i8_15:
> -; CHECK: punpcklbw
> -; CHECK-NEXT: punpckhbw
> -; CHECK-NEXT: pshufd $-1
> +; CHECK: pshufb {{.*}} # xmm0 = xmm0[7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7]
>  }
>
>  define <16 x i8> @shuf_16i8_16(<16 x i8> %T0, <16 x i8> %T1) nounwind
> readnone { @@ -154,9 +141,7 @@ define <16 x i8> @shuf_16i8_16(<16 x i8>
>         ret <16 x i8> %tmp6
>
>  ; CHECK-LABEL: shuf_16i8_16:
> -; CHECK: punpckhbw
> -; CHECK-NEXT: punpcklbw
> -; CHECK-NEXT: pshufd $0
> +; CHECK: pshufb {{.*}} # xmm0 = xmm0[8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8]
>  }
>
>  define <16 x i8> @shuf_16i8_17(<16 x i8> %T0, <16 x i8> %T1) nounwind
> readnone { @@ -164,9 +149,7 @@ define <16 x i8> @shuf_16i8_17(<16 x i8>
>         ret <16 x i8> %tmp6
>
>  ; CHECK-LABEL: shuf_16i8_17:
> -; CHECK: punpckhbw
> -; CHECK-NEXT: punpcklbw
> -; CHECK-NEXT: pshufd $85
> +; CHECK: pshufb {{.*}} # xmm0 = xmm0[9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9]
>  }
>
>  define <16 x i8> @shuf_16i8_18(<16 x i8> %T0, <16 x i8> %T1) nounwind
> readnone { @@ -174,9 +157,7 @@ define <16 x i8> @shuf_16i8_18(<16 x i8>
>         ret <16 x i8> %tmp6
>
>  ; CHECK-LABEL: shuf_16i8_18:
> -; CHECK: punpckhbw
> -; CHECK-NEXT: punpcklbw
> -; CHECK-NEXT: pshufd $-86
> +; CHECK: pshufb {{.*}} # xmm0 =
> +xmm0[10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10]
>  }
>
>  define <16 x i8> @shuf_16i8_19(<16 x i8> %T0, <16 x i8> %T1) nounwind
> readnone { @@ -184,9 +165,7 @@ define <16 x i8> @shuf_16i8_19(<16 x i8>
>         ret <16 x i8> %tmp6
>
>  ; CHECK-LABEL: shuf_16i8_19:
> -; CHECK: punpckhbw
> -; CHECK-NEXT: punpcklbw
> -; CHECK-NEXT: pshufd $-1
> +; CHECK: pshufb {{.*}} # xmm0 =
> +xmm0[11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11]
>  }
>
>  define <16 x i8> @shuf_16i8_20(<16 x i8> %T0, <16 x i8> %T1) nounwind
> readnone { @@ -194,9 +173,7 @@ define <16 x i8> @shuf_16i8_20(<16 x i8>
>         ret <16 x i8> %tmp6
>
>  ; CHECK-LABEL: shuf_16i8_20:
> -; CHECK: punpckhbw
> -; CHECK-NEXT: punpckhbw
> -; CHECK-NEXT: pshufd $0
> +; CHECK: pshufb {{.*}} # xmm0 =
> +xmm0[12,12,12,12,12,12,12,12,12,12,12,12,12,12,12,12]
>  }
>
>  define <16 x i8> @shuf_16i8_21(<16 x i8> %T0, <16 x i8> %T1) nounwind
> readnone { @@ -204,9 +181,7 @@ define <16 x i8> @shuf_16i8_21(<16 x i8>
>         ret <16 x i8> %tmp6
>
>  ; CHECK-LABEL: shuf_16i8_21:
> -; CHECK: punpckhbw
> -; CHECK-NEXT: punpckhbw
> -; CHECK-NEXT: pshufd $85
> +; CHECK: pshufb {{.*}} # xmm0 =
> +xmm0[13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13]
>  }
>
>  define <16 x i8> @shuf_16i8_22(<16 x i8> %T0, <16 x i8> %T1) nounwind
> readnone { @@ -214,9 +189,7 @@ define <16 x i8> @shuf_16i8_22(<16 x i8>
>         ret <16 x i8> %tmp6
>
>  ; CHECK-LABEL: shuf_16i8_22:
> -; CHECK: punpckhbw
> -; CHECK-NEXT: punpckhbw
> -; CHECK-NEXT: pshufd $-86
> +; CHECK: pshufb {{.*}} # xmm0 =
> +xmm0[14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14]
>  }
>
>  define <16 x i8> @shuf_16i8_23(<16 x i8> %T0, <16 x i8> %T1) nounwind
> readnone { @@ -224,7 +197,5 @@ define <16 x i8> @shuf_16i8_23(<16 x i8>
>         ret <16 x i8> %tmp6
>
>  ; CHECK-LABEL: shuf_16i8_23:
> -; CHECK: punpckhbw
> -; CHECK-NEXT: punpckhbw
> -; CHECK-NEXT: pshufd $-1
> +; CHECK: pshufb {{.*}} # xmm0 =
> +xmm0[15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]
>  }
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140803/4ecc72cb/attachment.html>