[llvm] r291670 - X86 CodeGen: Optimized pattern for truncate with unsigned saturation.

Thu Jan 19 14:36:44 PST 2017

Michael, you clearly did the right thing here.  Reverting a patch which 
is broken is absolutely appropriate and expected.

Elena, if you have an internal failure that you can reduce down to a 
reproducer for an upstream commit, please do revert the change, file a 
bug, and reply to the commit thread with a link to that PR.

Elena,

On 01/19/2017 10:47 AM, Michael Kuperstein via llvm-commits wrote:
> Hi Elena,
>
> Thanks for the fix.
>
> Regarding the revert - in this case, we're talking about:
>
> 1) A recent commit,
> 2) that has nothing else layered on top of it (except for whitespace 
> changes)
> 3) is a performance improvement that causes a correctness regression,
> 4) the crasher is reduced from real code, not a synthetic test-case,
> 5) and has a small IR reproducer.
>
> I really think that in such cases it's worth keeping trunk clean, at 
> the cost of the original committer having to reverse-merge the revert 
> before fixing the bug.
>
> Thanks,
>   Michael
>
> On Thu, Jan 19, 2017 at 4:49 AM, Demikhovsky, Elena
> <elena.demikhovsky at intel.com> wrote:
>
>     Fixed and recommitted in r292479.
>
>     I’d prefer that you not revert the failing commit, but wait for
>     a few days instead. It will be easier for me to fix.
>
>     (Unless it is a buildbot failure, of course. Those failures I
>     can see myself.)
>
>     We also find regressions in our internal testing from time to
>     time (PR31671, for example). We file a PR, notify the owner, and
>     let them fix the bug.
>
>     Thanks.
>
>     - Elena
>
>     From: Michael Kuperstein [mailto:mkuper at google.com]
>     Sent: Thursday, January 19, 2017 01:19
>     To: Demikhovsky, Elena <elena.demikhovsky at intel.com>
>     Cc: llvm-commits <llvm-commits at lists.llvm.org>
>     Subject: Re: [llvm] r291670 - X86 CodeGen: Optimized pattern for
>     truncate with unsigned saturation.
>
>     Hi Elena,
>
>     This still crashes in more complex cases. I've reverted in
>     r292444, see PR31589 for the reproducer.
>
>     Thanks,
>
>       Michael
>
>     On Wed, Jan 11, 2017 at 4:59 AM, Elena Demikhovsky via
>     llvm-commits <llvm-commits at lists.llvm.org> wrote:
>
>         Author: delena
>         Date: Wed Jan 11 06:59:32 2017
>         New Revision: 291670
>
>         URL: http://llvm.org/viewvc/llvm-project?rev=291670&view=rev
>         Log:
>         X86 CodeGen: Optimized pattern for truncate with unsigned
>         saturation.
>
>         DAG pattern optimization: truncate + unsigned saturation is
>         supported by the VPMOVUS* instructions on AVX-512 targets and by
>         the VPACKUS* instructions on SSE* targets.
>
>         Differential Revision: https://reviews.llvm.org/D28216
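To make the log entry concrete, here is a minimal per-lane sketch in C++ of what "truncate with unsigned saturation" computes, assuming unsigned source lanes: clamp to the destination type's maximum (the umin in the DAG pattern), then truncate. It also shows that the `icmp ult` + `select` idiom used in the tests is exactly that umin. The helper names `usatTrunc` and `selectUltClamp` are hypothetical illustrations, not LLVM APIs.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical per-lane model of truncation with unsigned saturation, the
// operation a VPMOVUS* instruction applies to every element: clamp the
// unsigned source to the destination type's maximum, then truncate.
template <typename Dst, typename Src>
constexpr Dst usatTrunc(Src X) {
  static_assert(!std::numeric_limits<Src>::is_signed &&
                    !std::numeric_limits<Dst>::is_signed,
                "this sketch models unsigned lanes only");
  static_assert(sizeof(Dst) < sizeof(Src), "expected a narrowing truncate");
  // UINT8_MAX, UINT16_MAX or UINT32_MAX, widened to the source type.
  constexpr Src Max = static_cast<Src>(std::numeric_limits<Dst>::max());
  return static_cast<Dst>(X < Max ? X : Max);  // umin(X, Max), then trunc
}

// The tests spell the clamp as:
//   %c = icmp ult %x, Max
//   %s = select %c, %x, Max
// which picks the unsigned minimum, i.e. umin(%x, Max).
template <typename T>
constexpr T selectUltClamp(T X, T Max) {
  return X < Max ? X : Max;
}
```

Values already in range pass through unchanged; anything larger becomes the all-ones value of the narrower type instead of wrapping.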
>
>
>         Modified:
>             llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>             llvm/trunk/test/CodeGen/X86/avx-trunc.ll
>             llvm/trunk/test/CodeGen/X86/avx512-trunc.ll
>
>         Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>         URL:
>         http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=291670&r1=291669&r2=291670&view=diff
>         ==============================================================================
>         --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>         +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Wed Jan 11 06:59:32 2017
>         @@ -31220,6 +31220,93 @@ static SDValue foldVectorXorShiftIntoCmp
>            return DAG.getNode(X86ISD::PCMPGT, SDLoc(N), VT, Shift.getOperand(0), Ones);
>          }
>
>         +/// Check if truncation with saturation from type \p SrcVT to \p DstVT
>         +/// is valid for the given \p Subtarget.
>         +static bool isSATValidOnAVX512Subtarget(EVT SrcVT, EVT DstVT,
>         +                                        const X86Subtarget &Subtarget) {
>         +  if (!Subtarget.hasAVX512())
>         +    return false;
>         +
>         +  // FIXME: Scalar type may be supported if we move it to vector register.
>         +  if (!SrcVT.isVector() || !SrcVT.isSimple() || SrcVT.getSizeInBits() > 512)
>         +    return false;
>         +
>         +  EVT SrcElVT = SrcVT.getScalarType();
>         +  EVT DstElVT = DstVT.getScalarType();
>         +  if (SrcElVT.getSizeInBits() < 16 || SrcElVT.getSizeInBits() > 64)
>         +    return false;
>         +  if (DstElVT.getSizeInBits() < 8 || DstElVT.getSizeInBits() > 32)
>         +    return false;
>         +  if (SrcVT.is512BitVector() || Subtarget.hasVLX())
>         +    return SrcElVT.getSizeInBits() >= 32 || Subtarget.hasBWI();
>         +  return false;
>         +}
>         +
>         +/// Return true if VPACK* instruction can be used for the given types
>         +/// and it is available on \p Subtarget.
>         +static bool
>         +isSATValidOnSSESubtarget(EVT SrcVT, EVT DstVT, const X86Subtarget &Subtarget) {
>         +  if (Subtarget.hasSSE2())
>         +    // v16i16 -> v16i8
>         +    if (SrcVT == MVT::v16i16 && DstVT == MVT::v16i8)
>         +      return true;
>         +  if (Subtarget.hasSSE41())
>         +    // v8i32 -> v8i16
>         +    if (SrcVT == MVT::v8i32 && DstVT == MVT::v8i16)
>         +      return true;
>         +  return false;
>         +}
>         +
>         +/// Detect a pattern of truncation with saturation:
>         +/// (truncate (umin (x, unsigned_max_of_dest_type)) to dest_type).
>         +/// Return the source value to be truncated or SDValue() if the pattern was not
>         +/// matched.
>         +static SDValue detectUSatPattern(SDValue In, EVT VT) {
>         +  if (In.getOpcode() != ISD::UMIN)
>         +    return SDValue();
>         +
>         +  //Saturation with truncation. We truncate from InVT to VT.
>         +  assert(In.getScalarValueSizeInBits() > VT.getScalarSizeInBits() &&
>         +    "Unexpected types for truncate operation");
>         +
>         +  APInt C;
>         +  if (ISD::isConstantSplatVector(In.getOperand(1).getNode(), C)) {
>         +    // C should be equal to UINT32_MAX / UINT16_MAX / UINT8_MAX according to
>         +    // the element size of the destination type.
>         +    return APIntOps::isMask(VT.getScalarSizeInBits(), C) ? In.getOperand(0) :
>         +      SDValue();
>         +  }
>         +  return SDValue();
>         +}
>         +
>         +/// Detect a pattern of truncation with saturation:
>         +/// (truncate (umin (x, unsigned_max_of_dest_type)) to dest_type).
>         +/// The types should allow the use of a VPMOVUS* instruction on AVX512.
>         +/// Return the source value to be truncated or SDValue() if the pattern was not
>         +/// matched.
>         +static SDValue detectAVX512USatPattern(SDValue In, EVT VT,
>         +  const X86Subtarget &Subtarget) {
>         +  if (!isSATValidOnAVX512Subtarget(In.getValueType(), VT, Subtarget))
>         +    return SDValue();
>         +  return detectUSatPattern(In, VT);
>         +}
>         +
>         +static SDValue
>         +combineTruncateWithUSat(SDValue In, EVT VT, SDLoc &DL, SelectionDAG &DAG,
>         +                        const X86Subtarget &Subtarget) {
>         +  SDValue USatVal = detectUSatPattern(In, VT);
>         +  if (USatVal) {
>         +    if (isSATValidOnAVX512Subtarget(In.getValueType(), VT, Subtarget))
>         +      return DAG.getNode(X86ISD::VTRUNCUS, DL, VT, USatVal);
>         +    if (isSATValidOnSSESubtarget(In.getValueType(), VT, Subtarget)) {
>         +      SDValue Lo, Hi;
>         +      std::tie(Lo, Hi) = DAG.SplitVector(USatVal, DL);
>         +      return DAG.getNode(X86ISD::PACKUS, DL, VT, Lo, Hi);
>         +    }
>         +  }
>         +  return SDValue();
>         +}
>         +
>          /// This function detects the AVG pattern between vectors of unsigned i8/i16,
>          /// which is c = (a + b + 1) / 2, and replace this operation with the efficient
>          /// X86ISD::AVG instruction.
>         @@ -31786,6 +31873,12 @@ static SDValue combineStore(SDNode *N, S
>          St->getPointerInfo(), St->getAlignment(),
>          St->getMemOperand()->getFlags());
>
>         +    if (SDValue Val =
>         +        detectAVX512USatPattern(St->getValue(), St->getMemoryVT(), Subtarget))
>         +      return EmitTruncSStore(false /* Unsigned saturation */, St->getChain(),
>         +                             dl, Val, St->getBasePtr(),
>         +                             St->getMemoryVT(), St->getMemOperand(), DAG);
>         +
>              const TargetLowering &TLI = DAG.getTargetLoweringInfo();
>              unsigned NumElems = VT.getVectorNumElements();
>              assert(StVT != VT && "Cannot truncate to the same type");
>         @@ -32406,6 +32499,10 @@ static SDValue combineTruncate(SDNode *N
>            if (SDValue Avg = detectAVGPattern(Src, VT, DAG, Subtarget, DL))
>              return Avg;
>
>         +  // Try to combine truncation with unsigned saturation.
>         +  if (SDValue Val = combineTruncateWithUSat(Src, VT, DL, DAG, Subtarget))
>         +    return Val;
>         +
>            // The bitcast source is a direct mmx result.
>            // Detect bitcasts between i32 to x86mmx
>            if (Src.getOpcode() == ISD::BITCAST && VT == MVT::i32) {
>
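The matcher in the diff above only fires when the UMIN bound is a constant splat equal to the all-ones value of the destination element width. The sketch below restates that rule on plain integers. `Node`, `detectUSat` and `NoMatch` are hypothetical simplifications of `SDValue` and `detectUSatPattern`, and `isLowBitMask` is meant to mirror the condition `APIntOps::isMask(VT.getScalarSizeInBits(), C)` checks, not to reproduce its implementation.

```cpp
#include <cstdint>

// Hypothetical model of the constant check: the UMIN bound must be
// 2^DstBits - 1 (UINT8_MAX, UINT16_MAX or UINT32_MAX).
constexpr bool isLowBitMask(unsigned DstBits, std::uint64_t C) {
  return DstBits > 0 && DstBits < 64 &&
         C == ((std::uint64_t{1} << DstBits) - 1);
}

// Hypothetical, heavily simplified stand-in for a DAG node.
struct Node {
  bool IsUMin;          // opcode == ISD::UMIN
  bool HasSplatRHS;     // operand 1 is a constant splat vector
  std::uint64_t Splat;  // value of that splat, if any
  int Operand0;         // id of operand 0 (the value to be truncated)
};

constexpr int NoMatch = -1;  // plays the role of an empty SDValue()

// Returns the id of the value to feed to the saturating truncate, or NoMatch.
constexpr int detectUSat(Node In, unsigned DstBits) {
  if (!In.IsUMin)
    return NoMatch;
  if (!In.HasSplatRHS)
    return NoMatch;
  return isLowBitMask(DstBits, In.Splat) ? In.Operand0 : NoMatch;
}
```

Note that, like the patch, the sketch returns operand 0 of the UMIN (the unclamped value), relying on the saturating instruction to perform the clamp itself.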
>         Modified: llvm/trunk/test/CodeGen/X86/avx-trunc.ll
>         URL:
>         http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-trunc.ll?rev=291670&r1=291669&r2=291670&view=diff
>         ==============================================================================
>         --- llvm/trunk/test/CodeGen/X86/avx-trunc.ll (original)
>         +++ llvm/trunk/test/CodeGen/X86/avx-trunc.ll Wed Jan 11 06:59:32 2017
>         @@ -39,3 +39,29 @@ define <16 x i8> @trunc_16_8(<16 x i16>
>            %B = trunc <16 x i16> %A to <16 x i8>
>            ret <16 x i8> %B
>          }
>         +
>         +define <16 x i8> @usat_trunc_wb_256(<16 x i16> %i) {
>         +; CHECK-LABEL: usat_trunc_wb_256:
>         +; CHECK:       # BB#0:
>         +; CHECK-NEXT:    vextractf128 $1, %ymm0, %xmm1
>         +; CHECK-NEXT:    vpackuswb %xmm1, %xmm0, %xmm0
>         +; CHECK-NEXT:    vzeroupper
>         +; CHECK-NEXT:    retq
>         +  %x3 = icmp ult <16 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>         +  %x5 = select <16 x i1> %x3, <16 x i16> %i, <16 x i16> <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>         +  %x6 = trunc <16 x i16> %x5 to <16 x i8>
>         +  ret <16 x i8> %x6
>         +}
>         +
>         +define <8 x i16> @usat_trunc_dw_256(<8 x i32> %i) {
>         +; CHECK-LABEL: usat_trunc_dw_256:
>         +; CHECK:       # BB#0:
>         +; CHECK-NEXT:    vextractf128 $1, %ymm0, %xmm1
>         +; CHECK-NEXT:    vpackusdw %xmm1, %xmm0, %xmm0
>         +; CHECK-NEXT:    vzeroupper
>         +; CHECK-NEXT:    retq
>         +  %x3 = icmp ult <8 x i32> %i, <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
>         +  %x5 = select <8 x i1> %x3, <8 x i32> %i, <8 x i32> <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
>         +  %x6 = trunc <8 x i32> %x5 to <8 x i16>
>         +  ret <8 x i16> %x6
>         +}
>
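The SSE path in these tests relies on VPACKUSWB/VPACKUSDW. Those instructions read each source element as a *signed* value and saturate it to the unsigned range of the narrower type. The scalar sketch below (helper names `packuswbLane` and `uminTruncLane` are hypothetical) compares one PACKUSWB lane with the IR's umin(x, 255) + trunc. They agree while the 16-bit lane is below 0x8000 but disagree once its top bit is set: 0xFFFF yields 0 from PACKUSWB and 255 from umin + trunc. In the quoted `combineTruncateWithUSat`, the value split and passed to PACKUS is `USatVal`, the unclamped operand of the umin, so on this reading the SSE lowering matches the IR only when that operand is known to be non-negative as a signed value. This is an observation from the diff and the instruction semantics, not a diagnosis stated anywhere in the thread.

```cpp
#include <cstdint>

// Scalar model of one PACKUSWB lane: the 16-bit element is read as signed
// and saturated to the unsigned byte range [0, 255].
constexpr std::uint8_t packuswbLane(std::uint16_t Bits) {
  // Two's-complement reinterpretation via integer arithmetic, avoiding
  // implementation-defined narrowing casts.
  int V = Bits < 0x8000 ? int(Bits) : int(Bits) - 0x10000;
  return static_cast<std::uint8_t>(V < 0 ? 0 : (V > 255 ? 255 : V));
}

// Scalar model of the IR pattern: umin(x, 255) followed by trunc to i8.
constexpr std::uint8_t uminTruncLane(std::uint16_t X) {
  return static_cast<std::uint8_t>(X < 255 ? X : 255);
}
```

The same reasoning applies to PACKUSDW with 32-bit lanes and a bound of 65535.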
>         Modified: llvm/trunk/test/CodeGen/X86/avx512-trunc.ll
>         URL:
>         http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-trunc.ll?rev=291670&r1=291669&r2=291670&view=diff
>         ==============================================================================
>         --- llvm/trunk/test/CodeGen/X86/avx512-trunc.ll (original)
>         +++ llvm/trunk/test/CodeGen/X86/avx512-trunc.ll Wed Jan 11 06:59:32 2017
>         @@ -500,3 +500,208 @@ define void @trunc_wb_128_mem(<8 x i16>
>              store <8 x i8> %x, <8 x i8>* %res
>              ret void
>          }
>         +
>         +
>         +define void @usat_trunc_wb_256_mem(<16 x i16> %i, <16 x i8>* %res) {
>         +; KNL-LABEL: usat_trunc_wb_256_mem:
>         +; KNL:       ## BB#0:
>         +; KNL-NEXT:    vextracti128 $1, %ymm0, %xmm1
>         +; KNL-NEXT:    vpackuswb %xmm1, %xmm0, %xmm0
>         +; KNL-NEXT:    vmovdqu %xmm0, (%rdi)
>         +; KNL-NEXT:    retq
>         +;
>         +; SKX-LABEL: usat_trunc_wb_256_mem:
>         +; SKX:       ## BB#0:
>         +; SKX-NEXT:    vpmovuswb %ymm0, (%rdi)
>         +; SKX-NEXT:    retq
>         +  %x3 = icmp ult <16 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>         +  %x5 = select <16 x i1> %x3, <16 x i16> %i, <16 x i16> <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>         +  %x6 = trunc <16 x i16> %x5 to <16 x i8>
>         +  store <16 x i8> %x6, <16 x i8>* %res, align 1
>         +  ret void
>         +}
>         +
>         +define <16 x i8> @usat_trunc_wb_256(<16 x i16> %i) {
>         +; KNL-LABEL: usat_trunc_wb_256:
>         +; KNL:       ## BB#0:
>         +; KNL-NEXT:    vextracti128 $1, %ymm0, %xmm1
>         +; KNL-NEXT:    vpackuswb %xmm1, %xmm0, %xmm0
>         +; KNL-NEXT:    retq
>         +;
>         +; SKX-LABEL: usat_trunc_wb_256:
>         +; SKX:       ## BB#0:
>         +; SKX-NEXT:    vpmovuswb %ymm0, %xmm0
>         +; SKX-NEXT:    retq
>         +  %x3 = icmp ult <16 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>         +  %x5 = select <16 x i1> %x3, <16 x i16> %i, <16 x i16> <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>         +  %x6 = trunc <16 x i16> %x5 to <16 x i8>
>         +  ret <16 x i8> %x6
>         +}
>         +
>         +define void @usat_trunc_wb_128_mem(<8 x i16> %i, <8 x i8>* %res) {
>         +; KNL-LABEL: usat_trunc_wb_128_mem:
>         +; KNL:       ## BB#0:
>         +; KNL-NEXT:    vpminuw {{.*}}(%rip), %xmm0, %xmm0
>         +; KNL-NEXT:    vpshufb {{.*#+}} xmm0 = xmm0[0,2,4,6,8,10,12,14,u,u,u,u,u,u,u,u]
>         +; KNL-NEXT:    vmovq %xmm0, (%rdi)
>         +; KNL-NEXT:    retq
>         +;
>         +; SKX-LABEL: usat_trunc_wb_128_mem:
>         +; SKX:       ## BB#0:
>         +; SKX-NEXT:    vpmovuswb %xmm0, (%rdi)
>         +; SKX-NEXT:    retq
>         +  %x3 = icmp ult <8 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>         +  %x5 = select <8 x i1> %x3, <8 x i16> %i, <8 x i16> <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>         +  %x6 = trunc <8 x i16> %x5 to <8 x i8>
>         +  store <8 x i8> %x6, <8 x i8>* %res, align 1
>         +  ret void
>         +}
>         +
>         +define void @usat_trunc_db_512_mem(<16 x i32> %i, <16 x i8>* %res) {
>         +; ALL-LABEL: usat_trunc_db_512_mem:
>         +; ALL:       ## BB#0:
>         +; ALL-NEXT:    vpmovusdb %zmm0, (%rdi)
>         +; ALL-NEXT:    retq
>         +  %x3 = icmp ult <16 x i32> %i, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
>         +  %x5 = select <16 x i1> %x3, <16 x i32> %i, <16 x i32> <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
>         +  %x6 = trunc <16 x i32> %x5 to <16 x i8>
>         +  store <16 x i8> %x6, <16 x i8>* %res, align 1
>         +  ret void
>         +}
>         +
>         +define void @usat_trunc_qb_512_mem(<8 x i64> %i, <8 x i8>* %res) {
>         +; ALL-LABEL: usat_trunc_qb_512_mem:
>         +; ALL:       ## BB#0:
>         +; ALL-NEXT:    vpmovusqb %zmm0, (%rdi)
>         +; ALL-NEXT:    retq
>         +  %x3 = icmp ult <8 x i64> %i, <i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255>
>         +  %x5 = select <8 x i1> %x3, <8 x i64> %i, <8 x i64> <i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255>
>         +  %x6 = trunc <8 x i64> %x5 to <8 x i8>
>         +  store <8 x i8> %x6, <8 x i8>* %res, align 1
>         +  ret void
>         +}
>         +
>         +define void @usat_trunc_qd_512_mem(<8 x i64> %i, <8 x i32>* %res) {
>         +; ALL-LABEL: usat_trunc_qd_512_mem:
>         +; ALL:       ## BB#0:
>         +; ALL-NEXT:    vpmovusqd %zmm0, (%rdi)
>         +; ALL-NEXT:    retq
>         +  %x3 = icmp ult <8 x i64> %i, <i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295>
>         +  %x5 = select <8 x i1> %x3, <8 x i64> %i, <8 x i64> <i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295>
>         +  %x6 = trunc <8 x i64> %x5 to <8 x i32>
>         +  store <8 x i32> %x6, <8 x i32>* %res, align 1
>         +  ret void
>         +}
>         +
>         +define void @usat_trunc_qw_512_mem(<8 x i64> %i, <8 x i16>* %res) {
>         +; ALL-LABEL: usat_trunc_qw_512_mem:
>         +; ALL:       ## BB#0:
>         +; ALL-NEXT:    vpmovusqw %zmm0, (%rdi)
>         +; ALL-NEXT:    retq
>         +  %x3 = icmp ult <8 x i64> %i, <i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535>
>         +  %x5 = select <8 x i1> %x3, <8 x i64> %i, <8 x i64> <i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535>
>         +  %x6 = trunc <8 x i64> %x5 to <8 x i16>
>         +  store <8 x i16> %x6, <8 x i16>* %res, align 1
>         +  ret void
>         +}
>         +
>         +define <32 x i8> @usat_trunc_db_1024(<32 x i32> %i) {
>         +; KNL-LABEL: usat_trunc_db_1024:
>         +; KNL:       ## BB#0:
>         +; KNL-NEXT:    vpmovusdb %zmm0, %xmm0
>         +; KNL-NEXT:    vpmovusdb %zmm1, %xmm1
>         +; KNL-NEXT:    vinserti128 $1, %xmm1, %ymm0, %ymm0
>         +; KNL-NEXT:    retq
>         +;
>         +; SKX-LABEL: usat_trunc_db_1024:
>         +; SKX:       ## BB#0:
>         +; SKX-NEXT:    vpbroadcastd {{.*}}(%rip), %zmm2
>         +; SKX-NEXT:    vpminud %zmm2, %zmm1, %zmm1
>         +; SKX-NEXT:    vpminud %zmm2, %zmm0, %zmm0
>         +; SKX-NEXT:    vpmovdw %zmm0, %ymm0
>         +; SKX-NEXT:    vpmovdw %zmm1, %ymm1
>         +; SKX-NEXT:    vinserti64x4 $1, %ymm1, %zmm0, %zmm0
>         +; SKX-NEXT:    vpmovwb %zmm0, %ymm0
>         +; SKX-NEXT:    retq
>         +  %x3 = icmp ult <32 x i32> %i, <i32 255, i32 255, i32 255,
>         i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>         i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>         i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>         i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>         i32 255>
>         +  %x5 = select <32 x i1> %x3, <32 x i32> %i, <32 x i32> <i32
>         255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>         255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>         255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>         255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>         255, i32 255, i32 255, i32 255>
>         +  %x6 = trunc <32 x i32> %x5 to <32 x i8>
>         +  ret <32 x i8> %x6
>         +}
>         +
>         +define void @usat_trunc_db_1024_mem(<32 x i32> %i, <32 x i8>* %p) {
>         +; KNL-LABEL: usat_trunc_db_1024_mem:
>         +; KNL:       ## BB#0:
>         +; KNL-NEXT:    vpmovusdb %zmm0, %xmm0
>         +; KNL-NEXT:    vpmovusdb %zmm1, %xmm1
>         +; KNL-NEXT:    vinserti128 $1, %xmm1, %ymm0, %ymm0
>         +; KNL-NEXT:    vmovdqu %ymm0, (%rdi)
>         +; KNL-NEXT:    retq
>         +;
>         +; SKX-LABEL: usat_trunc_db_1024_mem:
>         +; SKX:       ## BB#0:
>         +; SKX-NEXT:    vpbroadcastd {{.*}}(%rip), %zmm2
>         +; SKX-NEXT:    vpminud %zmm2, %zmm1, %zmm1
>         +; SKX-NEXT:    vpminud %zmm2, %zmm0, %zmm0
>         +; SKX-NEXT:    vpmovdw %zmm0, %ymm0
>         +; SKX-NEXT:    vpmovdw %zmm1, %ymm1
>         +; SKX-NEXT:    vinserti64x4 $1, %ymm1, %zmm0, %zmm0
>         +; SKX-NEXT:    vpmovwb %zmm0, (%rdi)
>         +; SKX-NEXT:    retq
>         +  %x3 = icmp ult <32 x i32> %i, <i32 255, i32 255, i32 255,
>         i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>         i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>         i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>         i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>         i32 255>
>         +  %x5 = select <32 x i1> %x3, <32 x i32> %i, <32 x i32> <i32
>         255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>         255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>         255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>         255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>         255, i32 255, i32 255, i32 255>
>         +  %x6 = trunc <32 x i32> %x5 to <32 x i8>
>         +  store <32 x i8>%x6, <32 x i8>* %p, align 1
>         +  ret void
>         +}
>         +
>         +define <16 x i16> @usat_trunc_dw_512(<16 x i32> %i) {
>         +; ALL-LABEL: usat_trunc_dw_512:
>         +; ALL:       ## BB#0:
>         +; ALL-NEXT:    vpmovusdw %zmm0, %ymm0
>         +; ALL-NEXT:    retq
>         +  %x3 = icmp ult <16 x i32> %i, <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
>         +  %x5 = select <16 x i1> %x3, <16 x i32> %i, <16 x i32> <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
>         +  %x6 = trunc <16 x i32> %x5 to <16 x i16>
>         +  ret <16 x i16> %x6
>         +}
>         +
>         +define <8 x i8> @usat_trunc_wb_128(<8 x i16> %i) {
>         +; ALL-LABEL: usat_trunc_wb_128:
>         +; ALL:       ## BB#0:
>         +; ALL-NEXT:    vpminuw {{.*}}(%rip), %xmm0, %xmm0
>         +; ALL-NEXT:    retq
>         +  %x3 = icmp ult <8 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>         +  %x5 = select <8 x i1> %x3, <8 x i16> %i, <8 x i16> <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>         +  %x6 = trunc <8 x i16> %x5 to <8 x i8>
>         +  ret <8 x i8>%x6
>         +}
>         +
>         +define <16 x i16> @usat_trunc_qw_1024(<16 x i64> %i) {
>         +; KNL-LABEL: usat_trunc_qw_1024:
>         +; KNL:       ## BB#0:
>         +; KNL-NEXT:    vpbroadcastq {{.*}}(%rip), %zmm2
>         +; KNL-NEXT:    vpminuq %zmm2, %zmm1, %zmm1
>         +; KNL-NEXT:    vpminuq %zmm2, %zmm0, %zmm0
>         +; KNL-NEXT:    vpmovqd %zmm0, %ymm0
>         +; KNL-NEXT:    vpmovqd %zmm1, %ymm1
>         +; KNL-NEXT:    vinserti64x4 $1, %ymm1, %zmm0, %zmm0
>         +; KNL-NEXT:    vpmovdw %zmm0, %ymm0
>         +; KNL-NEXT:    retq
>         +;
>         +; SKX-LABEL: usat_trunc_qw_1024:
>         +; SKX:       ## BB#0:
>         +; SKX-NEXT:    vpbroadcastq {{.*}}(%rip), %zmm2
>         +; SKX-NEXT:    vpminuq %zmm2, %zmm1, %zmm1
>         +; SKX-NEXT:    vpminuq %zmm2, %zmm0, %zmm0
>         +; SKX-NEXT:    vpmovqd %zmm0, %ymm0
>         +; SKX-NEXT:    vpmovqd %zmm1, %ymm1
>         +; SKX-NEXT:    vinserti32x8 $1, %ymm1, %zmm0, %zmm0
>         +; SKX-NEXT:    vpmovdw %zmm0, %ymm0
>         +; SKX-NEXT:    retq
>         +  %x3 = icmp ult <16 x i64> %i, <i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535>
>         +  %x5 = select <16 x i1> %x3, <16 x i64> %i, <16 x i64> <i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535>
>         +  %x6 = trunc <16 x i64> %x5 to <16 x i16>
>         +  ret <16 x i16> %x6
>         +}
>         +
>
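The KNL and SKX check lines above differ mainly because of the feature test in `isSATValidOnAVX512Subtarget`. The sketch below restates that test over plain bit widths so the configurations can be tried directly. `Features` and `satValidOnAVX512` are hypothetical names; the `EVT::isSimple()` check and scalar sources are not modeled. It assumes the usual feature sets: Knights Landing has AVX-512F without VL or BW, while Skylake-X has F, VL and BW.

```cpp
// Hypothetical stand-ins for the X86Subtarget queries used in the patch
// (hasAVX512, hasVLX, hasBWI).
struct Features {
  bool AVX512;  // AVX-512 Foundation
  bool VLX;     // AVX-512VL: 128/256-bit forms of the instructions
  bool BWI;     // AVX-512BW: byte/word element forms
};

// Restatement of isSATValidOnAVX512Subtarget over plain bit widths.
constexpr bool satValidOnAVX512(unsigned SrcBits, unsigned SrcElBits,
                                unsigned DstElBits, Features F) {
  if (!F.AVX512)
    return false;
  if (SrcBits > 512)                     // wider than one ZMM register
    return false;
  if (SrcElBits < 16 || SrcElBits > 64)  // i16, i32 or i64 source lanes
    return false;
  if (DstElBits < 8 || DstElBits > 32)   // i8, i16 or i32 result lanes
    return false;
  if (SrcBits == 512 || F.VLX)           // narrower vectors need VLX
    return SrcElBits >= 32 || F.BWI;     // word sources need BWI
  return false;
}

constexpr Features KNL{true, false, false};
constexpr Features SKX{true, true, true};
```

For example, the 256-bit word-to-byte case is rejected for KNL, which is consistent with KNL falling back to `vextracti128` + `vpackuswb` in `usat_trunc_wb_256` while SKX emits `vpmovuswb`. The 1024-bit sources are rejected outright by this check, so whatever those tests show is decided by other parts of the lowering that the sketch does not cover.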
>
>         _______________________________________________
>         llvm-commits mailing list
>         llvm-commits at lists.llvm.org
>         http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
>     ---------------------------------------------------------------------
>     Intel Israel (74) Limited
>
>
>
>
>
