[llvm] r291670 - X86 CodeGen: Optimized pattern for truncate with unsigned saturation.

Hans Wennborg via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 25 09:08:59 PST 2017


It seems safer to merge the revert. I went ahead and merged it in r293070.

Thanks,
Hans

On Tue, Jan 24, 2017 at 1:23 PM, Michael Kuperstein <mkuper at google.com> wrote:
> The branch name is release_40, AFAIK.
> As to revert vs. fix - I defer to Hans / Craig.
>
> Michael
>
> On Tue, Jan 24, 2017 at 11:29 AM, Demikhovsky, Elena
> <elena.demikhovsky at intel.com> wrote:
>>
>> I propose to fix it, since that will be easier than a revert. But it's not
>> so important; it's Hans's decision.
>>
>> I’ll need to know the branch name in order to make changes.
>>
>>
>>
>> -           Elena
>>
>>
>>
>> From: Michael Kuperstein [mailto:mkuper at google.com]
>> Sent: Tuesday, January 24, 2017 21:09
>> To: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Hans Wennborg
>> <hans at chromium.org>
>>
>>
>> Cc: llvm-commits <llvm-commits at lists.llvm.org>
>> Subject: Re: [llvm] r291670 - X86 CodeGen: Optimized pattern for truncate
>> with unsigned saturation.
>>
>>
>>
>> I just noticed the original commit is before the 4.0 branch-point, but my
>> revert was after.
>>
>> Do we want to revert r291670 in 4.0? Or merge the fixed version (r292479)?
>>
>>
>>
>> Hans, Elena?
>>
>>
>>
>> On Thu, Jan 19, 2017 at 10:47 AM, Michael Kuperstein <mkuper at google.com>
>> wrote:
>>
>> Hi Elena,
>>
>>
>>
>> Thanks for the fix.
>>
>>
>>
>> Regarding the revert - in this case, we're talking about:
>>
>>
>>
>> 1) A recent commit,
>>
>> 2) that has nothing else layered on top of it (except for whitespace
>> changes)
>>
>> 3) is a performance improvement that causes a correctness regression,
>>
>> 4) the crasher is reduced from real code, not a synthetic test-case,
>>
>> 5) and has a small IR reproducer.
>>
>>
>>
>> I really think that in such cases it's worth keeping trunk clean, at the
>> cost of the original committer having to reverse-merge the revert before
>> fixing the bug.
>>
>>
>>
>> Thanks,
>>
>>   Michael
>>
>>
>>
>> On Thu, Jan 19, 2017 at 4:49 AM, Demikhovsky, Elena
>> <elena.demikhovsky at intel.com> wrote:
>>
>> Fixed and recommitted in r292479.
>>
>>
>>
>> I’d prefer that you not revert the failing commit, but wait a few days. It
>> will be easier for me to fix.
>>
>> (Unless it is a buildbot failure, of course; those failures I can see
>> myself.)
>>
>> We also find regressions in our internal testing from time to time;
>> PR31671, for example. We file a PR, notify the owner, and let them fix
>> the bug.
>>
>>
>>
>> Thanks.
>>
>>
>>
>> -           Elena
>>
>>
>>
>> From: Michael Kuperstein [mailto:mkuper at google.com]
>> Sent: Thursday, January 19, 2017 01:19
>> To: Demikhovsky, Elena <elena.demikhovsky at intel.com>
>> Cc: llvm-commits <llvm-commits at lists.llvm.org>
>> Subject: Re: [llvm] r291670 - X86 CodeGen: Optimized pattern for truncate
>> with unsigned saturation.
>>
>>
>>
>> Hi Elena,
>>
>>
>>
>> This still crashes in more complex cases. I've reverted in r292444, see
>> PR31589 for the reproducer.
>>
>>
>>
>> Thanks,
>>
>>   Michael
>>
>>
>>
>> On Wed, Jan 11, 2017 at 4:59 AM, Elena Demikhovsky via llvm-commits
>> <llvm-commits at lists.llvm.org> wrote:
>>
>> Author: delena
>> Date: Wed Jan 11 06:59:32 2017
>> New Revision: 291670
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=291670&view=rev
>> Log:
>> X86 CodeGen: Optimized pattern for truncate with unsigned saturation.
>>
>> DAG pattern optimization: truncate + unsigned saturation is supported by
>> VPMOVUS* instructions on AVX-512,
>> and by VPACKUS* instructions on SSE* targets.
>>
>> Differential Revision: https://reviews.llvm.org/D28216
>>
>>
>> Modified:
>>     llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>     llvm/trunk/test/CodeGen/X86/avx-trunc.ll
>>     llvm/trunk/test/CodeGen/X86/avx512-trunc.ll
>>
>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=291670&r1=291669&r2=291670&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Wed Jan 11 06:59:32 2017
>> @@ -31220,6 +31220,93 @@ static SDValue foldVectorXorShiftIntoCmp
>>    return DAG.getNode(X86ISD::PCMPGT, SDLoc(N), VT, Shift.getOperand(0),
>> Ones);
>>  }
>>
>> +/// Check if truncation with saturation form type \p SrcVT to \p DstVT
>> +/// is valid for the given \p Subtarget.
>> +static bool isSATValidOnAVX512Subtarget(EVT SrcVT, EVT DstVT,
>> +                                        const X86Subtarget &Subtarget) {
>> +  if (!Subtarget.hasAVX512())
>> +    return false;
>> +
>> +  // FIXME: Scalar type may be supported if we move it to vector
>> register.
>> +  if (!SrcVT.isVector() || !SrcVT.isSimple() || SrcVT.getSizeInBits() >
>> 512)
>> +    return false;
>> +
>> +  EVT SrcElVT = SrcVT.getScalarType();
>> +  EVT DstElVT = DstVT.getScalarType();
>> +  if (SrcElVT.getSizeInBits() < 16 || SrcElVT.getSizeInBits() > 64)
>> +    return false;
>> +  if (DstElVT.getSizeInBits() < 8 || DstElVT.getSizeInBits() > 32)
>> +    return false;
>> +  if (SrcVT.is512BitVector() || Subtarget.hasVLX())
>> +    return SrcElVT.getSizeInBits() >= 32 || Subtarget.hasBWI();
>> +  return false;
>> +}
>> +
>> +/// Return true if a VPACK* instruction can be used for the given types
>> +/// and it is available on \p Subtarget.
>> +static bool
>> +isSATValidOnSSESubtarget(EVT SrcVT, EVT DstVT, const X86Subtarget
>> &Subtarget) {
>> +  if (Subtarget.hasSSE2())
>> +    // v16i16 -> v16i8
>> +    if (SrcVT == MVT::v16i16 && DstVT == MVT::v16i8)
>> +      return true;
>> +  if (Subtarget.hasSSE41())
>> +    // v8i32 -> v8i16
>> +    if (SrcVT == MVT::v8i32 && DstVT == MVT::v8i16)
>> +      return true;
>> +  return false;
>> +}
>> +
>> +/// Detect a pattern of truncation with saturation:
>> +/// (truncate (umin (x, unsigned_max_of_dest_type)) to dest_type).
>> +/// Return the source value to be truncated or SDValue() if the pattern
>> was not
>> +/// matched.
>> +static SDValue detectUSatPattern(SDValue In, EVT VT) {
>> +  if (In.getOpcode() != ISD::UMIN)
>> +    return SDValue();
>> +
>> +  // Saturation with truncation. We truncate from InVT to VT.
>> +  assert(In.getScalarValueSizeInBits() > VT.getScalarSizeInBits() &&
>> +    "Unexpected types for truncate operation");
>> +
>> +  APInt C;
>> +  if (ISD::isConstantSplatVector(In.getOperand(1).getNode(), C)) {
>> +    // C should be equal to UINT32_MAX / UINT16_MAX / UINT8_MAX, according
>> +    // to the element size of the destination type.
>> +    return APIntOps::isMask(VT.getScalarSizeInBits(), C) ?
>> In.getOperand(0) :
>> +      SDValue();
>> +  }
>> +  return SDValue();
>> +}
>> +
>> +/// Detect a pattern of truncation with saturation:
>> +/// (truncate (umin (x, unsigned_max_of_dest_type)) to dest_type).
>> +/// The types should allow to use VPMOVUS* instruction on AVX512.
>> +/// Return the source value to be truncated or SDValue() if the pattern
>> was not
>> +/// matched.
>> +static SDValue detectAVX512USatPattern(SDValue In, EVT VT,
>> +                                       const X86Subtarget &Subtarget) {
>> +  if (!isSATValidOnAVX512Subtarget(In.getValueType(), VT, Subtarget))
>> +    return SDValue();
>> +  return detectUSatPattern(In, VT);
>> +}
>> +
>> +static SDValue
>> +combineTruncateWithUSat(SDValue In, EVT VT, SDLoc &DL, SelectionDAG &DAG,
>> +                        const X86Subtarget &Subtarget) {
>> +  SDValue USatVal = detectUSatPattern(In, VT);
>> +  if (USatVal) {
>> +    if (isSATValidOnAVX512Subtarget(In.getValueType(), VT, Subtarget))
>> +      return DAG.getNode(X86ISD::VTRUNCUS, DL, VT, USatVal);
>> +    if (isSATValidOnSSESubtarget(In.getValueType(), VT, Subtarget)) {
>> +      SDValue Lo, Hi;
>> +      std::tie(Lo, Hi) = DAG.SplitVector(USatVal, DL);
>> +      return DAG.getNode(X86ISD::PACKUS, DL, VT, Lo, Hi);
>> +    }
>> +  }
>> +  return SDValue();
>> +}
>> +
>>  /// This function detects the AVG pattern between vectors of unsigned
>> i8/i16,
>>  /// which is c = (a + b + 1) / 2, and replace this operation with the
>> efficient
>>  /// X86ISD::AVG instruction.
>> @@ -31786,6 +31873,12 @@ static SDValue combineStore(SDNode *N, S
>>                            St->getPointerInfo(), St->getAlignment(),
>>                            St->getMemOperand()->getFlags());
>>
>> +    if (SDValue Val =
>> +        detectAVX512USatPattern(St->getValue(), St->getMemoryVT(),
>> Subtarget))
>> +      return EmitTruncSStore(false /* Unsigned saturation */,
>> St->getChain(),
>> +                             dl, Val, St->getBasePtr(),
>> +                             St->getMemoryVT(), St->getMemOperand(),
>> DAG);
>> +
>>      const TargetLowering &TLI = DAG.getTargetLoweringInfo();
>>      unsigned NumElems = VT.getVectorNumElements();
>>      assert(StVT != VT && "Cannot truncate to the same type");
>> @@ -32406,6 +32499,10 @@ static SDValue combineTruncate(SDNode *N
>>    if (SDValue Avg = detectAVGPattern(Src, VT, DAG, Subtarget, DL))
>>      return Avg;
>>
>> +  // Try to combine truncation with unsigned saturation.
>> +  if (SDValue Val = combineTruncateWithUSat(Src, VT, DL, DAG, Subtarget))
>> +    return Val;
>> +
>>    // The bitcast source is a direct mmx result.
>>    // Detect bitcasts between i32 to x86mmx
>>    if (Src.getOpcode() == ISD::BITCAST && VT == MVT::i32) {
>>
>> Modified: llvm/trunk/test/CodeGen/X86/avx-trunc.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-trunc.ll?rev=291670&r1=291669&r2=291670&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/avx-trunc.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/avx-trunc.ll Wed Jan 11 06:59:32 2017
>> @@ -39,3 +39,29 @@ define <16 x i8> @trunc_16_8(<16 x i16>
>>    %B = trunc <16 x i16> %A to <16 x i8>
>>    ret <16 x i8> %B
>>  }
>> +
>> +define <16 x i8> @usat_trunc_wb_256(<16 x i16> %i) {
>> +; CHECK-LABEL: usat_trunc_wb_256:
>> +; CHECK:       # BB#0:
>> +; CHECK-NEXT:    vextractf128 $1, %ymm0, %xmm1
>> +; CHECK-NEXT:    vpackuswb %xmm1, %xmm0, %xmm0
>> +; CHECK-NEXT:    vzeroupper
>> +; CHECK-NEXT:    retq
>> +  %x3 = icmp ult <16 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16
>> 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255,
>> i16 255, i16 255, i16 255>
>> +  %x5 = select <16 x i1> %x3, <16 x i16> %i, <16 x i16> <i16 255, i16
>> 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255,
>> i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>> +  %x6 = trunc <16 x i16> %x5 to <16 x i8>
>> +  ret <16 x i8> %x6
>> +}
>> +
>> +define <8 x i16> @usat_trunc_dw_256(<8 x i32> %i) {
>> +; CHECK-LABEL: usat_trunc_dw_256:
>> +; CHECK:       # BB#0:
>> +; CHECK-NEXT:    vextractf128 $1, %ymm0, %xmm1
>> +; CHECK-NEXT:    vpackusdw %xmm1, %xmm0, %xmm0
>> +; CHECK-NEXT:    vzeroupper
>> +; CHECK-NEXT:    retq
>> +  %x3 = icmp ult <8 x i32> %i, <i32 65535, i32 65535, i32 65535, i32
>> 65535, i32 65535, i32 65535, i32 65535, i32 65535>
>> +  %x5 = select <8 x i1> %x3, <8 x i32> %i, <8 x i32> <i32 65535, i32
>> 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
>> +  %x6 = trunc <8 x i32> %x5 to <8 x i16>
>> +  ret <8 x i16> %x6
>> +}
>>
>> Modified: llvm/trunk/test/CodeGen/X86/avx512-trunc.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-trunc.ll?rev=291670&r1=291669&r2=291670&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/avx512-trunc.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/avx512-trunc.ll Wed Jan 11 06:59:32 2017
>> @@ -500,3 +500,208 @@ define void @trunc_wb_128_mem(<8 x i16>
>>      store <8 x i8> %x, <8 x i8>* %res
>>      ret void
>>  }
>> +
>> +
>> +define void @usat_trunc_wb_256_mem(<16 x i16> %i, <16 x i8>* %res) {
>> +; KNL-LABEL: usat_trunc_wb_256_mem:
>> +; KNL:       ## BB#0:
>> +; KNL-NEXT:    vextracti128 $1, %ymm0, %xmm1
>> +; KNL-NEXT:    vpackuswb %xmm1, %xmm0, %xmm0
>> +; KNL-NEXT:    vmovdqu %xmm0, (%rdi)
>> +; KNL-NEXT:    retq
>> +;
>> +; SKX-LABEL: usat_trunc_wb_256_mem:
>> +; SKX:       ## BB#0:
>> +; SKX-NEXT:    vpmovuswb %ymm0, (%rdi)
>> +; SKX-NEXT:    retq
>> +  %x3 = icmp ult <16 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16
>> 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255,
>> i16 255, i16 255, i16 255>
>> +  %x5 = select <16 x i1> %x3, <16 x i16> %i, <16 x i16> <i16 255, i16
>> 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255,
>> i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>> +  %x6 = trunc <16 x i16> %x5 to <16 x i8>
>> +  store <16 x i8> %x6, <16 x i8>* %res, align 1
>> +  ret void
>> +}
>> +
>> +define <16 x i8> @usat_trunc_wb_256(<16 x i16> %i) {
>> +; KNL-LABEL: usat_trunc_wb_256:
>> +; KNL:       ## BB#0:
>> +; KNL-NEXT:    vextracti128 $1, %ymm0, %xmm1
>> +; KNL-NEXT:    vpackuswb %xmm1, %xmm0, %xmm0
>> +; KNL-NEXT:    retq
>> +;
>> +; SKX-LABEL: usat_trunc_wb_256:
>> +; SKX:       ## BB#0:
>> +; SKX-NEXT:    vpmovuswb %ymm0, %xmm0
>> +; SKX-NEXT:    retq
>> +  %x3 = icmp ult <16 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16
>> 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255,
>> i16 255, i16 255, i16 255>
>> +  %x5 = select <16 x i1> %x3, <16 x i16> %i, <16 x i16> <i16 255, i16
>> 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255,
>> i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>> +  %x6 = trunc <16 x i16> %x5 to <16 x i8>
>> +  ret <16 x i8> %x6
>> +}
>> +
>> +define void @usat_trunc_wb_128_mem(<8 x i16> %i, <8 x i8>* %res) {
>> +; KNL-LABEL: usat_trunc_wb_128_mem:
>> +; KNL:       ## BB#0:
>> +; KNL-NEXT:    vpminuw {{.*}}(%rip), %xmm0, %xmm0
>> +; KNL-NEXT:    vpshufb {{.*#+}} xmm0 =
>> xmm0[0,2,4,6,8,10,12,14,u,u,u,u,u,u,u,u]
>> +; KNL-NEXT:    vmovq %xmm0, (%rdi)
>> +; KNL-NEXT:    retq
>> +;
>> +; SKX-LABEL: usat_trunc_wb_128_mem:
>> +; SKX:       ## BB#0:
>> +; SKX-NEXT:    vpmovuswb %xmm0, (%rdi)
>> +; SKX-NEXT:    retq
>> +  %x3 = icmp ult <8 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16
>> 255, i16 255, i16 255, i16 255>
>> +  %x5 = select <8 x i1> %x3, <8 x i16> %i, <8 x i16> <i16 255, i16 255,
>> i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>> +  %x6 = trunc <8 x i16> %x5 to <8 x i8>
>> +  store <8 x i8> %x6, <8 x i8>* %res, align 1
>> +  ret void
>> +}
>> +
>> +define void @usat_trunc_db_512_mem(<16 x i32> %i, <16 x i8>* %res) {
>> +; ALL-LABEL: usat_trunc_db_512_mem:
>> +; ALL:       ## BB#0:
>> +; ALL-NEXT:    vpmovusdb %zmm0, (%rdi)
>> +; ALL-NEXT:    retq
>> +  %x3 = icmp ult <16 x i32> %i, <i32 255, i32 255, i32 255, i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255>
>> +  %x5 = select <16 x i1> %x3, <16 x i32> %i, <16 x i32> <i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
>> +  %x6 = trunc <16 x i32> %x5 to <16 x i8>
>> +  store <16 x i8> %x6, <16 x i8>* %res, align 1
>> +  ret void
>> +}
>> +
>> +define void @usat_trunc_qb_512_mem(<8 x i64> %i, <8 x i8>* %res) {
>> +; ALL-LABEL: usat_trunc_qb_512_mem:
>> +; ALL:       ## BB#0:
>> +; ALL-NEXT:    vpmovusqb %zmm0, (%rdi)
>> +; ALL-NEXT:    retq
>> +  %x3 = icmp ult <8 x i64> %i, <i64 255, i64 255, i64 255, i64 255, i64
>> 255, i64 255, i64 255, i64 255>
>> +  %x5 = select <8 x i1> %x3, <8 x i64> %i, <8 x i64> <i64 255, i64 255,
>> i64 255, i64 255, i64 255, i64 255, i64 255, i64 255>
>> +  %x6 = trunc <8 x i64> %x5 to <8 x i8>
>> +  store <8 x i8> %x6, <8 x i8>* %res, align 1
>> +  ret void
>> +}
>> +
>> +define void @usat_trunc_qd_512_mem(<8 x i64> %i, <8 x i32>* %res) {
>> +; ALL-LABEL: usat_trunc_qd_512_mem:
>> +; ALL:       ## BB#0:
>> +; ALL-NEXT:    vpmovusqd %zmm0, (%rdi)
>> +; ALL-NEXT:    retq
>> +  %x3 = icmp ult <8 x i64> %i, <i64 4294967295, i64 4294967295, i64
>> 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295,
>> i64 4294967295>
>> +  %x5 = select <8 x i1> %x3, <8 x i64> %i, <8 x i64> <i64 4294967295, i64
>> 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295,
>> i64 4294967295, i64 4294967295>
>> +  %x6 = trunc <8 x i64> %x5 to <8 x i32>
>> +  store <8 x i32> %x6, <8 x i32>* %res, align 1
>> +  ret void
>> +}
>> +
>> +define void @usat_trunc_qw_512_mem(<8 x i64> %i, <8 x i16>* %res) {
>> +; ALL-LABEL: usat_trunc_qw_512_mem:
>> +; ALL:       ## BB#0:
>> +; ALL-NEXT:    vpmovusqw %zmm0, (%rdi)
>> +; ALL-NEXT:    retq
>> +  %x3 = icmp ult <8 x i64> %i, <i64 65535, i64 65535, i64 65535, i64
>> 65535, i64 65535, i64 65535, i64 65535, i64 65535>
>> +  %x5 = select <8 x i1> %x3, <8 x i64> %i, <8 x i64> <i64 65535, i64
>> 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535>
>> +  %x6 = trunc <8 x i64> %x5 to <8 x i16>
>> +  store <8 x i16> %x6, <8 x i16>* %res, align 1
>> +  ret void
>> +}
>> +
>> +define <32 x i8> @usat_trunc_db_1024(<32 x i32> %i) {
>> +; KNL-LABEL: usat_trunc_db_1024:
>> +; KNL:       ## BB#0:
>> +; KNL-NEXT:    vpmovusdb %zmm0, %xmm0
>> +; KNL-NEXT:    vpmovusdb %zmm1, %xmm1
>> +; KNL-NEXT:    vinserti128 $1, %xmm1, %ymm0, %ymm0
>> +; KNL-NEXT:    retq
>> +;
>> +; SKX-LABEL: usat_trunc_db_1024:
>> +; SKX:       ## BB#0:
>> +; SKX-NEXT:    vpbroadcastd {{.*}}(%rip), %zmm2
>> +; SKX-NEXT:    vpminud %zmm2, %zmm1, %zmm1
>> +; SKX-NEXT:    vpminud %zmm2, %zmm0, %zmm0
>> +; SKX-NEXT:    vpmovdw %zmm0, %ymm0
>> +; SKX-NEXT:    vpmovdw %zmm1, %ymm1
>> +; SKX-NEXT:    vinserti64x4 $1, %ymm1, %zmm0, %zmm0
>> +; SKX-NEXT:    vpmovwb %zmm0, %ymm0
>> +; SKX-NEXT:    retq
>> +  %x3 = icmp ult <32 x i32> %i, <i32 255, i32 255, i32 255, i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255>
>> +  %x5 = select <32 x i1> %x3, <32 x i32> %i, <32 x i32> <i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255, i32 255, i32 255>
>> +  %x6 = trunc <32 x i32> %x5 to <32 x i8>
>> +  ret <32 x i8> %x6
>> +}
>> +
>> +define void @usat_trunc_db_1024_mem(<32 x i32> %i, <32 x i8>* %p) {
>> +; KNL-LABEL: usat_trunc_db_1024_mem:
>> +; KNL:       ## BB#0:
>> +; KNL-NEXT:    vpmovusdb %zmm0, %xmm0
>> +; KNL-NEXT:    vpmovusdb %zmm1, %xmm1
>> +; KNL-NEXT:    vinserti128 $1, %xmm1, %ymm0, %ymm0
>> +; KNL-NEXT:    vmovdqu %ymm0, (%rdi)
>> +; KNL-NEXT:    retq
>> +;
>> +; SKX-LABEL: usat_trunc_db_1024_mem:
>> +; SKX:       ## BB#0:
>> +; SKX-NEXT:    vpbroadcastd {{.*}}(%rip), %zmm2
>> +; SKX-NEXT:    vpminud %zmm2, %zmm1, %zmm1
>> +; SKX-NEXT:    vpminud %zmm2, %zmm0, %zmm0
>> +; SKX-NEXT:    vpmovdw %zmm0, %ymm0
>> +; SKX-NEXT:    vpmovdw %zmm1, %ymm1
>> +; SKX-NEXT:    vinserti64x4 $1, %ymm1, %zmm0, %zmm0
>> +; SKX-NEXT:    vpmovwb %zmm0, (%rdi)
>> +; SKX-NEXT:    retq
>> +  %x3 = icmp ult <32 x i32> %i, <i32 255, i32 255, i32 255, i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255>
>> +  %x5 = select <32 x i1> %x3, <32 x i32> %i, <32 x i32> <i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255, i32 255, i32 255>
>> +  %x6 = trunc <32 x i32> %x5 to <32 x i8>
>> +  store <32 x i8>%x6, <32 x i8>* %p, align 1
>> +  ret void
>> +}
>> +
>> +define <16 x i16> @usat_trunc_dw_512(<16 x i32> %i) {
>> +; ALL-LABEL: usat_trunc_dw_512:
>> +; ALL:       ## BB#0:
>> +; ALL-NEXT:    vpmovusdw %zmm0, %ymm0
>> +; ALL-NEXT:    retq
>> +  %x3 = icmp ult <16 x i32> %i, <i32 65535, i32 65535, i32 65535, i32
>> 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32
>> 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
>> +  %x5 = select <16 x i1> %x3, <16 x i32> %i, <16 x i32> <i32 65535, i32
>> 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32
>> 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32
>> 65535>
>> +  %x6 = trunc <16 x i32> %x5 to <16 x i16>
>> +  ret <16 x i16> %x6
>> +}
>> +
>> +define <8 x i8> @usat_trunc_wb_128(<8 x i16> %i) {
>> +; ALL-LABEL: usat_trunc_wb_128:
>> +; ALL:       ## BB#0:
>> +; ALL-NEXT:    vpminuw {{.*}}(%rip), %xmm0, %xmm0
>> +; ALL-NEXT:    retq
>> +  %x3 = icmp ult <8 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16
>> 255, i16 255, i16 255, i16 255>
>> +  %x5 = select <8 x i1> %x3, <8 x i16> %i, <8 x i16> <i16 255, i16 255,
>> i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>> +  %x6 = trunc <8 x i16> %x5 to <8 x i8>
>> +  ret <8 x i8>%x6
>> +}
>> +
>> +define <16 x i16> @usat_trunc_qw_1024(<16 x i64> %i) {
>> +; KNL-LABEL: usat_trunc_qw_1024:
>> +; KNL:       ## BB#0:
>> +; KNL-NEXT:    vpbroadcastq {{.*}}(%rip), %zmm2
>> +; KNL-NEXT:    vpminuq %zmm2, %zmm1, %zmm1
>> +; KNL-NEXT:    vpminuq %zmm2, %zmm0, %zmm0
>> +; KNL-NEXT:    vpmovqd %zmm0, %ymm0
>> +; KNL-NEXT:    vpmovqd %zmm1, %ymm1
>> +; KNL-NEXT:    vinserti64x4 $1, %ymm1, %zmm0, %zmm0
>> +; KNL-NEXT:    vpmovdw %zmm0, %ymm0
>> +; KNL-NEXT:    retq
>> +;
>> +; SKX-LABEL: usat_trunc_qw_1024:
>> +; SKX:       ## BB#0:
>> +; SKX-NEXT:    vpbroadcastq {{.*}}(%rip), %zmm2
>> +; SKX-NEXT:    vpminuq %zmm2, %zmm1, %zmm1
>> +; SKX-NEXT:    vpminuq %zmm2, %zmm0, %zmm0
>> +; SKX-NEXT:    vpmovqd %zmm0, %ymm0
>> +; SKX-NEXT:    vpmovqd %zmm1, %ymm1
>> +; SKX-NEXT:    vinserti32x8 $1, %ymm1, %zmm0, %zmm0
>> +; SKX-NEXT:    vpmovdw %zmm0, %ymm0
>> +; SKX-NEXT:    retq
>> +  %x3 = icmp ult <16 x i64> %i, <i64 65535, i64 65535, i64 65535, i64
>> 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64
>> 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535>
>> +  %x5 = select <16 x i1> %x3, <16 x i64> %i, <16 x i64> <i64 65535, i64
>> 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64
>> 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64
>> 65535>
>> +  %x6 = trunc <16 x i64> %x5 to <16 x i16>
>> +  ret <16 x i16> %x6
>> +}
>> +
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>>
>>
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>>
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.
>
>

