[llvm] r291670 - X86 CodeGen: Optimized pattern for truncate with unsigned saturation.
Philip Reames via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 19 14:36:56 PST 2017
Michael, you clearly did the right thing here. Reverting a patch which
is broken is absolutely appropriate and expected.
Elena, if you have an internal failure that you can reduce down to a
reproducer for an upstream commit, please do revert the change, file a
bug, and reply to the commit thread with a link to that PR.
Philip
On 01/19/2017 10:47 AM, Michael Kuperstein via llvm-commits wrote:
> Hi Elena,
>
> Thanks for the fix.
>
> Regarding the revert - in this case, we're talking about:
>
> 1) A recent commit,
> 2) that has nothing else layered on top of it (except for whitespace
> changes),
> 3) that is a performance improvement causing a correctness regression,
> 4) whose crasher is reduced from real code, not a synthetic test case,
> 5) and that has a small IR reproducer.
>
> I really think that in such cases it's worth keeping trunk clean, at
> the cost of the original committer having to reverse-merge the revert
> before fixing the bug.
>
> Thanks,
> Michael
>
> On Thu, Jan 19, 2017 at 4:49 AM, Demikhovsky, Elena
> <elena.demikhovsky at intel.com> wrote:
>
> Fixed and recommitted in r292479.
>
> I’d prefer that you not revert the failing commit, but wait for
> a few days. It will be easier for me to fix.
>
> (If it is not a buildbot failure, of course; those failures I can
> see myself.)
>
> We also find regressions in our internal testing from time to
> time (PR31671, for example). We submit a PR, notify the owner, and
> let him fix the bug.
>
> Thanks.
>
> - Elena
>
> *From:* Michael Kuperstein [mailto:mkuper at google.com]
> *Sent:* Thursday, January 19, 2017 01:19
> *To:* Demikhovsky, Elena <elena.demikhovsky at intel.com>
> *Cc:* llvm-commits <llvm-commits at lists.llvm.org>
> *Subject:* Re: [llvm] r291670 - X86 CodeGen: Optimized pattern for
> truncate with unsigned saturation.
>
> Hi Elena,
>
> This still crashes in more complex cases. I've reverted in
> r292444, see PR31589 for the reproducer.
>
> Thanks,
>
> Michael
>
> On Wed, Jan 11, 2017 at 4:59 AM, Elena Demikhovsky via
> llvm-commits <llvm-commits at lists.llvm.org> wrote:
>
> Author: delena
> Date: Wed Jan 11 06:59:32 2017
> New Revision: 291670
>
> URL: http://llvm.org/viewvc/llvm-project?rev=291670&view=rev
> Log:
> X86 CodeGen: Optimized pattern for truncate with unsigned
> saturation.
>
> DAG pattern optimization: truncate + unsigned saturation,
> supported by VPMOVUS* instructions on AVX-512
> and by VPACKUS* instructions on SSE* targets.
>
> Differential Revision: https://reviews.llvm.org/D28216
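>
> For reference, the matched pattern in IR is a compare-and-select
> clamp to the destination type's unsigned maximum followed by a
> truncate, mirroring the tests added below. A minimal sketch (the
> function name @usat_example is illustrative, not part of the
> patch); on AVX-512 with VLX this can lower to vpmovusdw, and with
> SSE4.1 the vector is split and packed with vpackusdw:
>
>   define <8 x i16> @usat_example(<8 x i32> %x) {
>     ; Clamp each lane to 65535, the unsigned max of the i16 destination.
>     %c = icmp ult <8 x i32> %x, <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
>     %s = select <8 x i1> %c, <8 x i32> %x, <8 x i32> <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
>     ; The clamped value reaches the DAG as a umin node, which the
>     ; new combine recognizes and turns into a saturating truncate.
>     %t = trunc <8 x i32> %s to <8 x i16>
>     ret <8 x i16> %t
>   }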
>
>
> Modified:
> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> llvm/trunk/test/CodeGen/X86/avx-trunc.ll
> llvm/trunk/test/CodeGen/X86/avx512-trunc.ll
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=291670&r1=291669&r2=291670&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Wed Jan 11 06:59:32 2017
> @@ -31220,6 +31220,93 @@ static SDValue foldVectorXorShiftIntoCmp
>    return DAG.getNode(X86ISD::PCMPGT, SDLoc(N), VT, Shift.getOperand(0), Ones);
> }
>
> +/// Check if truncation with saturation from type \p SrcVT to \p DstVT
> +/// is valid for the given \p Subtarget.
> +static bool isSATValidOnAVX512Subtarget(EVT SrcVT, EVT DstVT,
> +                                        const X86Subtarget &Subtarget) {
> +  if (!Subtarget.hasAVX512())
> +    return false;
> +
> +  // FIXME: Scalar type may be supported if we move it to a vector register.
> +  if (!SrcVT.isVector() || !SrcVT.isSimple() || SrcVT.getSizeInBits() > 512)
> +    return false;
> +
> +  EVT SrcElVT = SrcVT.getScalarType();
> +  EVT DstElVT = DstVT.getScalarType();
> +  if (SrcElVT.getSizeInBits() < 16 || SrcElVT.getSizeInBits() > 64)
> +    return false;
> +  if (DstElVT.getSizeInBits() < 8 || DstElVT.getSizeInBits() > 32)
> +    return false;
> +  if (SrcVT.is512BitVector() || Subtarget.hasVLX())
> +    return SrcElVT.getSizeInBits() >= 32 || Subtarget.hasBWI();
> +  return false;
> +}
> +
> +/// Return true if a VPACK* instruction can be used for the given types
> +/// and it is available on \p Subtarget.
> +static bool
> +isSATValidOnSSESubtarget(EVT SrcVT, EVT DstVT, const X86Subtarget &Subtarget) {
> +  if (Subtarget.hasSSE2())
> +    // v16i16 -> v16i8
> +    if (SrcVT == MVT::v16i16 && DstVT == MVT::v16i8)
> +      return true;
> +  if (Subtarget.hasSSE41())
> +    // v8i32 -> v8i16
> +    if (SrcVT == MVT::v8i32 && DstVT == MVT::v8i16)
> +      return true;
> +  return false;
> +}
> +
> +/// Detect a pattern of truncation with saturation:
> +/// (truncate (umin (x, unsigned_max_of_dest_type)) to dest_type).
> +/// Return the source value to be truncated, or SDValue() if the pattern was
> +/// not matched.
> +static SDValue detectUSatPattern(SDValue In, EVT VT) {
> +  if (In.getOpcode() != ISD::UMIN)
> +    return SDValue();
> +
> +  // Saturation with truncation. We truncate from InVT to VT.
> +  assert(In.getScalarValueSizeInBits() > VT.getScalarSizeInBits() &&
> +         "Unexpected types for truncate operation");
> +
> +  APInt C;
> +  if (ISD::isConstantSplatVector(In.getOperand(1).getNode(), C)) {
> +    // C should be equal to UINT32_MAX / UINT16_MAX / UINT8_MAX according to
> +    // the element size of the destination type.
> +    return APIntOps::isMask(VT.getScalarSizeInBits(), C) ? In.getOperand(0)
> +                                                         : SDValue();
> +  }
> +  return SDValue();
> +}
> +
> +/// Detect a pattern of truncation with saturation:
> +/// (truncate (umin (x, unsigned_max_of_dest_type)) to dest_type).
> +/// The types should allow use of a VPMOVUS* instruction on AVX-512.
> +/// Return the source value to be truncated, or SDValue() if the pattern was
> +/// not matched.
> +static SDValue detectAVX512USatPattern(SDValue In, EVT VT,
> +                                       const X86Subtarget &Subtarget) {
> +  if (!isSATValidOnAVX512Subtarget(In.getValueType(), VT, Subtarget))
> +    return SDValue();
> +  return detectUSatPattern(In, VT);
> +}
> +
> +static SDValue
> +combineTruncateWithUSat(SDValue In, EVT VT, SDLoc &DL, SelectionDAG &DAG,
> +                        const X86Subtarget &Subtarget) {
> +  SDValue USatVal = detectUSatPattern(In, VT);
> +  if (USatVal) {
> +    if (isSATValidOnAVX512Subtarget(In.getValueType(), VT, Subtarget))
> +      return DAG.getNode(X86ISD::VTRUNCUS, DL, VT, USatVal);
> +    if (isSATValidOnSSESubtarget(In.getValueType(), VT, Subtarget)) {
> +      SDValue Lo, Hi;
> +      std::tie(Lo, Hi) = DAG.SplitVector(USatVal, DL);
> +      return DAG.getNode(X86ISD::PACKUS, DL, VT, Lo, Hi);
> +    }
> +  }
> +  return SDValue();
> +}
> +
>  /// This function detects the AVG pattern between vectors of unsigned i8/i16,
>  /// which is c = (a + b + 1) / 2, and replace this operation with the efficient
>  /// X86ISD::AVG instruction.
> @@ -31786,6 +31873,12 @@ static SDValue combineStore(SDNode *N, S
> St->getPointerInfo(), St->getAlignment(),
> St->getMemOperand()->getFlags());
>
> +  if (SDValue Val =
> +          detectAVX512USatPattern(St->getValue(), St->getMemoryVT(), Subtarget))
> +    return EmitTruncSStore(false /* Unsigned saturation */, St->getChain(),
> +                           dl, Val, St->getBasePtr(), St->getMemoryVT(),
> +                           St->getMemOperand(), DAG);
> +
> const TargetLowering &TLI = DAG.getTargetLoweringInfo();
> unsigned NumElems = VT.getVectorNumElements();
> assert(StVT != VT && "Cannot truncate to the same type");
> @@ -32406,6 +32499,10 @@ static SDValue combineTruncate(SDNode *N
>    if (SDValue Avg = detectAVGPattern(Src, VT, DAG, Subtarget, DL))
> return Avg;
>
> + // Try to combine truncation with unsigned saturation.
> +  if (SDValue Val = combineTruncateWithUSat(Src, VT, DL, DAG, Subtarget))
> + return Val;
> +
> // The bitcast source is a direct mmx result.
> // Detect bitcasts between i32 to x86mmx
> if (Src.getOpcode() == ISD::BITCAST && VT == MVT::i32) {
>
> Modified: llvm/trunk/test/CodeGen/X86/avx-trunc.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-trunc.ll?rev=291670&r1=291669&r2=291670&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/avx-trunc.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/avx-trunc.ll Wed Jan 11 06:59:32 2017
> @@ -39,3 +39,29 @@ define <16 x i8> @trunc_16_8(<16 x i16>
> %B = trunc <16 x i16> %A to <16 x i8>
> ret <16 x i8> %B
> }
> +
> +define <16 x i8> @usat_trunc_wb_256(<16 x i16> %i) {
> +; CHECK-LABEL: usat_trunc_wb_256:
> +; CHECK: # BB#0:
> +; CHECK-NEXT: vextractf128 $1, %ymm0, %xmm1
> +; CHECK-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
> +; CHECK-NEXT: vzeroupper
> +; CHECK-NEXT: retq
> +  %x3 = icmp ult <16 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
> +  %x5 = select <16 x i1> %x3, <16 x i16> %i, <16 x i16> <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
> +  %x6 = trunc <16 x i16> %x5 to <16 x i8>
> +  ret <16 x i8> %x6
> +}
> +
> +define <8 x i16> @usat_trunc_dw_256(<8 x i32> %i) {
> +; CHECK-LABEL: usat_trunc_dw_256:
> +; CHECK: # BB#0:
> +; CHECK-NEXT: vextractf128 $1, %ymm0, %xmm1
> +; CHECK-NEXT: vpackusdw %xmm1, %xmm0, %xmm0
> +; CHECK-NEXT: vzeroupper
> +; CHECK-NEXT: retq
> +  %x3 = icmp ult <8 x i32> %i, <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
> +  %x5 = select <8 x i1> %x3, <8 x i32> %i, <8 x i32> <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
> +  %x6 = trunc <8 x i32> %x5 to <8 x i16>
> +  ret <8 x i16> %x6
> +}
>
> Modified: llvm/trunk/test/CodeGen/X86/avx512-trunc.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-trunc.ll?rev=291670&r1=291669&r2=291670&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/avx512-trunc.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/avx512-trunc.ll Wed Jan 11 06:59:32 2017
> @@ -500,3 +500,208 @@ define void @trunc_wb_128_mem(<8 x i16>
> store <8 x i8> %x, <8 x i8>* %res
> ret void
> }
> +
> +
> +define void @usat_trunc_wb_256_mem(<16 x i16> %i, <16 x i8>* %res) {
> +; KNL-LABEL: usat_trunc_wb_256_mem:
> +; KNL: ## BB#0:
> +; KNL-NEXT: vextracti128 $1, %ymm0, %xmm1
> +; KNL-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
> +; KNL-NEXT: vmovdqu %xmm0, (%rdi)
> +; KNL-NEXT: retq
> +;
> +; SKX-LABEL: usat_trunc_wb_256_mem:
> +; SKX: ## BB#0:
> +; SKX-NEXT: vpmovuswb %ymm0, (%rdi)
> +; SKX-NEXT: retq
> +  %x3 = icmp ult <16 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
> +  %x5 = select <16 x i1> %x3, <16 x i16> %i, <16 x i16> <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
> +  %x6 = trunc <16 x i16> %x5 to <16 x i8>
> +  store <16 x i8> %x6, <16 x i8>* %res, align 1
> +  ret void
> +}
> +
> +define <16 x i8> @usat_trunc_wb_256(<16 x i16> %i) {
> +; KNL-LABEL: usat_trunc_wb_256:
> +; KNL: ## BB#0:
> +; KNL-NEXT: vextracti128 $1, %ymm0, %xmm1
> +; KNL-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
> +; KNL-NEXT: retq
> +;
> +; SKX-LABEL: usat_trunc_wb_256:
> +; SKX: ## BB#0:
> +; SKX-NEXT: vpmovuswb %ymm0, %xmm0
> +; SKX-NEXT: retq
> +  %x3 = icmp ult <16 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
> +  %x5 = select <16 x i1> %x3, <16 x i16> %i, <16 x i16> <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
> +  %x6 = trunc <16 x i16> %x5 to <16 x i8>
> +  ret <16 x i8> %x6
> +}
> +
> +define void @usat_trunc_wb_128_mem(<8 x i16> %i, <8 x i8>* %res) {
> +; KNL-LABEL: usat_trunc_wb_128_mem:
> +; KNL: ## BB#0:
> +; KNL-NEXT: vpminuw {{.*}}(%rip), %xmm0, %xmm0
> +; KNL-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,2,4,6,8,10,12,14,u,u,u,u,u,u,u,u]
> +; KNL-NEXT: vmovq %xmm0, (%rdi)
> +; KNL-NEXT: retq
> +;
> +; SKX-LABEL: usat_trunc_wb_128_mem:
> +; SKX: ## BB#0:
> +; SKX-NEXT: vpmovuswb %xmm0, (%rdi)
> +; SKX-NEXT: retq
> +  %x3 = icmp ult <8 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
> +  %x5 = select <8 x i1> %x3, <8 x i16> %i, <8 x i16> <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
> +  %x6 = trunc <8 x i16> %x5 to <8 x i8>
> +  store <8 x i8> %x6, <8 x i8>* %res, align 1
> +  ret void
> +}
> +
> +define void @usat_trunc_db_512_mem(<16 x i32> %i, <16 x i8>* %res) {
> +; ALL-LABEL: usat_trunc_db_512_mem:
> +; ALL: ## BB#0:
> +; ALL-NEXT: vpmovusdb %zmm0, (%rdi)
> +; ALL-NEXT: retq
> +  %x3 = icmp ult <16 x i32> %i, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
> +  %x5 = select <16 x i1> %x3, <16 x i32> %i, <16 x i32> <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
> +  %x6 = trunc <16 x i32> %x5 to <16 x i8>
> +  store <16 x i8> %x6, <16 x i8>* %res, align 1
> +  ret void
> +}
> +
> +define void @usat_trunc_qb_512_mem(<8 x i64> %i, <8 x i8>* %res) {
> +; ALL-LABEL: usat_trunc_qb_512_mem:
> +; ALL: ## BB#0:
> +; ALL-NEXT: vpmovusqb %zmm0, (%rdi)
> +; ALL-NEXT: retq
> +  %x3 = icmp ult <8 x i64> %i, <i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255>
> +  %x5 = select <8 x i1> %x3, <8 x i64> %i, <8 x i64> <i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255, i64 255>
> +  %x6 = trunc <8 x i64> %x5 to <8 x i8>
> +  store <8 x i8> %x6, <8 x i8>* %res, align 1
> +  ret void
> +}
> +
> +define void @usat_trunc_qd_512_mem(<8 x i64> %i, <8 x i32>* %res) {
> +; ALL-LABEL: usat_trunc_qd_512_mem:
> +; ALL: ## BB#0:
> +; ALL-NEXT: vpmovusqd %zmm0, (%rdi)
> +; ALL-NEXT: retq
> +  %x3 = icmp ult <8 x i64> %i, <i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295>
> +  %x5 = select <8 x i1> %x3, <8 x i64> %i, <8 x i64> <i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295>
> +  %x6 = trunc <8 x i64> %x5 to <8 x i32>
> +  store <8 x i32> %x6, <8 x i32>* %res, align 1
> +  ret void
> +}
> +
> +define void @usat_trunc_qw_512_mem(<8 x i64> %i, <8 x i16>* %res) {
> +; ALL-LABEL: usat_trunc_qw_512_mem:
> +; ALL: ## BB#0:
> +; ALL-NEXT: vpmovusqw %zmm0, (%rdi)
> +; ALL-NEXT: retq
> +  %x3 = icmp ult <8 x i64> %i, <i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535>
> +  %x5 = select <8 x i1> %x3, <8 x i64> %i, <8 x i64> <i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535>
> +  %x6 = trunc <8 x i64> %x5 to <8 x i16>
> +  store <8 x i16> %x6, <8 x i16>* %res, align 1
> +  ret void
> +}
> +
> +define <32 x i8> @usat_trunc_db_1024(<32 x i32> %i) {
> +; KNL-LABEL: usat_trunc_db_1024:
> +; KNL: ## BB#0:
> +; KNL-NEXT: vpmovusdb %zmm0, %xmm0
> +; KNL-NEXT: vpmovusdb %zmm1, %xmm1
> +; KNL-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
> +; KNL-NEXT: retq
> +;
> +; SKX-LABEL: usat_trunc_db_1024:
> +; SKX: ## BB#0:
> +; SKX-NEXT: vpbroadcastd {{.*}}(%rip), %zmm2
> +; SKX-NEXT: vpminud %zmm2, %zmm1, %zmm1
> +; SKX-NEXT: vpminud %zmm2, %zmm0, %zmm0
> +; SKX-NEXT: vpmovdw %zmm0, %ymm0
> +; SKX-NEXT: vpmovdw %zmm1, %ymm1
> +; SKX-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
> +; SKX-NEXT: vpmovwb %zmm0, %ymm0
> +; SKX-NEXT: retq
> +  %x3 = icmp ult <32 x i32> %i, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
> +  %x5 = select <32 x i1> %x3, <32 x i32> %i, <32 x i32> <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
> +  %x6 = trunc <32 x i32> %x5 to <32 x i8>
> +  ret <32 x i8> %x6
> +}
> +
> +define void @usat_trunc_db_1024_mem(<32 x i32> %i, <32 x i8>* %p) {
> +; KNL-LABEL: usat_trunc_db_1024_mem:
> +; KNL: ## BB#0:
> +; KNL-NEXT: vpmovusdb %zmm0, %xmm0
> +; KNL-NEXT: vpmovusdb %zmm1, %xmm1
> +; KNL-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
> +; KNL-NEXT: vmovdqu %ymm0, (%rdi)
> +; KNL-NEXT: retq
> +;
> +; SKX-LABEL: usat_trunc_db_1024_mem:
> +; SKX: ## BB#0:
> +; SKX-NEXT: vpbroadcastd {{.*}}(%rip), %zmm2
> +; SKX-NEXT: vpminud %zmm2, %zmm1, %zmm1
> +; SKX-NEXT: vpminud %zmm2, %zmm0, %zmm0
> +; SKX-NEXT: vpmovdw %zmm0, %ymm0
> +; SKX-NEXT: vpmovdw %zmm1, %ymm1
> +; SKX-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
> +; SKX-NEXT: vpmovwb %zmm0, (%rdi)
> +; SKX-NEXT: retq
> +  %x3 = icmp ult <32 x i32> %i, <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
> +  %x5 = select <32 x i1> %x3, <32 x i32> %i, <32 x i32> <i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
> +  %x6 = trunc <32 x i32> %x5 to <32 x i8>
> +  store <32 x i8> %x6, <32 x i8>* %p, align 1
> +  ret void
> +}
> +
> +define <16 x i16> @usat_trunc_dw_512(<16 x i32> %i) {
> +; ALL-LABEL: usat_trunc_dw_512:
> +; ALL: ## BB#0:
> +; ALL-NEXT: vpmovusdw %zmm0, %ymm0
> +; ALL-NEXT: retq
> +  %x3 = icmp ult <16 x i32> %i, <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
> +  %x5 = select <16 x i1> %x3, <16 x i32> %i, <16 x i32> <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
> +  %x6 = trunc <16 x i32> %x5 to <16 x i16>
> +  ret <16 x i16> %x6
> +}
> +
> +define <8 x i8> @usat_trunc_wb_128(<8 x i16> %i) {
> +; ALL-LABEL: usat_trunc_wb_128:
> +; ALL: ## BB#0:
> +; ALL-NEXT: vpminuw {{.*}}(%rip), %xmm0, %xmm0
> +; ALL-NEXT: retq
> +  %x3 = icmp ult <8 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
> +  %x5 = select <8 x i1> %x3, <8 x i16> %i, <8 x i16> <i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
> +  %x6 = trunc <8 x i16> %x5 to <8 x i8>
> +  ret <8 x i8> %x6
> +}
> +
> +define <16 x i16> @usat_trunc_qw_1024(<16 x i64> %i) {
> +; KNL-LABEL: usat_trunc_qw_1024:
> +; KNL: ## BB#0:
> +; KNL-NEXT: vpbroadcastq {{.*}}(%rip), %zmm2
> +; KNL-NEXT: vpminuq %zmm2, %zmm1, %zmm1
> +; KNL-NEXT: vpminuq %zmm2, %zmm0, %zmm0
> +; KNL-NEXT: vpmovqd %zmm0, %ymm0
> +; KNL-NEXT: vpmovqd %zmm1, %ymm1
> +; KNL-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
> +; KNL-NEXT: vpmovdw %zmm0, %ymm0
> +; KNL-NEXT: retq
> +;
> +; SKX-LABEL: usat_trunc_qw_1024:
> +; SKX: ## BB#0:
> +; SKX-NEXT: vpbroadcastq {{.*}}(%rip), %zmm2
> +; SKX-NEXT: vpminuq %zmm2, %zmm1, %zmm1
> +; SKX-NEXT: vpminuq %zmm2, %zmm0, %zmm0
> +; SKX-NEXT: vpmovqd %zmm0, %ymm0
> +; SKX-NEXT: vpmovqd %zmm1, %ymm1
> +; SKX-NEXT: vinserti32x8 $1, %ymm1, %zmm0, %zmm0
> +; SKX-NEXT: vpmovdw %zmm0, %ymm0
> +; SKX-NEXT: retq
> +  %x3 = icmp ult <16 x i64> %i, <i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535>
> +  %x5 = select <16 x i1> %x3, <16 x i64> %i, <16 x i64> <i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535>
> +  %x6 = trunc <16 x i64> %x5 to <16 x i16>
> +  ret <16 x i16> %x6
> +}
> +
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>