[llvm] r291670 - X86 CodeGen: Optimized pattern for truncate with unsigned saturation.
Hans Wennborg via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 25 09:08:59 PST 2017
It seems safer to merge the revert. I went ahead and merged in r293070.
Thanks,
Hans
On Tue, Jan 24, 2017 at 1:23 PM, Michael Kuperstein <mkuper at google.com> wrote:
> The branch name is release_40, AFAIK.
> As to revert vs. fix - I defer to Hans / Craig.
>
> Michael
>
> On Tue, Jan 24, 2017 at 11:29 AM, Demikhovsky, Elena
> <elena.demikhovsky at intel.com> wrote:
>>
>> I propose a fix; it will be easier than a revert. But it’s not so
>> important; it is Hans’s decision.
>>
>> I’ll need to know the branch name in order to make changes.
>>
>>
>>
>> - Elena
>>
>>
>>
>> From: Michael Kuperstein [mailto:mkuper at google.com]
>> Sent: Tuesday, January 24, 2017 21:09
>> To: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Hans Wennborg
>> <hans at chromium.org>
>>
>>
>> Cc: llvm-commits <llvm-commits at lists.llvm.org>
>> Subject: Re: [llvm] r291670 - X86 CodeGen: Optimized pattern for truncate
>> with unsigned saturation.
>>
>>
>>
>> I just noticed the original commit is before the 4.0 branch-point, but my
>> revert was after.
>>
>> Do we want to revert r291670 in 4.0? Or merge the fixed version (r292479)?
>>
>>
>>
>> Hans, Elena?
>>
>>
>>
>> On Thu, Jan 19, 2017 at 10:47 AM, Michael Kuperstein <mkuper at google.com>
>> wrote:
>>
>> Hi Elena,
>>
>>
>>
>> Thanks for the fix.
>>
>>
>>
>> Regarding the revert - in this case, we're talking about:
>>
>>
>>
>> 1) A recent commit,
>>
>> 2) that has nothing else layered on top of it (except for whitespace
>> changes)
>>
>> 3) is a performance improvement that causes a correctness regression,
>>
>> 4) the crasher is reduced from real code, not a synthetic test-case,
>>
>> 5) and has a small IR reproducer.
>>
>>
>>
>> I really think that in such cases it's worth keeping trunk clean, at the
>> cost of the original committer having to reverse-merge the revert before
>> fixing the bug.
>>
>>
>>
>> Thanks,
>>
>> Michael
>>
>>
>>
>> On Thu, Jan 19, 2017 at 4:49 AM, Demikhovsky, Elena
>> <elena.demikhovsky at intel.com> wrote:
>>
>> Fixed and recommitted in r292479.
>>
>>
>>
>> I’d prefer that you not revert the failing commit, but wait a few
>> days. It will be easier for me to fix.
>>
>> (Unless it is a buildbot failure, of course; those failures I can see
>> myself.)
>>
>> We also find regressions in our internal testing from time to time,
>> PR31671, for example. We submit a PR, notify the owner, and let them fix
>> the bug.
>>
>>
>>
>> Thanks.
>>
>>
>>
>> - Elena
>>
>>
>>
>> From: Michael Kuperstein [mailto:mkuper at google.com]
>> Sent: Thursday, January 19, 2017 01:19
>> To: Demikhovsky, Elena <elena.demikhovsky at intel.com>
>> Cc: llvm-commits <llvm-commits at lists.llvm.org>
>> Subject: Re: [llvm] r291670 - X86 CodeGen: Optimized pattern for truncate
>> with unsigned saturation.
>>
>>
>>
>> Hi Elena,
>>
>>
>>
>> This still crashes in more complex cases. I've reverted in r292444, see
>> PR31589 for the reproducer.
>>
>>
>>
>> Thanks,
>>
>> Michael
>>
>>
>>
>> On Wed, Jan 11, 2017 at 4:59 AM, Elena Demikhovsky via llvm-commits
>> <llvm-commits at lists.llvm.org> wrote:
>>
>> Author: delena
>> Date: Wed Jan 11 06:59:32 2017
>> New Revision: 291670
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=291670&view=rev
>> Log:
>> X86 CodeGen: Optimized pattern for truncate with unsigned saturation.
>>
>> DAG patterns optimization: truncate + unsigned saturation supported by
>> VPMOVUS* instructions in AVX-512.
>> And VPACKUS* instructions on SSE* targets.
>>
>> Differential Revision: https://reviews.llvm.org/D28216
>>
>>
>> Modified:
>> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> llvm/trunk/test/CodeGen/X86/avx-trunc.ll
>> llvm/trunk/test/CodeGen/X86/avx512-trunc.ll
>>
>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=291670&r1=291669&r2=291670&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Wed Jan 11 06:59:32 2017
>> @@ -31220,6 +31220,93 @@ static SDValue foldVectorXorShiftIntoCmp
>> return DAG.getNode(X86ISD::PCMPGT, SDLoc(N), VT, Shift.getOperand(0), Ones);
>> }
>>
>> +/// Check if truncation with saturation from type \p SrcVT to \p DstVT
>> +/// is valid for the given \p Subtarget.
>> +static bool isSATValidOnAVX512Subtarget(EVT SrcVT, EVT DstVT,
>> + const X86Subtarget &Subtarget) {
>> + if (!Subtarget.hasAVX512())
>> + return false;
>> +
>> + // FIXME: Scalar type may be supported if we move it to vector register.
>> + if (!SrcVT.isVector() || !SrcVT.isSimple() || SrcVT.getSizeInBits() > 512)
>> + return false;
>> +
>> + EVT SrcElVT = SrcVT.getScalarType();
>> + EVT DstElVT = DstVT.getScalarType();
>> + if (SrcElVT.getSizeInBits() < 16 || SrcElVT.getSizeInBits() > 64)
>> + return false;
>> + if (DstElVT.getSizeInBits() < 8 || DstElVT.getSizeInBits() > 32)
>> + return false;
>> + if (SrcVT.is512BitVector() || Subtarget.hasVLX())
>> + return SrcElVT.getSizeInBits() >= 32 || Subtarget.hasBWI();
>> + return false;
>> +}
>> +
>> +/// Return true if VPACK* instruction can be used for the given types
>> +/// and it is available on \p Subtarget.
>> +static bool
>> +isSATValidOnSSESubtarget(EVT SrcVT, EVT DstVT, const X86Subtarget &Subtarget) {
>> + if (Subtarget.hasSSE2())
>> + // v16i16 -> v16i8
>> + if (SrcVT == MVT::v16i16 && DstVT == MVT::v16i8)
>> + return true;
>> + if (Subtarget.hasSSE41())
>> + // v8i32 -> v8i16
>> + if (SrcVT == MVT::v8i32 && DstVT == MVT::v8i16)
>> + return true;
>> + return false;
>> +}
>> +
>> +/// Detect a pattern of truncation with saturation:
>> +/// (truncate (umin (x, unsigned_max_of_dest_type)) to dest_type).
>> +/// Return the source value to be truncated or SDValue() if the pattern was not
>> +/// matched.
>> +static SDValue detectUSatPattern(SDValue In, EVT VT) {
>> + if (In.getOpcode() != ISD::UMIN)
>> + return SDValue();
>> +
>> + //Saturation with truncation. We truncate from InVT to VT.
>> + assert(In.getScalarValueSizeInBits() > VT.getScalarSizeInBits() &&
>> + "Unexpected types for truncate operation");
>> +
>> + APInt C;
>> + if (ISD::isConstantSplatVector(In.getOperand(1).getNode(), C)) {
>> + // C should be equal to UINT32_MAX / UINT16_MAX / UINT8_MAX according to
>> + // the element size of the destination type.
>> + return APIntOps::isMask(VT.getScalarSizeInBits(), C) ? In.getOperand(0) :
>> + SDValue();
>> + }
>> + return SDValue();
>> +}
>> +
>> +/// Detect a pattern of truncation with saturation:
>> +/// (truncate (umin (x, unsigned_max_of_dest_type)) to dest_type).
>> +/// The types should allow the use of a VPMOVUS* instruction on AVX512.
>> +/// Return the source value to be truncated or SDValue() if the pattern was not
>> +/// matched.
>> +static SDValue detectAVX512USatPattern(SDValue In, EVT VT,
>> + const X86Subtarget &Subtarget) {
>> + if (!isSATValidOnAVX512Subtarget(In.getValueType(), VT, Subtarget))
>> + return SDValue();
>> + return detectUSatPattern(In, VT);
>> +}
>> +
>> +static SDValue
>> +combineTruncateWithUSat(SDValue In, EVT VT, SDLoc &DL, SelectionDAG &DAG,
>> + const X86Subtarget &Subtarget) {
>> + SDValue USatVal = detectUSatPattern(In, VT);
>> + if (USatVal) {
>> + if (isSATValidOnAVX512Subtarget(In.getValueType(), VT, Subtarget))
>> + return DAG.getNode(X86ISD::VTRUNCUS, DL, VT, USatVal);
>> + if (isSATValidOnSSESubtarget(In.getValueType(), VT, Subtarget)) {
>> + SDValue Lo, Hi;
>> + std::tie(Lo, Hi) = DAG.SplitVector(USatVal, DL);
>> + return DAG.getNode(X86ISD::PACKUS, DL, VT, Lo, Hi);
>> + }
>> + }
>> + return SDValue();
>> +}
>> +
>> /// This function detects the AVG pattern between vectors of unsigned i8/i16,
>> /// which is c = (a + b + 1) / 2, and replace this operation with the efficient
>> /// X86ISD::AVG instruction.
>> @@ -31786,6 +31873,12 @@ static SDValue combineStore(SDNode *N, S
>> St->getPointerInfo(), St->getAlignment(),
>> St->getMemOperand()->getFlags());
>>
>> + if (SDValue Val =
>> + detectAVX512USatPattern(St->getValue(), St->getMemoryVT(), Subtarget))
>> + return EmitTruncSStore(false /* Unsigned saturation */, St->getChain(),
>> + dl, Val, St->getBasePtr(),
>> + St->getMemoryVT(), St->getMemOperand(), DAG);
>> +
>> const TargetLowering &TLI = DAG.getTargetLoweringInfo();
>> unsigned NumElems = VT.getVectorNumElements();
>> assert(StVT != VT && "Cannot truncate to the same type");
>> @@ -32406,6 +32499,10 @@ static SDValue combineTruncate(SDNode *N
>> if (SDValue Avg = detectAVGPattern(Src, VT, DAG, Subtarget, DL))
>> return Avg;
>>
>> + // Try to combine truncation with unsigned saturation.
>> + if (SDValue Val = combineTruncateWithUSat(Src, VT, DL, DAG, Subtarget))
>> + return Val;
>> +
>> // The bitcast source is a direct mmx result.
>> // Detect bitcasts between i32 to x86mmx
>> if (Src.getOpcode() == ISD::BITCAST && VT == MVT::i32) {
>>
>> Modified: llvm/trunk/test/CodeGen/X86/avx-trunc.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-trunc.ll?rev=291670&r1=291669&r2=291670&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/avx-trunc.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/avx-trunc.ll Wed Jan 11 06:59:32 2017
>> @@ -39,3 +39,29 @@ define <16 x i8> @trunc_16_8(<16 x i16>
>> %B = trunc <16 x i16> %A to <16 x i8>
>> ret <16 x i8> %B
>> }
>> +
>> +define <16 x i8> @usat_trunc_wb_256(<16 x i16> %i) {
>> +; CHECK-LABEL: usat_trunc_wb_256:
>> +; CHECK: # BB#0:
>> +; CHECK-NEXT: vextractf128 $1, %ymm0, %xmm1
>> +; CHECK-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
>> +; CHECK-NEXT: vzeroupper
>> +; CHECK-NEXT: retq
>> + %x3 = icmp ult <16 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16
>> 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255,
>> i16 255, i16 255, i16 255>
>> + %x5 = select <16 x i1> %x3, <16 x i16> %i, <16 x i16> <i16 255, i16
>> 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255,
>> i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>> + %x6 = trunc <16 x i16> %x5 to <16 x i8>
>> + ret <16 x i8> %x6
>> +}
>> +
>> +define <8 x i16> @usat_trunc_dw_256(<8 x i32> %i) {
>> +; CHECK-LABEL: usat_trunc_dw_256:
>> +; CHECK: # BB#0:
>> +; CHECK-NEXT: vextractf128 $1, %ymm0, %xmm1
>> +; CHECK-NEXT: vpackusdw %xmm1, %xmm0, %xmm0
>> +; CHECK-NEXT: vzeroupper
>> +; CHECK-NEXT: retq
>> + %x3 = icmp ult <8 x i32> %i, <i32 65535, i32 65535, i32 65535, i32
>> 65535, i32 65535, i32 65535, i32 65535, i32 65535>
>> + %x5 = select <8 x i1> %x3, <8 x i32> %i, <8 x i32> <i32 65535, i32
>> 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
>> + %x6 = trunc <8 x i32> %x5 to <8 x i16>
>> + ret <8 x i16> %x6
>> +}
>>
>> Modified: llvm/trunk/test/CodeGen/X86/avx512-trunc.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-trunc.ll?rev=291670&r1=291669&r2=291670&view=diff
>>
>> ==============================================================================
>> --- llvm/trunk/test/CodeGen/X86/avx512-trunc.ll (original)
>> +++ llvm/trunk/test/CodeGen/X86/avx512-trunc.ll Wed Jan 11 06:59:32 2017
>> @@ -500,3 +500,208 @@ define void @trunc_wb_128_mem(<8 x i16>
>> store <8 x i8> %x, <8 x i8>* %res
>> ret void
>> }
>> +
>> +
>> +define void @usat_trunc_wb_256_mem(<16 x i16> %i, <16 x i8>* %res) {
>> +; KNL-LABEL: usat_trunc_wb_256_mem:
>> +; KNL: ## BB#0:
>> +; KNL-NEXT: vextracti128 $1, %ymm0, %xmm1
>> +; KNL-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
>> +; KNL-NEXT: vmovdqu %xmm0, (%rdi)
>> +; KNL-NEXT: retq
>> +;
>> +; SKX-LABEL: usat_trunc_wb_256_mem:
>> +; SKX: ## BB#0:
>> +; SKX-NEXT: vpmovuswb %ymm0, (%rdi)
>> +; SKX-NEXT: retq
>> + %x3 = icmp ult <16 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16
>> 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255,
>> i16 255, i16 255, i16 255>
>> + %x5 = select <16 x i1> %x3, <16 x i16> %i, <16 x i16> <i16 255, i16
>> 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255,
>> i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>> + %x6 = trunc <16 x i16> %x5 to <16 x i8>
>> + store <16 x i8> %x6, <16 x i8>* %res, align 1
>> + ret void
>> +}
>> +
>> +define <16 x i8> @usat_trunc_wb_256(<16 x i16> %i) {
>> +; KNL-LABEL: usat_trunc_wb_256:
>> +; KNL: ## BB#0:
>> +; KNL-NEXT: vextracti128 $1, %ymm0, %xmm1
>> +; KNL-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
>> +; KNL-NEXT: retq
>> +;
>> +; SKX-LABEL: usat_trunc_wb_256:
>> +; SKX: ## BB#0:
>> +; SKX-NEXT: vpmovuswb %ymm0, %xmm0
>> +; SKX-NEXT: retq
>> + %x3 = icmp ult <16 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16
>> 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255,
>> i16 255, i16 255, i16 255>
>> + %x5 = select <16 x i1> %x3, <16 x i16> %i, <16 x i16> <i16 255, i16
>> 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255, i16 255,
>> i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>> + %x6 = trunc <16 x i16> %x5 to <16 x i8>
>> + ret <16 x i8> %x6
>> +}
>> +
>> +define void @usat_trunc_wb_128_mem(<8 x i16> %i, <8 x i8>* %res) {
>> +; KNL-LABEL: usat_trunc_wb_128_mem:
>> +; KNL: ## BB#0:
>> +; KNL-NEXT: vpminuw {{.*}}(%rip), %xmm0, %xmm0
>> +; KNL-NEXT: vpshufb {{.*#+}} xmm0 =
>> xmm0[0,2,4,6,8,10,12,14,u,u,u,u,u,u,u,u]
>> +; KNL-NEXT: vmovq %xmm0, (%rdi)
>> +; KNL-NEXT: retq
>> +;
>> +; SKX-LABEL: usat_trunc_wb_128_mem:
>> +; SKX: ## BB#0:
>> +; SKX-NEXT: vpmovuswb %xmm0, (%rdi)
>> +; SKX-NEXT: retq
>> + %x3 = icmp ult <8 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16
>> 255, i16 255, i16 255, i16 255>
>> + %x5 = select <8 x i1> %x3, <8 x i16> %i, <8 x i16> <i16 255, i16 255,
>> i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>> + %x6 = trunc <8 x i16> %x5 to <8 x i8>
>> + store <8 x i8> %x6, <8 x i8>* %res, align 1
>> + ret void
>> +}
>> +
>> +define void @usat_trunc_db_512_mem(<16 x i32> %i, <16 x i8>* %res) {
>> +; ALL-LABEL: usat_trunc_db_512_mem:
>> +; ALL: ## BB#0:
>> +; ALL-NEXT: vpmovusdb %zmm0, (%rdi)
>> +; ALL-NEXT: retq
>> + %x3 = icmp ult <16 x i32> %i, <i32 255, i32 255, i32 255, i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255>
>> + %x5 = select <16 x i1> %x3, <16 x i32> %i, <16 x i32> <i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255, i32 255, i32 255, i32 255>
>> + %x6 = trunc <16 x i32> %x5 to <16 x i8>
>> + store <16 x i8> %x6, <16 x i8>* %res, align 1
>> + ret void
>> +}
>> +
>> +define void @usat_trunc_qb_512_mem(<8 x i64> %i, <8 x i8>* %res) {
>> +; ALL-LABEL: usat_trunc_qb_512_mem:
>> +; ALL: ## BB#0:
>> +; ALL-NEXT: vpmovusqb %zmm0, (%rdi)
>> +; ALL-NEXT: retq
>> + %x3 = icmp ult <8 x i64> %i, <i64 255, i64 255, i64 255, i64 255, i64
>> 255, i64 255, i64 255, i64 255>
>> + %x5 = select <8 x i1> %x3, <8 x i64> %i, <8 x i64> <i64 255, i64 255,
>> i64 255, i64 255, i64 255, i64 255, i64 255, i64 255>
>> + %x6 = trunc <8 x i64> %x5 to <8 x i8>
>> + store <8 x i8> %x6, <8 x i8>* %res, align 1
>> + ret void
>> +}
>> +
>> +define void @usat_trunc_qd_512_mem(<8 x i64> %i, <8 x i32>* %res) {
>> +; ALL-LABEL: usat_trunc_qd_512_mem:
>> +; ALL: ## BB#0:
>> +; ALL-NEXT: vpmovusqd %zmm0, (%rdi)
>> +; ALL-NEXT: retq
>> + %x3 = icmp ult <8 x i64> %i, <i64 4294967295, i64 4294967295, i64
>> 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295,
>> i64 4294967295>
>> + %x5 = select <8 x i1> %x3, <8 x i64> %i, <8 x i64> <i64 4294967295, i64
>> 4294967295, i64 4294967295, i64 4294967295, i64 4294967295, i64 4294967295,
>> i64 4294967295, i64 4294967295>
>> + %x6 = trunc <8 x i64> %x5 to <8 x i32>
>> + store <8 x i32> %x6, <8 x i32>* %res, align 1
>> + ret void
>> +}
>> +
>> +define void @usat_trunc_qw_512_mem(<8 x i64> %i, <8 x i16>* %res) {
>> +; ALL-LABEL: usat_trunc_qw_512_mem:
>> +; ALL: ## BB#0:
>> +; ALL-NEXT: vpmovusqw %zmm0, (%rdi)
>> +; ALL-NEXT: retq
>> + %x3 = icmp ult <8 x i64> %i, <i64 65535, i64 65535, i64 65535, i64
>> 65535, i64 65535, i64 65535, i64 65535, i64 65535>
>> + %x5 = select <8 x i1> %x3, <8 x i64> %i, <8 x i64> <i64 65535, i64
>> 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535>
>> + %x6 = trunc <8 x i64> %x5 to <8 x i16>
>> + store <8 x i16> %x6, <8 x i16>* %res, align 1
>> + ret void
>> +}
>> +
>> +define <32 x i8> @usat_trunc_db_1024(<32 x i32> %i) {
>> +; KNL-LABEL: usat_trunc_db_1024:
>> +; KNL: ## BB#0:
>> +; KNL-NEXT: vpmovusdb %zmm0, %xmm0
>> +; KNL-NEXT: vpmovusdb %zmm1, %xmm1
>> +; KNL-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
>> +; KNL-NEXT: retq
>> +;
>> +; SKX-LABEL: usat_trunc_db_1024:
>> +; SKX: ## BB#0:
>> +; SKX-NEXT: vpbroadcastd {{.*}}(%rip), %zmm2
>> +; SKX-NEXT: vpminud %zmm2, %zmm1, %zmm1
>> +; SKX-NEXT: vpminud %zmm2, %zmm0, %zmm0
>> +; SKX-NEXT: vpmovdw %zmm0, %ymm0
>> +; SKX-NEXT: vpmovdw %zmm1, %ymm1
>> +; SKX-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
>> +; SKX-NEXT: vpmovwb %zmm0, %ymm0
>> +; SKX-NEXT: retq
>> + %x3 = icmp ult <32 x i32> %i, <i32 255, i32 255, i32 255, i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255>
>> + %x5 = select <32 x i1> %x3, <32 x i32> %i, <32 x i32> <i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255, i32 255, i32 255>
>> + %x6 = trunc <32 x i32> %x5 to <32 x i8>
>> + ret <32 x i8> %x6
>> +}
>> +
>> +define void @usat_trunc_db_1024_mem(<32 x i32> %i, <32 x i8>* %p) {
>> +; KNL-LABEL: usat_trunc_db_1024_mem:
>> +; KNL: ## BB#0:
>> +; KNL-NEXT: vpmovusdb %zmm0, %xmm0
>> +; KNL-NEXT: vpmovusdb %zmm1, %xmm1
>> +; KNL-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
>> +; KNL-NEXT: vmovdqu %ymm0, (%rdi)
>> +; KNL-NEXT: retq
>> +;
>> +; SKX-LABEL: usat_trunc_db_1024_mem:
>> +; SKX: ## BB#0:
>> +; SKX-NEXT: vpbroadcastd {{.*}}(%rip), %zmm2
>> +; SKX-NEXT: vpminud %zmm2, %zmm1, %zmm1
>> +; SKX-NEXT: vpminud %zmm2, %zmm0, %zmm0
>> +; SKX-NEXT: vpmovdw %zmm0, %ymm0
>> +; SKX-NEXT: vpmovdw %zmm1, %ymm1
>> +; SKX-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
>> +; SKX-NEXT: vpmovwb %zmm0, (%rdi)
>> +; SKX-NEXT: retq
>> + %x3 = icmp ult <32 x i32> %i, <i32 255, i32 255, i32 255, i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255>
>> + %x5 = select <32 x i1> %x3, <32 x i32> %i, <32 x i32> <i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32
>> 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255, i32 255,
>> i32 255, i32 255, i32 255, i32 255, i32 255>
>> + %x6 = trunc <32 x i32> %x5 to <32 x i8>
>> + store <32 x i8>%x6, <32 x i8>* %p, align 1
>> + ret void
>> +}
>> +
>> +define <16 x i16> @usat_trunc_dw_512(<16 x i32> %i) {
>> +; ALL-LABEL: usat_trunc_dw_512:
>> +; ALL: ## BB#0:
>> +; ALL-NEXT: vpmovusdw %zmm0, %ymm0
>> +; ALL-NEXT: retq
>> + %x3 = icmp ult <16 x i32> %i, <i32 65535, i32 65535, i32 65535, i32
>> 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32
>> 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
>> + %x5 = select <16 x i1> %x3, <16 x i32> %i, <16 x i32> <i32 65535, i32
>> 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32
>> 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32
>> 65535>
>> + %x6 = trunc <16 x i32> %x5 to <16 x i16>
>> + ret <16 x i16> %x6
>> +}
>> +
>> +define <8 x i8> @usat_trunc_wb_128(<8 x i16> %i) {
>> +; ALL-LABEL: usat_trunc_wb_128:
>> +; ALL: ## BB#0:
>> +; ALL-NEXT: vpminuw {{.*}}(%rip), %xmm0, %xmm0
>> +; ALL-NEXT: retq
>> + %x3 = icmp ult <8 x i16> %i, <i16 255, i16 255, i16 255, i16 255, i16
>> 255, i16 255, i16 255, i16 255>
>> + %x5 = select <8 x i1> %x3, <8 x i16> %i, <8 x i16> <i16 255, i16 255,
>> i16 255, i16 255, i16 255, i16 255, i16 255, i16 255>
>> + %x6 = trunc <8 x i16> %x5 to <8 x i8>
>> + ret <8 x i8>%x6
>> +}
>> +
>> +define <16 x i16> @usat_trunc_qw_1024(<16 x i64> %i) {
>> +; KNL-LABEL: usat_trunc_qw_1024:
>> +; KNL: ## BB#0:
>> +; KNL-NEXT: vpbroadcastq {{.*}}(%rip), %zmm2
>> +; KNL-NEXT: vpminuq %zmm2, %zmm1, %zmm1
>> +; KNL-NEXT: vpminuq %zmm2, %zmm0, %zmm0
>> +; KNL-NEXT: vpmovqd %zmm0, %ymm0
>> +; KNL-NEXT: vpmovqd %zmm1, %ymm1
>> +; KNL-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
>> +; KNL-NEXT: vpmovdw %zmm0, %ymm0
>> +; KNL-NEXT: retq
>> +;
>> +; SKX-LABEL: usat_trunc_qw_1024:
>> +; SKX: ## BB#0:
>> +; SKX-NEXT: vpbroadcastq {{.*}}(%rip), %zmm2
>> +; SKX-NEXT: vpminuq %zmm2, %zmm1, %zmm1
>> +; SKX-NEXT: vpminuq %zmm2, %zmm0, %zmm0
>> +; SKX-NEXT: vpmovqd %zmm0, %ymm0
>> +; SKX-NEXT: vpmovqd %zmm1, %ymm1
>> +; SKX-NEXT: vinserti32x8 $1, %ymm1, %zmm0, %zmm0
>> +; SKX-NEXT: vpmovdw %zmm0, %ymm0
>> +; SKX-NEXT: retq
>> + %x3 = icmp ult <16 x i64> %i, <i64 65535, i64 65535, i64 65535, i64
>> 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64
>> 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535>
>> + %x5 = select <16 x i1> %x3, <16 x i64> %i, <16 x i64> <i64 65535, i64
>> 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64
>> 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64 65535, i64
>> 65535>
>> + %x6 = trunc <16 x i64> %x5 to <16 x i16>
>> + ret <16 x i16> %x6
>> +}
>> +
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>>
>>
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>>
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.
>
>
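For readers skimming the quoted diff: the pattern that detectUSatPattern
matches, (truncate (umin (x, unsigned_max_of_dest_type)) to dest_type), can be
modeled per element in plain C++. The sketch below is only a scalar
illustration and is not LLVM code; the helper names usat_trunc_wb,
usat_trunc_dw and is_low_bit_mask are hypothetical. The mask helper reflects
the diff's own comment that the splat constant should equal UINT8_MAX,
UINT16_MAX or UINT32_MAX for the destination element size; it is an assumption
about what the APIntOps::isMask call checks, not a copy of it.

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative scalar model only; these helpers are hypothetical and are
// not part of LLVM. Each one mirrors, for a single element, the pattern
//   (truncate (umin (x, unsigned_max_of_dest_type)) to dest_type).

// i16 -> i8: clamp to UINT8_MAX (255), then keep the low byte.
constexpr std::uint8_t usat_trunc_wb(std::uint16_t x) {
  return static_cast<std::uint8_t>(std::min<std::uint16_t>(x, UINT8_MAX));
}

// i32 -> i16: clamp to UINT16_MAX (65535), then keep the low 16 bits.
constexpr std::uint16_t usat_trunc_dw(std::uint32_t x) {
  return static_cast<std::uint16_t>(std::min<std::uint32_t>(x, UINT16_MAX));
}

// Assumed model of the constant check: the umin constant must be exactly
// the value with the low `bits` bits set, i.e. the destination's unsigned max.
constexpr bool is_low_bit_mask(unsigned bits, std::uint64_t c) {
  return bits > 0 && bits < 64 && c == ((std::uint64_t{1} << bits) - 1);
}

// Compile-time checks of the model (requires C++14 for constexpr std::min).
static_assert(usat_trunc_wb(200) == 200, "in-range values pass through");
static_assert(usat_trunc_wb(300) == 255, "out-of-range values saturate");
static_assert(usat_trunc_dw(70000) == 65535, "i32 -> i16 saturates");
static_assert(is_low_bit_mask(8, 255), "255 is the i8 unsigned max");
static_assert(!is_low_bit_mask(8, 254), "254 must not match");
```

The icmp ult / select pairs in the quoted tests are the IR spelling of the
same clamp before the trunc.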