<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p>Michael, you clearly did the right thing here. Reverting a
broken patch is absolutely appropriate and expected.</p>
<p>Elena, if you have an internal failure that you can reduce down
to a reproducer for an upstream commit, please do revert the
change, file a bug, and reply to the commit thread with a link to
that PR.<br>
</p>
<p>Philip<br>
</p>
<br>
<div class="moz-cite-prefix">On 01/19/2017 10:47 AM, Michael
Kuperstein via llvm-commits wrote:<br>
</div>
<blockquote
cite="mid:CAL_y90nc1wY=Gz6LmDGhnfhXWpSjas5uqvz_a28Lg4-LBc8jZQ@mail.gmail.com"
type="cite">
<div dir="ltr">Hi Elena,
<div><br>
</div>
<div>Thanks for the fix.</div>
<div><br>
</div>
<div>Regarding the revert - in this case, we're talking about:</div>
<div><br>
</div>
<div>1) a recent commit,</div>
<div>2) that has nothing else layered on top of it (except for
whitespace changes),</div>
<div>3) that is a performance improvement causing a correctness
regression,</div>
<div>4) whose crasher is reduced from real code, not a synthetic
test case,</div>
<div>5) and that has a small IR reproducer.</div>
<div><br>
</div>
<div>I really think that in such cases it's worth keeping trunk
clean, at the cost of the original committer having to
reverse-merge the revert before fixing the bug.</div>
<div><br>
</div>
<div>Thanks,</div>
<div> Michael</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Jan 19, 2017 at 4:49 AM,
Demikhovsky, Elena <span dir="ltr">&lt;<a
moz-do-not-send="true"
href="mailto:elena.demikhovsky@intel.com" target="_blank">elena.demikhovsky@intel.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div link="blue" vlink="purple" lang="EN-US">
<div class="m_7380666333963123496WordSection1">
<p class="MsoNormal"><a moz-do-not-send="true"
name="m_7380666333963123496__MailEndCompose"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">Fixed
and recommitted in r292479.</span></a></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"> </span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">I’d
prefer that you not revert the failing commit but
wait a few days instead. It will be easier for me to
fix.</span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">(Unless
it is a buildbot failure, of course; those
failures I can see myself.)</span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">We
also find regressions in our internal testing from
time to time (PR31671, for example). We submit a PR,
notify the owner, and let them fix the bug.</span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"> </span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">Thanks.</span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"> </span></p>
<p class="MsoNormal" style="margin-left:36.0pt"><b><i><span
style="color:#2f5496">- Elena</span></i></b></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"> </span></p>
<p class="MsoNormal" style="margin-left:36.0pt"><a
moz-do-not-send="true"
name="m_7380666333963123496______replyseparator"></a><b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif">
Michael Kuperstein [mailto:<a moz-do-not-send="true"
href="mailto:mkuper@google.com" target="_blank">mkuper@google.com</a>]
<br>
<b>Sent:</b> Thursday, January 19, 2017 01:19<br>
<b>To:</b> Demikhovsky, Elena &lt;<a
moz-do-not-send="true"
href="mailto:elena.demikhovsky@intel.com"
target="_blank">elena.demikhovsky@intel.com</a>&gt;<br>
<b>Cc:</b> llvm-commits &lt;<a
moz-do-not-send="true"
href="mailto:llvm-commits@lists.llvm.org"
target="_blank">llvm-commits@lists.llvm.org</a>&gt;<br>
<b>Subject:</b> Re: [llvm] r291670 - X86 CodeGen:
Optimized pattern for truncate with unsigned
saturation.</span></p>
<div>
<div class="h5">
<p class="MsoNormal" style="margin-left:36.0pt"> </p>
<div>
<p class="MsoNormal" style="margin-left:82.2pt">Hi
Elena,</p>
<div>
<p class="MsoNormal" style="margin-left:82.2pt"> </p>
</div>
<div>
<p class="MsoNormal" style="margin-left:82.2pt">This
still crashes in more complex cases. I've
reverted in r292444, see PR31589 for the
reproducer.</p>
<div>
<div>
<div>
<p class="MsoNormal"
style="margin-left:82.2pt"> </p>
</div>
<div>
<p class="MsoNormal"
style="margin-left:82.2pt">Thanks,</p>
</div>
<div>
<p class="MsoNormal"
style="margin-left:82.2pt"> Michael</p>
</div>
</div>
</div>
<div>
<p class="MsoNormal"
style="margin-left:82.2pt"> </p>
<div>
<p class="MsoNormal"
style="margin-left:82.2pt">On Wed, Jan 11,
2017 at 4:59 AM, Elena Demikhovsky via
llvm-commits <<a moz-do-not-send="true"
href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>>
wrote:</p>
<blockquote
style="border:none;border-left:solid
#cccccc 1.0pt;padding:0cm 0cm 0cm
6.0pt;margin-left:4.8pt;margin-right:0cm">
<p class="MsoNormal"
style="margin-left:82.2pt">Author:
delena<br>
Date: Wed Jan 11 06:59:32 2017<br>
New Revision: 291670<br>
<br>
URL: <a moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project?rev=291670&view=rev"
target="_blank">
http://llvm.org/viewvc/llvm-<wbr>project?rev=291670&view=rev</a><br>
Log:<br>
X86 CodeGen: Optimized pattern for
truncate with unsigned saturation.<br>
<br>
DAG patterns optimization: truncate +
unsigned saturation supported by
VPMOVUS* instructions in AVX-512.<br>
And VPACKUS* instructions on SSE*
targets.<br>
<br>
Differential Revision: <a
moz-do-not-send="true"
href="https://reviews.llvm.org/D28216"
target="_blank">https://reviews.llvm.org/D28216</a><br>
<br>
<br>
Modified:<br>
llvm/trunk/lib/Target/X86/<wbr>X86ISelLowering.cpp<br>
llvm/trunk/test/CodeGen/X86/<wbr>avx-trunc.ll<br>
llvm/trunk/test/CodeGen/X86/<wbr>avx512-trunc.ll<br>
<br>
Modified: llvm/trunk/lib/Target/X86/<wbr>X86ISelLowering.cpp<br>
URL: <a moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=291670&r1=291669&r2=291670&view=diff"
target="_blank">
http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/lib/Target/<wbr>X86/X86ISelLowering.cpp?rev=<wbr>291670&r1=291669&r2=291670&<wbr>view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/lib/Target/X86/<wbr>X86ISelLowering.cpp
(original)<br>
+++ llvm/trunk/lib/Target/X86/<wbr>X86ISelLowering.cpp
Wed Jan 11 06:59:32 2017<br>
@@ -31220,6 +31220,93 @@ static SDValue
foldVectorXorShiftIntoCmp<br>
return DAG.getNode(X86ISD::PCMPGT,
SDLoc(N), VT, Shift.getOperand(0),
Ones);<br>
}<br>
<br>
+/// Check if truncation with saturation
from type \p SrcVT to \p DstVT<br>
+/// is valid for the given \p
Subtarget.<br>
+static bool
isSATValidOnAVX512Subtarget(<wbr>EVT
SrcVT, EVT DstVT,<br>
+
const X86Subtarget &Subtarget) {<br>
+ if (!Subtarget.hasAVX512())<br>
+ return false;<br>
+<br>
+ // FIXME: Scalar type may be
supported if we move it to vector
register.<br>
+ if (!SrcVT.isVector() ||
!SrcVT.isSimple() ||
SrcVT.getSizeInBits() > 512)<br>
+ return false;<br>
+<br>
+ EVT SrcElVT = SrcVT.getScalarType();<br>
+ EVT DstElVT = DstVT.getScalarType();<br>
+ if (SrcElVT.getSizeInBits() < 16
|| SrcElVT.getSizeInBits() > 64)<br>
+ return false;<br>
+ if (DstElVT.getSizeInBits() < 8 ||
DstElVT.getSizeInBits() > 32)<br>
+ return false;<br>
+ if (SrcVT.is512BitVector() ||
Subtarget.hasVLX())<br>
+ return SrcElVT.getSizeInBits()
>= 32 || Subtarget.hasBWI();<br>
+ return false;<br>
+}<br>
+<br>
+/// Return true if VPACK* instruction
can be used for the given types<br>
+/// and it is available on \p Subtarget.<br>
+static bool<br>
+isSATValidOnSSESubtarget(EVT SrcVT, EVT
DstVT, const X86Subtarget
&Subtarget) {<br>
+ if (Subtarget.hasSSE2())<br>
+ // v16i16 -> v16i8<br>
+ if (SrcVT == MVT::v16i16 &&
DstVT == MVT::v16i8)<br>
+ return true;<br>
+ if (Subtarget.hasSSE41())<br>
+ // v8i32 -> v8i16<br>
+ if (SrcVT == MVT::v8i32 &&
DstVT == MVT::v8i16)<br>
+ return true;<br>
+ return false;<br>
+}<br>
+<br>
+/// Detect a pattern of truncation with
saturation:<br>
+/// (truncate (umin (x,
unsigned_max_of_dest_type)) to
dest_type).<br>
+/// Return the source value to be
truncated or SDValue() if the pattern
was not<br>
+/// matched.<br>
+static SDValue
detectUSatPattern(SDValue In, EVT VT) {<br>
+ if (In.getOpcode() != ISD::UMIN)<br>
+ return SDValue();<br>
+<br>
+ //Saturation with truncation. We
truncate from InVT to VT.<br>
+ assert(In.<wbr>getScalarValueSizeInBits()
> VT.getScalarSizeInBits() &&<br>
+ "Unexpected types for truncate
operation");<br>
+<br>
+ APInt C;<br>
+ if (ISD::isConstantSplatVector(<wbr>In.getOperand(1).getNode(),
C)) {<br>
+ // C should be equal to UINT32_MAX
/ UINT16_MAX / UINT8_MAX according to<br>
+ // the element size of the
destination type.<br>
+ return APIntOps::isMask(VT.<wbr>getScalarSizeInBits(),
C) ? In.getOperand(0) :<br>
+ SDValue();<br>
+ }<br>
+ return SDValue();<br>
+}<br>
+<br>
+/// Detect a pattern of truncation with
saturation:<br>
+/// (truncate (umin (x,
unsigned_max_of_dest_type)) to
dest_type).<br>
+/// The types should allow to use
VPMOVUS* instruction on AVX512.<br>
+/// Return the source value to be
truncated or SDValue() if the pattern
was not<br>
+/// matched.<br>
+static SDValue detectAVX512USatPattern(<wbr>SDValue
In, EVT VT,<br>
+
const X86Subtarget &Subtarget) {<br>
+ if (!isSATValidOnAVX512Subtarget(<wbr>In.getValueType(),
VT, Subtarget))<br>
+ return SDValue();<br>
+ return detectUSatPattern(In, VT);<br>
+}<br>
+<br>
+static SDValue<br>
+combineTruncateWithUSat(<wbr>SDValue
In, EVT VT, SDLoc &DL, SelectionDAG
&DAG,<br>
+ const
X86Subtarget &Subtarget) {<br>
+ SDValue USatVal =
detectUSatPattern(In, VT);<br>
+ if (USatVal) {<br>
+ if (isSATValidOnAVX512Subtarget(<wbr>In.getValueType(),
VT, Subtarget))<br>
+ return
DAG.getNode(X86ISD::VTRUNCUS, DL, VT,
USatVal);<br>
+ if (isSATValidOnSSESubtarget(In.<wbr>getValueType(),
VT, Subtarget)) {<br>
+ SDValue Lo, Hi;<br>
+ std::tie(Lo, Hi) =
DAG.SplitVector(USatVal, DL);<br>
+ return
DAG.getNode(X86ISD::PACKUS, DL, VT, Lo,
Hi);<br>
+ }<br>
+ }<br>
+ return SDValue();<br>
+}<br>
+<br>
/// This function detects the AVG
pattern between vectors of unsigned
i8/i16,<br>
/// which is c = (a + b + 1) / 2, and
replace this operation with the
efficient<br>
/// X86ISD::AVG instruction.<br>
@@ -31786,6 +31873,12 @@ static SDValue
combineStore(SDNode *N, S<br>
St->getPointerInfo(),
St->getAlignment(),<br>
St->getMemOperand()-><wbr>getFlags());<br>
<br>
+ if (SDValue Val =<br>
+ detectAVX512USatPattern(St-><wbr>getValue(),
St->getMemoryVT(), Subtarget))<br>
+ return EmitTruncSStore(false /*
Unsigned saturation */,
St->getChain(),<br>
+ dl, Val,
St->getBasePtr(),<br>
+
St->getMemoryVT(),
St->getMemOperand(), DAG);<br>
+<br>
const TargetLowering &TLI =
DAG.getTargetLoweringInfo();<br>
unsigned NumElems =
VT.getVectorNumElements();<br>
assert(StVT != VT &&
"Cannot truncate to the same type");<br>
@@ -32406,6 +32499,10 @@ static SDValue
combineTruncate(SDNode *N<br>
if (SDValue Avg =
detectAVGPattern(Src, VT, DAG,
Subtarget, DL))<br>
return Avg;<br>
<br>
+ // Try to combine truncation with
unsigned saturation.<br>
+ if (SDValue Val =
combineTruncateWithUSat(Src, VT, DL,
DAG, Subtarget))<br>
+ return Val;<br>
+<br>
// The bitcast source is a direct mmx
result.<br>
// Detect bitcasts between i32 to
x86mmx<br>
if (Src.getOpcode() == ISD::BITCAST
&& VT == MVT::i32) {<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/<wbr>avx-trunc.ll<br>
URL: <a moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-trunc.ll?rev=291670&r1=291669&r2=291670&view=diff"
target="_blank">
http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/X86/avx-trunc.ll?rev=<wbr>291670&r1=291669&r2=291670&<wbr>view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/<wbr>avx-trunc.ll
(original)<br>
+++ llvm/trunk/test/CodeGen/X86/<wbr>avx-trunc.ll
Wed Jan 11 06:59:32 2017<br>
@@ -39,3 +39,29 @@ define <16 x
i8> @trunc_16_8(<16 x i16><br>
%B = trunc <16 x i16> %A to
<16 x i8><br>
ret <16 x i8> %B<br>
}<br>
+<br>
+define <16 x i8>
@usat_trunc_wb_256(<16 x i16> %i)
{<br>
+; CHECK-LABEL: usat_trunc_wb_256:<br>
+; CHECK: # BB#0:<br>
+; CHECK-NEXT: vextractf128 $1,
%ymm0, %xmm1<br>
+; CHECK-NEXT: vpackuswb %xmm1,
%xmm0, %xmm0<br>
+; CHECK-NEXT: vzeroupper<br>
+; CHECK-NEXT: retq<br>
+ %x3 = icmp ult <16 x i16> %i,
<i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255, i16 255, i16
255, i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255><br>
+ %x5 = select <16 x i1> %x3,
<16 x i16> %i, <16 x i16>
<i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255, i16 255, i16
255, i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255><br>
+ %x6 = trunc <16 x i16> %x5 to
<16 x i8><br>
+ ret <16 x i8> %x6<br>
+}<br>
+<br>
+define <8 x i16>
@usat_trunc_dw_256(<8 x i32> %i) {<br>
+; CHECK-LABEL: usat_trunc_dw_256:<br>
+; CHECK: # BB#0:<br>
+; CHECK-NEXT: vextractf128 $1,
%ymm0, %xmm1<br>
+; CHECK-NEXT: vpackusdw %xmm1,
%xmm0, %xmm0<br>
+; CHECK-NEXT: vzeroupper<br>
+; CHECK-NEXT: retq<br>
+ %x3 = icmp ult <8 x i32> %i,
<i32 65535, i32 65535, i32 65535, i32
65535, i32 65535, i32 65535, i32 65535,
i32 65535><br>
+ %x5 = select <8 x i1> %x3,
<8 x i32> %i, <8 x i32>
<i32 65535, i32 65535, i32 65535, i32
65535, i32 65535, i32 65535, i32 65535,
i32 65535><br>
+ %x6 = trunc <8 x i32> %x5 to
<8 x i16><br>
+ ret <8 x i16> %x6<br>
+}<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/<wbr>avx512-trunc.ll<br>
URL: <a moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-trunc.ll?rev=291670&r1=291669&r2=291670&view=diff"
target="_blank">
http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/X86/avx512-trunc.ll?<wbr>rev=291670&r1=291669&r2=<wbr>291670&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/<wbr>avx512-trunc.ll
(original)<br>
+++ llvm/trunk/test/CodeGen/X86/<wbr>avx512-trunc.ll
Wed Jan 11 06:59:32 2017<br>
@@ -500,3 +500,208 @@ define void
@trunc_wb_128_mem(<8 x i16><br>
store <8 x i8> %x, <8 x
i8>* %res<br>
ret void<br>
}<br>
+<br>
+<br>
+define void
@usat_trunc_wb_256_mem(<16 x i16>
%i, <16 x i8>* %res) {<br>
+; KNL-LABEL: usat_trunc_wb_256_mem:<br>
+; KNL: ## BB#0:<br>
+; KNL-NEXT: vextracti128 $1, %ymm0,
%xmm1<br>
+; KNL-NEXT: vpackuswb %xmm1, %xmm0,
%xmm0<br>
+; KNL-NEXT: vmovdqu %xmm0, (%rdi)<br>
+; KNL-NEXT: retq<br>
+;<br>
+; SKX-LABEL: usat_trunc_wb_256_mem:<br>
+; SKX: ## BB#0:<br>
+; SKX-NEXT: vpmovuswb %ymm0, (%rdi)<br>
+; SKX-NEXT: retq<br>
+ %x3 = icmp ult <16 x i16> %i,
<i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255, i16 255, i16
255, i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255><br>
+ %x5 = select <16 x i1> %x3,
<16 x i16> %i, <16 x i16>
<i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255, i16 255, i16
255, i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255><br>
+ %x6 = trunc <16 x i16> %x5 to
<16 x i8><br>
+ store <16 x i8> %x6, <16 x
i8>* %res, align 1<br>
+ ret void<br>
+}<br>
+<br>
+define <16 x i8>
@usat_trunc_wb_256(<16 x i16> %i)
{<br>
+; KNL-LABEL: usat_trunc_wb_256:<br>
+; KNL: ## BB#0:<br>
+; KNL-NEXT: vextracti128 $1, %ymm0,
%xmm1<br>
+; KNL-NEXT: vpackuswb %xmm1, %xmm0,
%xmm0<br>
+; KNL-NEXT: retq<br>
+;<br>
+; SKX-LABEL: usat_trunc_wb_256:<br>
+; SKX: ## BB#0:<br>
+; SKX-NEXT: vpmovuswb %ymm0, %xmm0<br>
+; SKX-NEXT: retq<br>
+ %x3 = icmp ult <16 x i16> %i,
<i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255, i16 255, i16
255, i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255><br>
+ %x5 = select <16 x i1> %x3,
<16 x i16> %i, <16 x i16>
<i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255, i16 255, i16
255, i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255><br>
+ %x6 = trunc <16 x i16> %x5 to
<16 x i8><br>
+ ret <16 x i8> %x6<br>
+}<br>
+<br>
+define void
@usat_trunc_wb_128_mem(<8 x i16>
%i, <8 x i8>* %res) {<br>
+; KNL-LABEL: usat_trunc_wb_128_mem:<br>
+; KNL: ## BB#0:<br>
+; KNL-NEXT: vpminuw {{.*}}(%rip),
%xmm0, %xmm0<br>
+; KNL-NEXT: vpshufb {{.*#+}} xmm0 =
xmm0[0,2,4,6,8,10,12,14,u,u,u,<wbr>u,u,u,u,u]<br>
+; KNL-NEXT: vmovq %xmm0, (%rdi)<br>
+; KNL-NEXT: retq<br>
+;<br>
+; SKX-LABEL: usat_trunc_wb_128_mem:<br>
+; SKX: ## BB#0:<br>
+; SKX-NEXT: vpmovuswb %xmm0, (%rdi)<br>
+; SKX-NEXT: retq<br>
+ %x3 = icmp ult <8 x i16> %i,
<i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255, i16 255><br>
+ %x5 = select <8 x i1> %x3,
<8 x i16> %i, <8 x i16>
<i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255, i16 255><br>
+ %x6 = trunc <8 x i16> %x5 to
<8 x i8><br>
+ store <8 x i8> %x6, <8 x
i8>* %res, align 1<br>
+ ret void<br>
+}<br>
+<br>
+define void
@usat_trunc_db_512_mem(<16 x i32>
%i, <16 x i8>* %res) {<br>
+; ALL-LABEL: usat_trunc_db_512_mem:<br>
+; ALL: ## BB#0:<br>
+; ALL-NEXT: vpmovusdb %zmm0, (%rdi)<br>
+; ALL-NEXT: retq<br>
+ %x3 = icmp ult <16 x i32> %i,
<i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255, i32 255, i32
255, i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255><br>
+ %x5 = select <16 x i1> %x3,
<16 x i32> %i, <16 x i32>
<i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255, i32 255, i32
255, i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255><br>
+ %x6 = trunc <16 x i32> %x5 to
<16 x i8><br>
+ store <16 x i8> %x6, <16 x
i8>* %res, align 1<br>
+ ret void<br>
+}<br>
+<br>
+define void
@usat_trunc_qb_512_mem(<8 x i64>
%i, <8 x i8>* %res) {<br>
+; ALL-LABEL: usat_trunc_qb_512_mem:<br>
+; ALL: ## BB#0:<br>
+; ALL-NEXT: vpmovusqb %zmm0, (%rdi)<br>
+; ALL-NEXT: retq<br>
+ %x3 = icmp ult <8 x i64> %i,
<i64 255, i64 255, i64 255, i64 255,
i64 255, i64 255, i64 255, i64 255><br>
+ %x5 = select <8 x i1> %x3,
<8 x i64> %i, <8 x i64>
<i64 255, i64 255, i64 255, i64 255,
i64 255, i64 255, i64 255, i64 255><br>
+ %x6 = trunc <8 x i64> %x5 to
<8 x i8><br>
+ store <8 x i8> %x6, <8 x
i8>* %res, align 1<br>
+ ret void<br>
+}<br>
+<br>
+define void
@usat_trunc_qd_512_mem(<8 x i64>
%i, <8 x i32>* %res) {<br>
+; ALL-LABEL: usat_trunc_qd_512_mem:<br>
+; ALL: ## BB#0:<br>
+; ALL-NEXT: vpmovusqd %zmm0, (%rdi)<br>
+; ALL-NEXT: retq<br>
+ %x3 = icmp ult <8 x i64> %i,
<i64 4294967295, i64 4294967295, i64
4294967295, i64 4294967295, i64
4294967295, i64 4294967295, i64
4294967295, i64 4294967295><br>
+ %x5 = select <8 x i1> %x3,
<8 x i64> %i, <8 x i64>
<i64 4294967295, i64 4294967295, i64
4294967295, i64 4294967295, i64
4294967295, i64 4294967295, i64
4294967295, i64 4294967295><br>
+ %x6 = trunc <8 x i64> %x5 to
<8 x i32><br>
+ store <8 x i32> %x6, <8 x
i32>* %res, align 1<br>
+ ret void<br>
+}<br>
+<br>
+define void
@usat_trunc_qw_512_mem(<8 x i64>
%i, <8 x i16>* %res) {<br>
+; ALL-LABEL: usat_trunc_qw_512_mem:<br>
+; ALL: ## BB#0:<br>
+; ALL-NEXT: vpmovusqw %zmm0, (%rdi)<br>
+; ALL-NEXT: retq<br>
+ %x3 = icmp ult <8 x i64> %i,
<i64 65535, i64 65535, i64 65535, i64
65535, i64 65535, i64 65535, i64 65535,
i64 65535><br>
+ %x5 = select <8 x i1> %x3,
<8 x i64> %i, <8 x i64>
<i64 65535, i64 65535, i64 65535, i64
65535, i64 65535, i64 65535, i64 65535,
i64 65535><br>
+ %x6 = trunc <8 x i64> %x5 to
<8 x i16><br>
+ store <8 x i16> %x6, <8 x
i16>* %res, align 1<br>
+ ret void<br>
+}<br>
+<br>
+define <32 x i8>
@usat_trunc_db_1024(<32 x i32> %i)
{<br>
+; KNL-LABEL: usat_trunc_db_1024:<br>
+; KNL: ## BB#0:<br>
+; KNL-NEXT: vpmovusdb %zmm0, %xmm0<br>
+; KNL-NEXT: vpmovusdb %zmm1, %xmm1<br>
+; KNL-NEXT: vinserti128 $1, %xmm1,
%ymm0, %ymm0<br>
+; KNL-NEXT: retq<br>
+;<br>
+; SKX-LABEL: usat_trunc_db_1024:<br>
+; SKX: ## BB#0:<br>
+; SKX-NEXT: vpbroadcastd
{{.*}}(%rip), %zmm2<br>
+; SKX-NEXT: vpminud %zmm2, %zmm1,
%zmm1<br>
+; SKX-NEXT: vpminud %zmm2, %zmm0,
%zmm0<br>
+; SKX-NEXT: vpmovdw %zmm0, %ymm0<br>
+; SKX-NEXT: vpmovdw %zmm1, %ymm1<br>
+; SKX-NEXT: vinserti64x4 $1, %ymm1,
%zmm0, %zmm0<br>
+; SKX-NEXT: vpmovwb %zmm0, %ymm0<br>
+; SKX-NEXT: retq<br>
+ %x3 = icmp ult <32 x i32> %i,
<i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255, i32 255, i32
255, i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255, i32 255, i32
255, i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255, i32 255, i32
255, i32 255, i32 255, i32 255, i32 255,
i32 255><br>
+ %x5 = select <32 x i1> %x3,
<32 x i32> %i, <32 x i32>
<i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255, i32 255, i32
255, i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255, i32 255, i32
255, i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255, i32 255, i32
255, i32 255, i32 255, i32 255, i32 255,
i32 255><br>
+ %x6 = trunc <32 x i32> %x5 to
<32 x i8><br>
+ ret <32 x i8> %x6<br>
+}<br>
+<br>
+define void
@usat_trunc_db_1024_mem(<32 x i32>
%i, <32 x i8>* %p) {<br>
+; KNL-LABEL: usat_trunc_db_1024_mem:<br>
+; KNL: ## BB#0:<br>
+; KNL-NEXT: vpmovusdb %zmm0, %xmm0<br>
+; KNL-NEXT: vpmovusdb %zmm1, %xmm1<br>
+; KNL-NEXT: vinserti128 $1, %xmm1,
%ymm0, %ymm0<br>
+; KNL-NEXT: vmovdqu %ymm0, (%rdi)<br>
+; KNL-NEXT: retq<br>
+;<br>
+; SKX-LABEL: usat_trunc_db_1024_mem:<br>
+; SKX: ## BB#0:<br>
+; SKX-NEXT: vpbroadcastd
{{.*}}(%rip), %zmm2<br>
+; SKX-NEXT: vpminud %zmm2, %zmm1,
%zmm1<br>
+; SKX-NEXT: vpminud %zmm2, %zmm0,
%zmm0<br>
+; SKX-NEXT: vpmovdw %zmm0, %ymm0<br>
+; SKX-NEXT: vpmovdw %zmm1, %ymm1<br>
+; SKX-NEXT: vinserti64x4 $1, %ymm1,
%zmm0, %zmm0<br>
+; SKX-NEXT: vpmovwb %zmm0, (%rdi)<br>
+; SKX-NEXT: retq<br>
+ %x3 = icmp ult <32 x i32> %i,
<i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255, i32 255, i32
255, i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255, i32 255, i32
255, i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255, i32 255, i32
255, i32 255, i32 255, i32 255, i32 255,
i32 255><br>
+ %x5 = select <32 x i1> %x3,
<32 x i32> %i, <32 x i32>
<i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255, i32 255, i32
255, i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255, i32 255, i32
255, i32 255, i32 255, i32 255, i32 255,
i32 255, i32 255, i32 255, i32 255, i32
255, i32 255, i32 255, i32 255, i32 255,
i32 255><br>
+ %x6 = trunc <32 x i32> %x5 to
<32 x i8><br>
+ store <32 x i8>%x6, <32 x
i8>* %p, align 1<br>
+ ret void<br>
+}<br>
+<br>
+define <16 x i16>
@usat_trunc_dw_512(<16 x i32> %i)
{<br>
+; ALL-LABEL: usat_trunc_dw_512:<br>
+; ALL: ## BB#0:<br>
+; ALL-NEXT: vpmovusdw %zmm0, %ymm0<br>
+; ALL-NEXT: retq<br>
+ %x3 = icmp ult <16 x i32> %i,
<i32 65535, i32 65535, i32 65535, i32
65535, i32 65535, i32 65535, i32 65535,
i32 65535, i32 65535, i32 65535, i32
65535, i32 65535, i32 65535, i32 65535,
i32 65535, i32 65535><br>
+ %x5 = select <16 x i1> %x3,
<16 x i32> %i, <16 x i32>
<i32 65535, i32 65535, i32 65535, i32
65535, i32 65535, i32 65535, i32 65535,
i32 65535, i32 65535, i32 65535, i32
65535, i32 65535, i32 65535, i32 65535,
i32 65535, i32 65535><br>
+ %x6 = trunc <16 x i32> %x5 to
<16 x i16><br>
+ ret <16 x i16> %x6<br>
+}<br>
+<br>
+define <8 x i8>
@usat_trunc_wb_128(<8 x i16> %i) {<br>
+; ALL-LABEL: usat_trunc_wb_128:<br>
+; ALL: ## BB#0:<br>
+; ALL-NEXT: vpminuw {{.*}}(%rip),
%xmm0, %xmm0<br>
+; ALL-NEXT: retq<br>
+ %x3 = icmp ult <8 x i16> %i,
<i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255, i16 255><br>
+ %x5 = select <8 x i1> %x3,
<8 x i16> %i, <8 x i16>
<i16 255, i16 255, i16 255, i16 255,
i16 255, i16 255, i16 255, i16 255><br>
+ %x6 = trunc <8 x i16> %x5 to
<8 x i8><br>
+ ret <8 x i8>%x6<br>
+}<br>
+<br>
+define <16 x i16>
@usat_trunc_qw_1024(<16 x i64> %i)
{<br>
+; KNL-LABEL: usat_trunc_qw_1024:<br>
+; KNL: ## BB#0:<br>
+; KNL-NEXT: vpbroadcastq
{{.*}}(%rip), %zmm2<br>
+; KNL-NEXT: vpminuq %zmm2, %zmm1,
%zmm1<br>
+; KNL-NEXT: vpminuq %zmm2, %zmm0,
%zmm0<br>
+; KNL-NEXT: vpmovqd %zmm0, %ymm0<br>
+; KNL-NEXT: vpmovqd %zmm1, %ymm1<br>
+; KNL-NEXT: vinserti64x4 $1, %ymm1,
%zmm0, %zmm0<br>
+; KNL-NEXT: vpmovdw %zmm0, %ymm0<br>
+; KNL-NEXT: retq<br>
+;<br>
+; SKX-LABEL: usat_trunc_qw_1024:<br>
+; SKX: ## BB#0:<br>
+; SKX-NEXT: vpbroadcastq
{{.*}}(%rip), %zmm2<br>
+; SKX-NEXT: vpminuq %zmm2, %zmm1,
%zmm1<br>
+; SKX-NEXT: vpminuq %zmm2, %zmm0,
%zmm0<br>
+; SKX-NEXT: vpmovqd %zmm0, %ymm0<br>
+; SKX-NEXT: vpmovqd %zmm1, %ymm1<br>
+; SKX-NEXT: vinserti32x8 $1, %ymm1,
%zmm0, %zmm0<br>
+; SKX-NEXT: vpmovdw %zmm0, %ymm0<br>
+; SKX-NEXT: retq<br>
+ %x3 = icmp ult <16 x i64> %i,
<i64 65535, i64 65535, i64 65535, i64
65535, i64 65535, i64 65535, i64 65535,
i64 65535, i64 65535, i64 65535, i64
65535, i64 65535, i64 65535, i64 65535,
i64 65535, i64 65535><br>
+ %x5 = select <16 x i1> %x3,
<16 x i64> %i, <16 x i64>
<i64 65535, i64 65535, i64 65535, i64
65535, i64 65535, i64 65535, i64 65535,
i64 65535, i64 65535, i64 65535, i64
65535, i64 65535, i64 65535, i64 65535,
i64 65535, i64 65535><br>
+ %x6 = trunc <16 x i64> %x5 to
<16 x i16><br>
+ ret <16 x i16> %x6<br>
+}<br>
+<br>
<br>
<br>
______________________________<wbr>_________________<br>
llvm-commits mailing list<br>
<a moz-do-not-send="true"
href="mailto:llvm-commits@lists.llvm.org"
target="_blank">llvm-commits@lists.llvm.org</a><br>
<a moz-do-not-send="true"
href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits"
target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-commits</a></p>
</blockquote>
</div>
<p class="MsoNormal"
style="margin-left:82.2pt"> </p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
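<p>As a reading aid for the quoted commit: the pattern that
<code>detectUSatPattern</code> looks for, a truncate of a umin against the
unsigned maximum of the destination type, can be modelled one lane at a
time. What follows is a minimal scalar sketch and not code from the patch.
The helper names <code>usatTruncate16To8</code>,
<code>usatTruncateViaSelect</code> and <code>isLowBitsMask</code> are
hypothetical, and the mask helper is only a plain-integer analogue of the
check the patch performs on the splat constant.</p>
<pre>
```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// One lane of "truncate with unsigned saturation": clamp to the unsigned
// maximum of the destination type (the umin), then truncate.
// Hypothetical helper, not part of the quoted patch.
inline std::uint8_t usatTruncate16To8(std::uint16_t X) {
  const std::uint16_t DstMax = 0xFF; // unsigned max of an 8-bit destination
  return static_cast<std::uint8_t>(std::min(X, DstMax));
}

// The shape used by the quoted tests: icmp ult + select, which computes the
// same value as the umin above.
inline std::uint8_t usatTruncateViaSelect(std::uint16_t X) {
  const std::uint16_t DstMax = 0xFF;
  const std::uint16_t Clamped = (X < DstMax) ? X : DstMax;
  return static_cast<std::uint8_t>(Clamped);
}

// Plain-integer analogue of the mask test on the splat constant: true when
// C has exactly its low DstBits bits set. Assumes 1 <= DstBits <= 63.
inline bool isLowBitsMask(unsigned DstBits, std::uint64_t C) {
  assert(DstBits >= 1 && DstBits <= 63 && "shift would be out of range");
  return C == ((std::uint64_t{1} << DstBits) - 1);
}
```
</pre>
<p>Under this model, inputs at or below 255 pass through unchanged and
anything larger becomes 255, which is the behaviour the
<code>usat_trunc_wb_*</code> tests in the quoted diff describe in IR.</p>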
<br>
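<p>The type rules in <code>isSATValidOnAVX512Subtarget</code> are easier to
follow with concrete inputs. Below is a rough standalone sketch that mirrors
the checks visible in the quoted diff using plain integers. The
<code>Features</code> and <code>VecTy</code> structs and the function name
are hypothetical stand-ins, the <code>isVector()</code> and
<code>isSimple()</code> tests are taken as given, and the narrowing
precondition is borrowed from the assert in <code>detectUSatPattern</code>
rather than from the predicate itself.</p>
<pre>
```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget feature queries.
struct Features {
  bool AVX512;
  bool VLX;
  bool BWI;
};

// Hypothetical stand-in for a simple vector value type.
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
  unsigned totalBits() const { return NumElts * EltBits; }
};

// Mirrors the order of checks in the quoted isSATValidOnAVX512Subtarget.
inline bool isUSatTruncLegalOnAVX512(VecTy Src, VecTy Dst, Features F) {
  // Precondition assumed by this model: the operation really narrows.
  assert(Dst.EltBits < Src.EltBits && "expected a truncating conversion");
  if (!F.AVX512)
    return false;
  if (Src.totalBits() > 512)
    return false;
  if (Src.EltBits < 16 || Src.EltBits > 64)
    return false;
  if (Dst.EltBits < 8 || Dst.EltBits > 32)
    return false;
  // 512-bit sources need no VLX; 16-bit source elements additionally need BWI.
  if (Src.totalBits() == 512 || F.VLX)
    return Src.EltBits >= 32 || F.BWI;
  return false;
}
```
</pre>
<p>Assuming the KNL run lines correspond to AVX-512 without VLX or BWI and
the SKX run lines to AVX-512 with both, this model rejects v16i16 to v16i8 on
KNL and accepts it on SKX, which fits the <code>vpackuswb</code> versus
<code>vpmovuswb</code> check lines for <code>usat_trunc_wb_256</code> in the
quoted tests.</p>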
</body>
</html>