<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p>Michael, you clearly did the right thing here.  Reverting a patch
      which is broken is absolutely appropriate and expected.</p>
    <p>Elena, if you have an internal failure that you can reduce
      to a reproducer for an upstream commit, please do revert the
      change, file a bug, and reply to the commit thread with a link to
      that PR.<br>
    </p>
    <p>Philip<br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 01/19/2017 10:47 AM, Michael
      Kuperstein via llvm-commits wrote:<br>
    </div>
    <blockquote
cite="mid:CAL_y90nc1wY=Gz6LmDGhnfhXWpSjas5uqvz_a28Lg4-LBc8jZQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">Hi Elena,
        <div><br>
        </div>
        <div>Thanks for the fix.</div>
        <div><br>
        </div>
        <div>Regarding the revert - in this case, we're talking about:</div>
        <div><br>
        </div>
        <div>1) A recent commit,</div>
        <div>2) that has nothing else layered on top of it (except for
          whitespace changes)</div>
        <div>3) is a performance improvement that causes a correctness
          regression,</div>
        <div>4) the crasher is reduced from real code, not a synthetic
          test-case,</div>
        <div>5) and has a small IR reproducer.</div>
        <div><br>
        </div>
        <div>I really think that in such cases it's worth keeping trunk
          clean, at the cost of the original committer having to
          reverse-merge the revert before fixing the bug.</div>
        <div><br>
        </div>
        <div>Thanks,</div>
        <div>  Michael</div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Thu, Jan 19, 2017 at 4:49 AM,
          Demikhovsky, Elena <span dir="ltr">&lt;<a
              moz-do-not-send="true"
              href="mailto:elena.demikhovsky@intel.com" target="_blank">elena.demikhovsky@intel.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div link="blue" vlink="purple" lang="EN-US">
              <div class="m_7380666333963123496WordSection1">
                <p class="MsoNormal"><a moz-do-not-send="true"
                    name="m_7380666333963123496__MailEndCompose"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1f497d">Fixed
                      and recommitted in r292479.</span></a></p>
                <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1f497d">&nbsp;</span></p>
                <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1f497d">I’d
                    prefer that you not revert the failing commit,
                    but wait a few days instead. That makes it easier for me to
                    fix.</span></p>
                <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1f497d">(That
                    is, if it is not a buildbot failure, of course;
                    buildbot failures I can see myself.)</span></p>
                <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1f497d">We
                    also find regressions in our internal testing from
                    time to time, PR31671, for example. We submit a PR,
                    notify the owner, and let them fix the bug.</span></p>
                <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1f497d">&nbsp;</span></p>
                <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1f497d">Thanks.</span></p>
                <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:#1f497d">&nbsp;</span></p>
                <p class="MsoNormal" style="margin-left:36.0pt">
                  <span
                    style="font-family:&quot;Calibri&quot;,sans-serif;color:#2f5496"><span>-<span
                        style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;
                      </span></span></span><span dir="LTR"></span><b><i><span
                        style="color:#2f5496"> Elena</span></i></b></p>
                <p class="MsoNormal"><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"> </span></p>
                <p class="MsoNormal" style="margin-left:36.0pt"><a
                    moz-do-not-send="true"
                    name="m_7380666333963123496______replyseparator"></a><b><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">From:</span></b><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">
                    Michael Kuperstein [mailto:<a moz-do-not-send="true"
                      href="mailto:mkuper@google.com" target="_blank">mkuper@google.com</a>]
                    <br>
                    <b>Sent:</b> Thursday, January 19, 2017 01:19<br>
                    <b>To:</b> Demikhovsky, Elena &lt;<a
                      moz-do-not-send="true"
                      href="mailto:elena.demikhovsky@intel.com"
                      target="_blank">elena.demikhovsky@intel.com</a>&gt;<br>
                    <b>Cc:</b> llvm-commits &lt;<a
                      moz-do-not-send="true"
                      href="mailto:llvm-commits@lists.llvm.org"
                      target="_blank">llvm-commits@lists.llvm.org</a>&gt;<br>
                    <b>Subject:</b> Re: [llvm] r291670 - X86 CodeGen:
                    Optimized pattern for truncate with unsigned
                    saturation.</span></p>
                <div>
                  <div class="h5">
                    <p class="MsoNormal" style="margin-left:36.0pt"> </p>
                    <div>
                      <p class="MsoNormal" style="margin-left:82.2pt">Hi
                        Elena,</p>
                      <div>
                        <p class="MsoNormal" style="margin-left:82.2pt"> </p>
                      </div>
                      <div>
                        <p class="MsoNormal" style="margin-left:82.2pt">This
                          still crashes in more complex cases. I've
                          reverted in r292444, see PR31589 for the
                          reproducer.</p>
                        <div>
                          <div>
                            <div>
                              <p class="MsoNormal"
                                style="margin-left:82.2pt"> </p>
                            </div>
                            <div>
                              <p class="MsoNormal"
                                style="margin-left:82.2pt">Thanks,</p>
                            </div>
                            <div>
                              <p class="MsoNormal"
                                style="margin-left:82.2pt">  Michael</p>
                            </div>
                          </div>
                        </div>
                        <div>
                          <p class="MsoNormal"
                            style="margin-left:82.2pt"> </p>
                          <div>
                            <p class="MsoNormal"
                              style="margin-left:82.2pt">On Wed, Jan 11,
                              2017 at 4:59 AM, Elena Demikhovsky via
                              llvm-commits &lt;<a moz-do-not-send="true"
href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>&gt;
                              wrote:</p>
                            <blockquote
                              style="border:none;border-left:solid
                              #cccccc 1.0pt;padding:0cm 0cm 0cm
                              6.0pt;margin-left:4.8pt;margin-right:0cm">
                              <p class="MsoNormal"
                                style="margin-left:82.2pt">Author:
                                delena<br>
                                Date: Wed Jan 11 06:59:32 2017<br>
                                New Revision: 291670<br>
                                <br>
                                URL: <a moz-do-not-send="true"
                                  href="http://llvm.org/viewvc/llvm-project?rev=291670&view=rev"
                                  target="_blank">
                                  http://llvm.org/viewvc/llvm-<wbr>project?rev=291670&view=rev</a><br>
                                Log:<br>
                                X86 CodeGen: Optimized pattern for
                                truncate with unsigned saturation.<br>
                                <br>
                                DAG patterns optimization: truncate +
                                unsigned saturation supported by
                                VPMOVUS* instructions in AVX-512.<br>
                                And VPACKUS* instructions on SSE*
                                targets.<br>
                                <br>
                                Differential Revision: <a
                                  moz-do-not-send="true"
                                  href="https://reviews.llvm.org/D28216"
                                  target="_blank">
                                  https://reviews.llvm.org/<wbr>D28216</a><br>
                                <br>
                                <br>
                                Modified:<br>
                                    llvm/trunk/lib/Target/X86/<wbr>X86ISelLowering.cpp<br>
                                    llvm/trunk/test/CodeGen/X86/<wbr>avx-trunc.ll<br>
                                    llvm/trunk/test/CodeGen/X86/<wbr>avx512-trunc.ll<br>
                                <br>
                                Modified: llvm/trunk/lib/Target/X86/<wbr>X86ISelLowering.cpp<br>
                                URL: <a moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=291670&r1=291669&r2=291670&view=diff"
                                  target="_blank">
                                  http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/lib/Target/<wbr>X86/X86ISelLowering.cpp?rev=<wbr>291670&r1=291669&r2=291670&<wbr>view=diff</a><br>
                                ==============================<wbr>==============================<wbr>==================<br>
                                --- llvm/trunk/lib/Target/X86/<wbr>X86ISelLowering.cpp
                                (original)<br>
                                +++ llvm/trunk/lib/Target/X86/<wbr>X86ISelLowering.cpp
                                Wed Jan 11 06:59:32 2017<br>
                                @@ -31220,6 +31220,93 @@ static SDValue
                                foldVectorXorShiftIntoCmp<br>
                                   return DAG.getNode(X86ISD::PCMPGT,
                                SDLoc(N), VT, Shift.getOperand(0),
                                Ones);<br>
                                 }<br>
                                <br>
                                +/// Check if truncation with saturation
                                from type \p SrcVT to \p DstVT<br>
                                +/// is valid for the given \p
                                Subtarget.<br>
                                +static bool
                                isSATValidOnAVX512Subtarget(<wbr>EVT
                                SrcVT, EVT DstVT,<br>
                                +                                       
                                const X86Subtarget &Subtarget) {<br>
                                +  if (!Subtarget.hasAVX512())<br>
                                +    return false;<br>
                                +<br>
                                +  // FIXME: Scalar type may be
                                supported if we move it to vector
                                register.<br>
                                +  if (!SrcVT.isVector() ||
                                !SrcVT.isSimple() ||
                                SrcVT.getSizeInBits() > 512)<br>
                                +    return false;<br>
                                +<br>
                                +  EVT SrcElVT = SrcVT.getScalarType();<br>
                                +  EVT DstElVT = DstVT.getScalarType();<br>
                                +  if (SrcElVT.getSizeInBits() < 16
                                || SrcElVT.getSizeInBits() > 64)<br>
                                +    return false;<br>
                                +  if (DstElVT.getSizeInBits() < 8 ||
                                DstElVT.getSizeInBits() > 32)<br>
                                +    return false;<br>
                                +  if (SrcVT.is512BitVector() ||
                                Subtarget.hasVLX())<br>
                                +    return SrcElVT.getSizeInBits()
                                >= 32 || Subtarget.hasBWI();<br>
                                +  return false;<br>
                                +}<br>
                                +<br>
                                +/// Return true if VPACK* instruction
                                can be used for the given types<br>
                                +/// and it is available on \p Subtarget.<br>
                                +static bool<br>
                                +isSATValidOnSSESubtarget(EVT SrcVT, EVT
                                DstVT, const X86Subtarget
                                &Subtarget) {<br>
                                +  if (Subtarget.hasSSE2())<br>
                                +    // v16i16 -> v16i8<br>
                                +    if (SrcVT == MVT::v16i16 &&
                                DstVT == MVT::v16i8)<br>
                                +      return true;<br>
                                +  if (Subtarget.hasSSE41())<br>
                                +    // v8i32 -> v8i16<br>
                                +    if (SrcVT == MVT::v8i32 &&
                                DstVT == MVT::v8i16)<br>
                                +      return true;<br>
                                +  return false;<br>
                                +}<br>
                                +<br>
                                +/// Detect a pattern of truncation with
                                saturation:<br>
                                +/// (truncate (umin (x,
                                unsigned_max_of_dest_type)) to
                                dest_type).<br>
                                +/// Return the source value to be
                                truncated or SDValue() if the pattern
                                was not<br>
                                +/// matched.<br>
                                +static SDValue
                                detectUSatPattern(SDValue In, EVT VT) {<br>
                                +  if (In.getOpcode() != ISD::UMIN)<br>
                                +    return SDValue();<br>
                                +<br>
                                +  //Saturation with truncation. We
                                truncate from InVT to VT.<br>
                                +  assert(In.<wbr>getScalarValueSizeInBits()
                                > VT.getScalarSizeInBits() &&<br>
                                +    "Unexpected types for truncate
                                operation");<br>
                                +<br>
                                +  APInt C;<br>
                                +  if (ISD::isConstantSplatVector(<wbr>In.getOperand(1).getNode(),
                                C)) {<br>
                                +    // C should be equal to UINT32_MAX
                                / UINT16_MAX / UINT8_MAX according<br>
                                +    // the element size of the
                                destination type.<br>
                                +    return APIntOps::isMask(VT.<wbr>getScalarSizeInBits(),
                                C) ? In.getOperand(0) :<br>
                                +      SDValue();<br>
                                +  }<br>
                                +  return SDValue();<br>
                                +}<br>
                                +<br>
                                +/// Detect a pattern of truncation with
                                saturation:<br>
                                +/// (truncate (umin (x,
                                unsigned_max_of_dest_type)) to
                                dest_type).<br>
                                +/// The types should allow to use
                                VPMOVUS* instruction on AVX512.<br>
                                +/// Return the source value to be
                                truncated or SDValue() if the pattern
                                was not<br>
                                +/// matched.<br>
                                +static SDValue detectAVX512USatPattern(<wbr>SDValue
                                In, EVT VT,<br>
                                +                                     
                                 const X86Subtarget &Subtarget) {<br>
                                +  if (!isSATValidOnAVX512Subtarget(<wbr>In.getValueType(),
                                VT, Subtarget))<br>
                                +    return SDValue();<br>
                                +  return detectUSatPattern(In, VT);<br>
                                +}<br>
                                +<br>
                                +static SDValue<br>
                                +combineTruncateWithUSat(<wbr>SDValue
                                In, EVT VT, SDLoc &DL, SelectionDAG
                                &DAG,<br>
                                +                        const
                                X86Subtarget &Subtarget) {<br>
                                +  SDValue USatVal =
                                detectUSatPattern(In, VT);<br>
                                +  if (USatVal) {<br>
                                +    if (isSATValidOnAVX512Subtarget(<wbr>In.getValueType(),
                                VT, Subtarget))<br>
                                +      return
                                DAG.getNode(X86ISD::VTRUNCUS, DL, VT,
                                USatVal);<br>
                                +    if (isSATValidOnSSESubtarget(In.<wbr>getValueType(),
                                VT, Subtarget)) {<br>
                                +      SDValue Lo, Hi;<br>
                                +      std::tie(Lo, Hi) =
                                DAG.SplitVector(USatVal, DL);<br>
                                +      return
                                DAG.getNode(X86ISD::PACKUS, DL, VT, Lo,
                                Hi);<br>
                                +    }<br>
                                +  }<br>
                                +  return SDValue();<br>
                                +}<br>
                                +<br>
                                 /// This function detects the AVG
                                pattern between vectors of unsigned
                                i8/i16,<br>
                                 /// which is c = (a + b + 1) / 2, and
                                replace this operation with the
                                efficient<br>
                                 /// X86ISD::AVG instruction.<br>
                                @@ -31786,6 +31873,12 @@ static SDValue
                                combineStore(SDNode *N, S<br>
                                                         
                                 St->getPointerInfo(),
                                St->getAlignment(),<br>
                                                         
                                 St->getMemOperand()-><wbr>getFlags());<br>
                                <br>
                                +    if (SDValue Val =<br>
                                +        detectAVX512USatPattern(St-><wbr>getValue(),
                                St->getMemoryVT(), Subtarget))<br>
                                +      return EmitTruncSStore(false /*
                                Unsigned saturation */,
                                St->getChain(),<br>
                                +                             dl, Val,
                                St->getBasePtr(),<br>
                                +                           
                                 St->getMemoryVT(),
                                St->getMemOperand(), DAG);<br>
                                +<br>
                                     const TargetLowering &TLI =
                                DAG.getTargetLoweringInfo();<br>
                                     unsigned NumElems =
                                VT.getVectorNumElements();<br>
                                     assert(StVT != VT &&
                                "Cannot truncate to the same type");<br>
                                @@ -32406,6 +32499,10 @@ static SDValue
                                combineTruncate(SDNode *N<br>
                                   if (SDValue Avg =
                                detectAVGPattern(Src, VT, DAG,
                                Subtarget, DL))<br>
                                     return Avg;<br>
                                <br>
                                +  // Try to combine truncation with
                                unsigned saturation.<br>
                                +  if (SDValue Val =
                                combineTruncateWithUSat(Src, VT, DL,
                                DAG, Subtarget))<br>
                                +    return Val;<br>
                                +<br>
                                   // The bitcast source is a direct mmx
                                result.<br>
                                   // Detect bitcasts between i32 to
                                x86mmx<br>
                                   if (Src.getOpcode() == ISD::BITCAST
                                && VT == MVT::i32) {<br>
                                <br>
                                Modified: llvm/trunk/test/CodeGen/X86/<wbr>avx-trunc.ll<br>
                                URL: <a moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-trunc.ll?rev=291670&r1=291669&r2=291670&view=diff"
                                  target="_blank">
                                  http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/X86/avx-trunc.ll?rev=<wbr>291670&r1=291669&r2=291670&<wbr>view=diff</a><br>
                                ==============================<wbr>==============================<wbr>==================<br>
                                --- llvm/trunk/test/CodeGen/X86/<wbr>avx-trunc.ll
                                (original)<br>
                                +++ llvm/trunk/test/CodeGen/X86/<wbr>avx-trunc.ll
                                Wed Jan 11 06:59:32 2017<br>
                                @@ -39,3 +39,29 @@ define <16 x
                                i8> @trunc_16_8(<16 x i16><br>
                                   %B = trunc <16 x i16> %A to
                                <16 x i8><br>
                                   ret <16 x i8> %B<br>
                                 }<br>
                                +<br>
                                +define <16 x i8>
                                @usat_trunc_wb_256(<16 x i16> %i)
                                {<br>
                                +; CHECK-LABEL: usat_trunc_wb_256:<br>
                                +; CHECK:       # BB#0:<br>
                                +; CHECK-NEXT:    vextractf128 $1,
                                %ymm0, %xmm1<br>
                                +; CHECK-NEXT:    vpackuswb %xmm1,
                                %xmm0, %xmm0<br>
                                +; CHECK-NEXT:    vzeroupper<br>
                                +; CHECK-NEXT:    retq<br>
                                +&nbsp; %x3 = icmp ult &lt;16 x i16&gt; %i,
                                &lt;i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255, i16 255, i16
                                255, i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255&gt;<br>
                                +&nbsp; %x5 = select &lt;16 x i1&gt; %x3,
                                &lt;16 x i16&gt; %i, &lt;16 x i16&gt;
                                &lt;i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255, i16 255, i16
                                255, i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255&gt;<br>
                                +  %x6 = trunc <16 x i16> %x5 to
                                <16 x i8><br>
                                +  ret <16 x i8> %x6<br>
                                +}<br>
                                +<br>
                                +define <8 x i16>
                                @usat_trunc_dw_256(<8 x i32> %i) {<br>
                                +; CHECK-LABEL: usat_trunc_dw_256:<br>
                                +; CHECK:       # BB#0:<br>
                                +; CHECK-NEXT:    vextractf128 $1,
                                %ymm0, %xmm1<br>
                                +; CHECK-NEXT:    vpackusdw %xmm1,
                                %xmm0, %xmm0<br>
                                +; CHECK-NEXT:    vzeroupper<br>
                                +; CHECK-NEXT:    retq<br>
                                +&nbsp; %x3 = icmp ult &lt;8 x i32&gt; %i,
                                &lt;i32 65535, i32 65535, i32 65535, i32
                                65535, i32 65535, i32 65535, i32 65535,
                                i32 65535&gt;<br>
                                +&nbsp; %x5 = select &lt;8 x i1&gt; %x3,
                                &lt;8 x i32&gt; %i, &lt;8 x i32&gt;
                                &lt;i32 65535, i32 65535, i32 65535, i32
                                65535, i32 65535, i32 65535, i32 65535,
                                i32 65535&gt;<br>
                                +  %x6 = trunc <8 x i32> %x5 to
                                <8 x i16><br>
                                +  ret <8 x i16> %x6<br>
                                +}<br>
                                <br>
                                Modified: llvm/trunk/test/CodeGen/X86/<wbr>avx512-trunc.ll<br>
                                URL: <a moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-trunc.ll?rev=291670&r1=291669&r2=291670&view=diff"
                                  target="_blank">
                                  http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/X86/avx512-trunc.ll?<wbr>rev=291670&r1=291669&r2=<wbr>291670&view=diff</a><br>
                                ==============================<wbr>==============================<wbr>==================<br>
                                --- llvm/trunk/test/CodeGen/X86/<wbr>avx512-trunc.ll
                                (original)<br>
                                +++ llvm/trunk/test/CodeGen/X86/<wbr>avx512-trunc.ll
                                Wed Jan 11 06:59:32 2017<br>
                                @@ -500,3 +500,208 @@ define void
                                @trunc_wb_128_mem(<8 x i16><br>
                                     store <8 x i8> %x, <8 x
                                i8>* %res<br>
                                     ret void<br>
                                 }<br>
                                +<br>
                                +<br>
                                +define void
                                @usat_trunc_wb_256_mem(<16 x i16>
                                %i, <16 x i8>* %res) {<br>
                                +; KNL-LABEL: usat_trunc_wb_256_mem:<br>
                                +; KNL:       ## BB#0:<br>
                                +; KNL-NEXT:    vextracti128 $1, %ymm0,
                                %xmm1<br>
                                +; KNL-NEXT:    vpackuswb %xmm1, %xmm0,
                                %xmm0<br>
                                +; KNL-NEXT:    vmovdqu %xmm0, (%rdi)<br>
                                +; KNL-NEXT:    retq<br>
                                +;<br>
                                +; SKX-LABEL: usat_trunc_wb_256_mem:<br>
                                +; SKX:       ## BB#0:<br>
                                +; SKX-NEXT:    vpmovuswb %ymm0, (%rdi)<br>
                                +; SKX-NEXT:    retq<br>
                                +&nbsp; %x3 = icmp ult &lt;16 x i16&gt; %i,
                                &lt;i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255, i16 255, i16
                                255, i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255&gt;<br>
                                +&nbsp; %x5 = select &lt;16 x i1&gt; %x3,
                                &lt;16 x i16&gt; %i, &lt;16 x i16&gt;
                                &lt;i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255, i16 255, i16
                                255, i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255&gt;<br>
                                +  %x6 = trunc <16 x i16> %x5 to
                                <16 x i8><br>
                                +  store <16 x i8> %x6, <16 x
                                i8>* %res, align 1<br>
                                +  ret void<br>
                                +}<br>
                                +<br>
                                +define <16 x i8>
                                @usat_trunc_wb_256(<16 x i16> %i)
                                {<br>
                                +; KNL-LABEL: usat_trunc_wb_256:<br>
                                +; KNL:       ## BB#0:<br>
                                +; KNL-NEXT:    vextracti128 $1, %ymm0,
                                %xmm1<br>
                                +; KNL-NEXT:    vpackuswb %xmm1, %xmm0,
                                %xmm0<br>
                                +; KNL-NEXT:    retq<br>
                                +;<br>
                                +; SKX-LABEL: usat_trunc_wb_256:<br>
                                +; SKX:       ## BB#0:<br>
                                +; SKX-NEXT:    vpmovuswb %ymm0, %xmm0<br>
                                +; SKX-NEXT:    retq<br>
                                +&nbsp; %x3 = icmp ult &lt;16 x i16&gt; %i,
                                &lt;i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255, i16 255, i16
                                255, i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255&gt;<br>
                                +&nbsp; %x5 = select &lt;16 x i1&gt; %x3,
                                &lt;16 x i16&gt; %i, &lt;16 x i16&gt;
                                &lt;i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255, i16 255, i16
                                255, i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255&gt;<br>
                                +  %x6 = trunc <16 x i16> %x5 to
                                <16 x i8><br>
                                +  ret <16 x i8> %x6<br>
                                +}<br>
                                +<br>
                                +define void
                                @usat_trunc_wb_128_mem(<8 x i16>
                                %i, <8 x i8>* %res) {<br>
                                +; KNL-LABEL: usat_trunc_wb_128_mem:<br>
                                +; KNL:       ## BB#0:<br>
                                +; KNL-NEXT:    vpminuw {{.*}}(%rip),
                                %xmm0, %xmm0<br>
                                +; KNL-NEXT:    vpshufb {{.*#+}} xmm0 =
                                xmm0[0,2,4,6,8,10,12,14,u,u,u,u,u,u,u,u]<br>
                                +; KNL-NEXT:    vmovq %xmm0, (%rdi)<br>
                                +; KNL-NEXT:    retq<br>
                                +;<br>
                                +; SKX-LABEL: usat_trunc_wb_128_mem:<br>
                                +; SKX:       ## BB#0:<br>
                                +; SKX-NEXT:    vpmovuswb %xmm0, (%rdi)<br>
                                +; SKX-NEXT:    retq<br>
                                +  %x3 = icmp ult <8 x i16> %i,
                                <i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255, i16 255><br>
                                +  %x5 = select <8 x i1> %x3,
                                <8 x i16> %i, <8 x i16>
                                <i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255, i16 255><br>
                                +  %x6 = trunc <8 x i16> %x5 to
                                <8 x i8><br>
                                +  store <8 x i8> %x6, <8 x
                                i8>* %res, align 1<br>
                                +  ret void<br>
                                +}<br>
                                +<br>
                                +define void
                                @usat_trunc_db_512_mem(<16 x i32>
                                %i, <16 x i8>* %res) {<br>
                                +; ALL-LABEL: usat_trunc_db_512_mem:<br>
                                +; ALL:       ## BB#0:<br>
                                +; ALL-NEXT:    vpmovusdb %zmm0, (%rdi)<br>
                                +; ALL-NEXT:    retq<br>
                                +  %x3 = icmp ult <16 x i32> %i,
                                <i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255, i32 255, i32
                                255, i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255><br>
                                +  %x5 = select <16 x i1> %x3,
                                <16 x i32> %i, <16 x i32>
                                <i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255, i32 255, i32
                                255, i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255><br>
                                +  %x6 = trunc <16 x i32> %x5 to
                                <16 x i8><br>
                                +  store <16 x i8> %x6, <16 x
                                i8>* %res, align 1<br>
                                +  ret void<br>
                                +}<br>
                                +<br>
                                +define void
                                @usat_trunc_qb_512_mem(<8 x i64>
                                %i, <8 x i8>* %res) {<br>
                                +; ALL-LABEL: usat_trunc_qb_512_mem:<br>
                                +; ALL:       ## BB#0:<br>
                                +; ALL-NEXT:    vpmovusqb %zmm0, (%rdi)<br>
                                +; ALL-NEXT:    retq<br>
                                +  %x3 = icmp ult <8 x i64> %i,
                                <i64 255, i64 255, i64 255, i64 255,
                                i64 255, i64 255, i64 255, i64 255><br>
                                +  %x5 = select <8 x i1> %x3,
                                <8 x i64> %i, <8 x i64>
                                <i64 255, i64 255, i64 255, i64 255,
                                i64 255, i64 255, i64 255, i64 255><br>
                                +  %x6 = trunc <8 x i64> %x5 to
                                <8 x i8><br>
                                +  store <8 x i8> %x6, <8 x
                                i8>* %res, align 1<br>
                                +  ret void<br>
                                +}<br>
                                +<br>
                                +define void
                                @usat_trunc_qd_512_mem(<8 x i64>
                                %i, <8 x i32>* %res) {<br>
                                +; ALL-LABEL: usat_trunc_qd_512_mem:<br>
                                +; ALL:       ## BB#0:<br>
                                +; ALL-NEXT:    vpmovusqd %zmm0, (%rdi)<br>
                                +; ALL-NEXT:    retq<br>
                                +  %x3 = icmp ult <8 x i64> %i,
                                <i64 4294967295, i64 4294967295, i64
                                4294967295, i64 4294967295, i64
                                4294967295, i64 4294967295, i64
                                4294967295, i64 4294967295><br>
                                +  %x5 = select <8 x i1> %x3,
                                <8 x i64> %i, <8 x i64>
                                <i64 4294967295, i64 4294967295, i64
                                4294967295, i64 4294967295, i64
                                4294967295, i64 4294967295, i64
                                4294967295, i64 4294967295><br>
                                +  %x6 = trunc <8 x i64> %x5 to
                                <8 x i32><br>
                                +  store <8 x i32> %x6, <8 x
                                i32>* %res, align 1<br>
                                +  ret void<br>
                                +}<br>
                                +<br>
                                +define void
                                @usat_trunc_qw_512_mem(<8 x i64>
                                %i, <8 x i16>* %res) {<br>
                                +; ALL-LABEL: usat_trunc_qw_512_mem:<br>
                                +; ALL:       ## BB#0:<br>
                                +; ALL-NEXT:    vpmovusqw %zmm0, (%rdi)<br>
                                +; ALL-NEXT:    retq<br>
                                +  %x3 = icmp ult <8 x i64> %i,
                                <i64 65535, i64 65535, i64 65535, i64
                                65535, i64 65535, i64 65535, i64 65535,
                                i64 65535><br>
                                +  %x5 = select <8 x i1> %x3,
                                <8 x i64> %i, <8 x i64>
                                <i64 65535, i64 65535, i64 65535, i64
                                65535, i64 65535, i64 65535, i64 65535,
                                i64 65535><br>
                                +  %x6 = trunc <8 x i64> %x5 to
                                <8 x i16><br>
                                +  store <8 x i16> %x6, <8 x
                                i16>* %res, align 1<br>
                                +  ret void<br>
                                +}<br>
                                +<br>
                                +define <32 x i8>
                                @usat_trunc_db_1024(<32 x i32> %i)
                                {<br>
                                +; KNL-LABEL: usat_trunc_db_1024:<br>
                                +; KNL:       ## BB#0:<br>
                                +; KNL-NEXT:    vpmovusdb %zmm0, %xmm0<br>
                                +; KNL-NEXT:    vpmovusdb %zmm1, %xmm1<br>
                                +; KNL-NEXT:    vinserti128 $1, %xmm1,
                                %ymm0, %ymm0<br>
                                +; KNL-NEXT:    retq<br>
                                +;<br>
                                +; SKX-LABEL: usat_trunc_db_1024:<br>
                                +; SKX:       ## BB#0:<br>
                                +; SKX-NEXT:    vpbroadcastd
                                {{.*}}(%rip), %zmm2<br>
                                +; SKX-NEXT:    vpminud %zmm2, %zmm1,
                                %zmm1<br>
                                +; SKX-NEXT:    vpminud %zmm2, %zmm0,
                                %zmm0<br>
                                +; SKX-NEXT:    vpmovdw %zmm0, %ymm0<br>
                                +; SKX-NEXT:    vpmovdw %zmm1, %ymm1<br>
                                +; SKX-NEXT:    vinserti64x4 $1, %ymm1,
                                %zmm0, %zmm0<br>
                                +; SKX-NEXT:    vpmovwb %zmm0, %ymm0<br>
                                +; SKX-NEXT:    retq<br>
                                +  %x3 = icmp ult <32 x i32> %i,
                                <i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255, i32 255, i32
                                255, i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255, i32 255, i32
                                255, i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255, i32 255, i32
                                255, i32 255, i32 255, i32 255, i32 255,
                                i32 255><br>
                                +  %x5 = select <32 x i1> %x3,
                                <32 x i32> %i, <32 x i32>
                                <i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255, i32 255, i32
                                255, i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255, i32 255, i32
                                255, i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255, i32 255, i32
                                255, i32 255, i32 255, i32 255, i32 255,
                                i32 255><br>
                                +  %x6 = trunc <32 x i32> %x5 to
                                <32 x i8><br>
                                +  ret <32 x i8> %x6<br>
                                +}<br>
                                +<br>
                                +define void
                                @usat_trunc_db_1024_mem(<32 x i32>
                                %i, <32 x i8>* %p) {<br>
                                +; KNL-LABEL: usat_trunc_db_1024_mem:<br>
                                +; KNL:       ## BB#0:<br>
                                +; KNL-NEXT:    vpmovusdb %zmm0, %xmm0<br>
                                +; KNL-NEXT:    vpmovusdb %zmm1, %xmm1<br>
                                +; KNL-NEXT:    vinserti128 $1, %xmm1,
                                %ymm0, %ymm0<br>
                                +; KNL-NEXT:    vmovdqu %ymm0, (%rdi)<br>
                                +; KNL-NEXT:    retq<br>
                                +;<br>
                                +; SKX-LABEL: usat_trunc_db_1024_mem:<br>
                                +; SKX:       ## BB#0:<br>
                                +; SKX-NEXT:    vpbroadcastd
                                {{.*}}(%rip), %zmm2<br>
                                +; SKX-NEXT:    vpminud %zmm2, %zmm1,
                                %zmm1<br>
                                +; SKX-NEXT:    vpminud %zmm2, %zmm0,
                                %zmm0<br>
                                +; SKX-NEXT:    vpmovdw %zmm0, %ymm0<br>
                                +; SKX-NEXT:    vpmovdw %zmm1, %ymm1<br>
                                +; SKX-NEXT:    vinserti64x4 $1, %ymm1,
                                %zmm0, %zmm0<br>
                                +; SKX-NEXT:    vpmovwb %zmm0, (%rdi)<br>
                                +; SKX-NEXT:    retq<br>
                                +  %x3 = icmp ult <32 x i32> %i,
                                <i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255, i32 255, i32
                                255, i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255, i32 255, i32
                                255, i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255, i32 255, i32
                                255, i32 255, i32 255, i32 255, i32 255,
                                i32 255><br>
                                +  %x5 = select <32 x i1> %x3,
                                <32 x i32> %i, <32 x i32>
                                <i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255, i32 255, i32
                                255, i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255, i32 255, i32
                                255, i32 255, i32 255, i32 255, i32 255,
                                i32 255, i32 255, i32 255, i32 255, i32
                                255, i32 255, i32 255, i32 255, i32 255,
                                i32 255><br>
                                +  %x6 = trunc <32 x i32> %x5 to
                                <32 x i8><br>
                                +  store <32 x i8>%x6, <32 x
                                i8>* %p, align 1<br>
                                +  ret void<br>
                                +}<br>
                                +<br>
                                +define <16 x i16>
                                @usat_trunc_dw_512(<16 x i32> %i)
                                {<br>
                                +; ALL-LABEL: usat_trunc_dw_512:<br>
                                +; ALL:       ## BB#0:<br>
                                +; ALL-NEXT:    vpmovusdw %zmm0, %ymm0<br>
                                +; ALL-NEXT:    retq<br>
                                +  %x3 = icmp ult <16 x i32> %i,
                                <i32 65535, i32 65535, i32 65535, i32
                                65535, i32 65535, i32 65535, i32 65535,
                                i32 65535, i32 65535, i32 65535, i32
                                65535, i32 65535, i32 65535, i32 65535,
                                i32 65535, i32 65535><br>
                                +  %x5 = select <16 x i1> %x3,
                                <16 x i32> %i, <16 x i32>
                                <i32 65535, i32 65535, i32 65535, i32
                                65535, i32 65535, i32 65535, i32 65535,
                                i32 65535, i32 65535, i32 65535, i32
                                65535, i32 65535, i32 65535, i32 65535,
                                i32 65535, i32 65535><br>
                                +  %x6 = trunc <16 x i32> %x5 to
                                <16 x i16><br>
                                +  ret <16 x i16> %x6<br>
                                +}<br>
                                +<br>
                                +define <8 x i8>
                                @usat_trunc_wb_128(<8 x i16> %i) {<br>
                                +; ALL-LABEL: usat_trunc_wb_128:<br>
                                +; ALL:       ## BB#0:<br>
                                +; ALL-NEXT:    vpminuw {{.*}}(%rip),
                                %xmm0, %xmm0<br>
                                +; ALL-NEXT:    retq<br>
                                +  %x3 = icmp ult <8 x i16> %i,
                                <i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255, i16 255><br>
                                +  %x5 = select <8 x i1> %x3,
                                <8 x i16> %i, <8 x i16>
                                <i16 255, i16 255, i16 255, i16 255,
                                i16 255, i16 255, i16 255, i16 255><br>
                                +  %x6 = trunc <8 x i16> %x5 to
                                <8 x i8><br>
                                +  ret <8 x i8>%x6<br>
                                +}<br>
                                +<br>
                                +define <16 x i16>
                                @usat_trunc_qw_1024(<16 x i64> %i)
                                {<br>
                                +; KNL-LABEL: usat_trunc_qw_1024:<br>
                                +; KNL:       ## BB#0:<br>
                                +; KNL-NEXT:    vpbroadcastq
                                {{.*}}(%rip), %zmm2<br>
                                +; KNL-NEXT:    vpminuq %zmm2, %zmm1,
                                %zmm1<br>
                                +; KNL-NEXT:    vpminuq %zmm2, %zmm0,
                                %zmm0<br>
                                +; KNL-NEXT:    vpmovqd %zmm0, %ymm0<br>
                                +; KNL-NEXT:    vpmovqd %zmm1, %ymm1<br>
                                +; KNL-NEXT:    vinserti64x4 $1, %ymm1,
                                %zmm0, %zmm0<br>
                                +; KNL-NEXT:    vpmovdw %zmm0, %ymm0<br>
                                +; KNL-NEXT:    retq<br>
                                +;<br>
                                +; SKX-LABEL: usat_trunc_qw_1024:<br>
                                +; SKX:       ## BB#0:<br>
                                +; SKX-NEXT:    vpbroadcastq
                                {{.*}}(%rip), %zmm2<br>
                                +; SKX-NEXT:    vpminuq %zmm2, %zmm1,
                                %zmm1<br>
                                +; SKX-NEXT:    vpminuq %zmm2, %zmm0,
                                %zmm0<br>
                                +; SKX-NEXT:    vpmovqd %zmm0, %ymm0<br>
                                +; SKX-NEXT:    vpmovqd %zmm1, %ymm1<br>
                                +; SKX-NEXT:    vinserti32x8 $1, %ymm1,
                                %zmm0, %zmm0<br>
                                +; SKX-NEXT:    vpmovdw %zmm0, %ymm0<br>
                                +; SKX-NEXT:    retq<br>
                                +  %x3 = icmp ult <16 x i64> %i,
                                <i64 65535, i64 65535, i64 65535, i64
                                65535, i64 65535, i64 65535, i64 65535,
                                i64 65535, i64 65535, i64 65535, i64
                                65535, i64 65535, i64 65535, i64 65535,
                                i64 65535, i64 65535><br>
                                +  %x5 = select <16 x i1> %x3,
                                <16 x i64> %i, <16 x i64>
                                <i64 65535, i64 65535, i64 65535, i64
                                65535, i64 65535, i64 65535, i64 65535,
                                i64 65535, i64 65535, i64 65535, i64
                                65535, i64 65535, i64 65535, i64 65535,
                                i64 65535, i64 65535><br>
                                +  %x6 = trunc <16 x i64> %x5 to
                                <16 x i16><br>
                                +  ret <16 x i16> %x6<br>
                                +}<br>
                                +<br>
                                <br>
                                <br>
                                _______________________________________________<br>
                                llvm-commits mailing list<br>
                                <a moz-do-not-send="true"
                                  href="mailto:llvm-commits@lists.llvm.org"
                                  target="_blank">llvm-commits@lists.llvm.org</a><br>
                                <a moz-do-not-send="true"
                                  href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits"
                                  target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits</a></p>
                            </blockquote>
                          </div>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>