<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 10/17/2017 12:58 PM, Friedman, Eli
      via llvm-dev wrote:<br>
    </div>
    <blockquote
      cite="mid:32dd8f01-d533-5566-b6ea-4944472843bc@codeaurora.org"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <div class="moz-cite-prefix">On 10/16/2017 10:22 PM, Cohen, Elad2
        via llvm-dev wrote:<br>
      </div>
      <blockquote type="cite"
cite="mid:568D307AADC9FB4C9256C207A4837F4224FAFBE6@HASMSX105.ger.corp.intel.com">
        <meta name="Generator" content="Microsoft Word 15 (filtered
          medium)">
        <style><!--
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:#0563C1;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:#954F72;
        text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0cm;
        margin-right:0cm;
        margin-bottom:0cm;
        margin-left:36.0pt;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
ol
        {margin-bottom:0cm;}
ul
        {margin-bottom:0cm;}
--></style>
        <div class="WordSection1">
          <p class="MsoNormal">Introduction<o:p></o:p></p>
          <p class="MsoNormal">==========<o:p></o:p></p>
          <p class="MsoNormal"><o:p> </o:p></p>
          <p class="MsoNormal">We would like to add support for masked
            vector signed/unsigned integer division and remainder in the
            LLVM IR by introducing new target-independent intrinsics.<o:p></o:p></p>
          <p class="MsoNormal"><o:p> </o:p></p>
          <p class="MsoNormal">This follows similar work which was done
            already for masked vector loads and stores - <a
              href="http://lists.llvm.org/pipermail/llvm-dev/2014-October/078059.html"
              moz-do-not-send="true">http://lists.llvm.org/pipermail/llvm-dev/2014-October/078059.html</a><span
              style="color:#1F497D">.</span><o:p></o:p></p>
          <p class="MsoNormal">Another relevant reference is the masked
            scatter/gather intrinsics discussion - <a
href="http://lists.llvm.org/pipermail/llvm-dev/2014-December/079843.html"
              moz-do-not-send="true">http://lists.llvm.org/pipermail/llvm-dev/2014-December/079843.html</a><span
              style="color:#1F497D">.</span><o:p></o:p></p>
          <p class="MsoNormal"><o:p> </o:p></p>
          <p class="MsoNormal"><o:p> </o:p></p>
          <p class="MsoNormal">Motivation<o:p></o:p></p>
          <p class="MsoNormal">=========<o:p></o:p></p>
          <p class="MsoNormal"><o:p> </o:p></p>
          <p class="MsoNormal">Currently, when the loop-vectorizer
            decides to vectorize a loop that contains a predicated
            integer division, it vectorizes the loop body and scalarizes
            the predicated division into a sequence of branches that
            guard scalar division operations. In some cases the
            resulting code is not very efficient. Speculating the
            divides with a non-masked vector sdiv instruction is usually
            not an option because of the danger of integer
            divide-by-zero.<o:p></o:p></p>
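          <p class="MsoNormal">As a rough illustration of the hazard (a
            Python model, not LLVM code; the helper names trunc_div,
            speculated_div and guarded_div are made up for this sketch),
            an unmasked divide followed by a select touches every lane
            and traps on a zero divisor even when that lane is masked
            off, while the scalarized form divides only in enabled
            lanes. Python's ZeroDivisionError stands in for the hardware
            trap.</p>
<pre>
```python
def trunc_div(x, y):
    """Signed division rounding toward zero, like sdiv (Python's // floors)."""
    q = abs(x) // abs(y)  # raises ZeroDivisionError when y == 0
    # The product is non-negative exactly when the quotient is non-negative.
    return q if x * y == abs(x * y) else -q


def speculated_div(a, b, mask, passthru):
    """Unmasked vector divide followed by a select: divides in every lane."""
    quotients = [trunc_div(x, y) for x, y in zip(a, b)]  # may trap
    return [q if m else p for q, m, p in zip(quotients, mask, passthru)]


def guarded_div(a, b, mask, passthru):
    """Scalarized form: a branch guards each scalar division."""
    result = []
    for x, y, m, p in zip(a, b, mask, passthru):
        if m:
            result.append(trunc_div(x, y))
        else:
            result.append(p)  # masked-off lane: no division is executed
    return result
```
</pre>
          <p class="MsoNormal">With b = [2, 2, 0, -3] and a mask that
            disables the third lane, guarded_div returns a result while
            speculated_div raises on the zero divisor.</p>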
          <p class="MsoNormal">           <o:p></o:p></p>
          <p class="MsoNormal">With the proposed intrinsics, the
            loop-vectorizer could generate masked intrinsics and
            concentrate on the vector semantics rather than on how to
            lower them.<o:p></o:p></p>
          <p class="MsoNormal">Initially the intrinsics will be
            scalarized for all targets. This could be done by extending
            scalarize-masked-mem-intrin to also handle the masked
            division intrinsics. Later the intrinsics could be optimized
            by:<o:p></o:p></p>
          <p class="MsoListParagraph"
            style="margin-left:43.5pt;text-indent:-18.0pt">1. Lowering
            of the intrinsics in the backend using different expansions
            (for example converting to floating point and using masked
            vector floating-point division instructions).<o:p></o:p></p>
          <p class="MsoListParagraph"
            style="margin-left:43.5pt;text-indent:-18.0pt">2. Linking
            the intrinsics to different vector math library
            implementations.<o:p></o:p></p>
          <p class="MsoListParagraph"
            style="margin-left:43.5pt;text-indent:-18.0pt">3.
            Scalarizing the intrinsics at the backend possibly using
            target-specific considerations.<o:p></o:p></p>
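          <p class="MsoNormal">One possible reading of the first option
            is sketched below in Python, as a per-lane model rather than
            real backend code. It assumes 32-bit lanes, for which the
            conversion to double precision is exact and truncating the
            double quotient should give the integer quotient. The name
            masked_sdiv_via_fp is hypothetical, and the INT_MIN/-1 case
            is not handled.</p>
<pre>
```python
def masked_sdiv_via_fp(a, b, mask, passthru):
    """Per-lane model of a masked FP divide used to compute i32 quotients.

    Assumption: every lane value fits in 32 bits, so float() is exact.
    """
    result = []
    for x, y, m, p in zip(a, b, mask, passthru):
        if m:
            # int() truncates toward zero, matching sdiv rounding.
            result.append(int(float(x) / float(y)))
        else:
            result.append(p)  # masked-off lane: the divide is suppressed
    return result
```
</pre>
          <p class="MsoNormal">A zero divisor in a masked-off lane is
            never divided by, so only the enabled lanes decide whether
            the operation is well defined.</p>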
          <p class="MsoNormal"><o:p> </o:p></p>
          <p class="MsoNormal"><o:p> </o:p></p>
          <p class="MsoNormal">Proposed Definition (The following
            example is for masked signed division. The rest are similar)<o:p></o:p></p>
          <p class="MsoNormal">========================================================================<o:p></o:p></p>
          <p class="MsoNormal"><o:p> </o:p></p>
          <p class="MsoNormal">     ‘llvm.masked.sdiv’<s><o:p></o:p></s></p>
          <p class="MsoNormal">     <o:p></o:p></p>
          <p class="MsoNormal">     Syntax:<o:p></o:p></p>
          <p class="MsoNormal">     <o:p></o:p></p>
          <p class="MsoNormal">           An overloaded intrinsic. You
            can use llvm.masked.sdiv on any vector with integer
            elements.<o:p></o:p></p>
          <p class="MsoNormal">           <o:p></o:p></p>
          <p class="MsoNormal">           declare &lt;16 x i32&gt;
            @llvm.masked.sdiv.v16i32(&lt;16 x i32&gt; &lt;a&gt;, &lt;16
            x i32&gt; &lt;b&gt;, &lt;16 x i1&gt; &lt;mask&gt;, &lt;16 x
            i32&gt; &lt;passthru&gt;)<o:p></o:p></p>
          <p class="MsoNormal">     <o:p></o:p></p>
          <p class="MsoNormal">     Overview:<o:p></o:p></p>
          <p class="MsoNormal">     <o:p></o:p></p>
          <p class="MsoNormal">           Returns the quotient of its
            two operands per vector lane according to the provided mask.
            The mask holds a bit for each vector lane, and is used to
            prevent division in the masked-off lanes. The masked-off
            lanes in the result vector are taken from the corresponding
            lanes of the passthru operand.<o:p></o:p></p>
          <p class="MsoNormal">     <o:p></o:p></p>
          <p class="MsoNormal">     Arguments:<o:p></o:p></p>
          <p class="MsoNormal">     <o:p></o:p></p>
          <p class="MsoNormal">           The first two arguments must
            be vectors of integer values. Both arguments must have
            identical types. The third argument, mask, is a vector of
            boolean values with the same number of elements as the first
            two. The fourth is a pass-through value that is used to fill
            the masked-off lanes of the result. The type of the passthru
            argument is the same as that of the first two.<o:p></o:p></p>
          <p class="MsoNormal">     <o:p></o:p></p>
          <p class="MsoNormal">     Semantics:<o:p></o:p></p>
          <p class="MsoNormal">     <o:p></o:p></p>
          <p class="MsoNormal">           The ‘llvm.masked.sdiv’
            intrinsic is designed for conditional integer division of
            selected vector elements in a single IR operation. The
            result of this operation is equivalent to a regular vector
            ‘sdiv’ instruction followed by a ‘select’ between the
            quotient and the passthru values, predicated on the same
            mask. However, using this intrinsic prevents divide-by-zero
            exceptions in masked-off lanes. If the divisor element in
            any enabled lane is zero, the operation has undefined
            behavior.    </p>
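          <p class="MsoNormal">The semantics above can be modelled per
            lane as follows. This is a minimal Python sketch, not part
            of the proposal: masked_sdiv is a hypothetical reference
            function, lists stand in for vectors, and an exception
            marks the case that the text leaves as undefined
            behavior.</p>
<pre>
```python
def masked_sdiv(a, b, mask, passthru):
    """Reference model of the proposed llvm.masked.sdiv on Python lists."""
    if not (len(a) == len(b) == len(mask) == len(passthru)):
        raise ValueError("all four operands must have the same lane count")
    result = []
    for x, y, m, p in zip(a, b, mask, passthru):
        if not m:
            result.append(p)  # masked-off lane: take passthru, ignore divisor
        elif y == 0:
            # Zero divisor in an enabled lane: undefined behavior in the proposal.
            raise ZeroDivisionError("zero divisor in an enabled lane")
        else:
            q = abs(x) // abs(y)
            # Round toward zero: negate when the operands differ in sign.
            result.append(q if x * y == abs(x * y) else -q)
    return result
```
</pre>
          <p class="MsoNormal">For example, dividing [10, -9, 4, 8] by
            [3, 2, 0, -4] with the third lane masked off and a passthru
            of 99 in every lane leaves that lane at 99 without ever
            dividing by its zero divisor.</p>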
        </div>
      </blockquote>
      <br>
      You probably want to mention INT_MIN/-1 overflow here?<br>
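      <br>
      To make the overflow case concrete, here is a small Python sketch
      for 32-bit lanes (the helper names are made up for illustration).
      The exact quotient of INT_MIN and -1 is one larger than INT_MAX,
      so it has no i32 representation.<br>
<pre>
```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def sdiv_overflows_i32(x, y):
    """True when the exact quotient of x and y does not fit in an i32."""
    return x == INT32_MIN and y == -1


def wrap_i32(value):
    """Reduce an unbounded Python int to a two's-complement i32."""
    return (value + 2**31) % 2**32 - 2**31
```
</pre>
      If the quotient were simply wrapped to 32 bits, INT_MIN / -1 would
      come back as INT_MIN.<br>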
      <br>
      ----<br>
      <br>
      The alternative here is to refine the definition of "sdiv" in
      LangRef; other arithmetic operations in LLVM IR don't have
      undefined behavior, and the primary reason "sdiv" has undefined
      behavior is the unfortunate behavior of the x86 "IDIV"
      instruction.  For example, we could add a "nooverflow" bit to
      "sdiv", and say that divide-by-zero has undefined behavior if the
      "nooverflow" bit is present, and produces poison otherwise.<br>
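      <br>
      A rough Python model of that idea, for scalar values only. The
      Poison sentinel, the UndefinedBehavior exception and the function
      signature are all assumptions made for this sketch; it covers only
      the divide-by-zero rule stated above and says nothing about
      INT_MIN/-1.<br>
<pre>
```python
class Poison:
    """Sentinel standing in for an LLVM poison value."""

    def __repr__(self):
        return "poison"


POISON = Poison()


class UndefinedBehavior(Exception):
    """Raised where the modelled operation would have undefined behavior."""


def sdiv(x, y, nooverflow):
    """Model of sdiv with the suggested flag; integer width is not modelled."""
    if y == 0:
        if nooverflow:
            raise UndefinedBehavior("division by zero with nooverflow set")
        return POISON  # flag absent: the result is poison, not UB
    q = abs(x) // abs(y)
    # Round toward zero: negate when the operands differ in sign.
    return q if x * y == abs(x * y) else -q
```
</pre>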
    </blockquote>
    <br>
    This seems like a good idea. It will also provide us with a
    well-defined way to speculate/hoist divisions. I presume that we'd
    want to have Clang (etc.) generate all divisions with this bit set,
    but we could clear the bit when vectorizing (or hoisting, if we
    wanted to do that).<br>
    <br>
    On x86, we'd need to lower the form without the nooverflow bit
    present using a test-and-branch sequence, but on other
    architectures, we could use the poison-generating form directly.<br>
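    <br>
    As a sketch of what such a test-and-branch sequence might look like
    (a Python model with made-up names, not actual codegen):
    hardware_idiv stands in for a trapping divide, and the lowered form
    checks the divisor first. Returning 0 on the zero path is an
    arbitrary choice, since any value refines poison. Only the zero
    check is shown; integer width and the INT_MIN/-1 case are not
    modelled.<br>
<pre>
```python
def hardware_idiv(x, y):
    """Stand-in for a divide instruction that traps on a zero divisor."""
    if y == 0:
        raise ZeroDivisionError("divide error trap")
    q = abs(x) // abs(y)
    # Round toward zero: negate when the operands differ in sign.
    return q if x * y == abs(x * y) else -q


def lowered_sdiv_without_flag(x, y):
    """Test-and-branch lowering of the poison-producing form."""
    if y == 0:
        return 0  # skip the divide; any value is acceptable for poison
    return hardware_idiv(x, y)
```
</pre>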
    <br>
     -Hal <br>
    <br>
    <blockquote
      cite="mid:32dd8f01-d533-5566-b6ea-4944472843bc@codeaurora.org"
      type="cite"> <br>
      -Eli<br>
      <pre class="moz-signature" cols="72">-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project</pre>
    </blockquote>
    <br>
    <pre class="moz-signature" cols="72">-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory</pre>
  </body>
</html>