<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">On 04/04/2019 14:11, Sander De Smalen
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <meta name="Generator" content="Microsoft Word 15 (filtered
        medium)">
      <style><!--
/* Font Definitions */
@font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
        {font-family:"MS Gothic";
        panose-1:2 11 6 9 7 2 5 8 2 4;}
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:"\@MS Gothic";
        panose-1:2 11 6 9 7 2 5 8 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:#0563C1;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:#954F72;
        text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0cm;
        margin-right:0cm;
        margin-bottom:0cm;
        margin-left:36.0pt;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Calibri",sans-serif;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
/* List Definitions */
@list l0
        {mso-list-id:2092389295;
        mso-list-type:hybrid;
        mso-list-template-ids:1156977324 1390607260 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l0:level1
        {mso-level-start-at:2;
        mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:18.0pt;
        text-indent:-18.0pt;
        font-family:Symbol;
        mso-fareast-font-family:Calibri;
        mso-bidi-font-family:"Times New Roman";}
@list l0:level2
        {mso-level-number-format:bullet;
        mso-level-text:o;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:54.0pt;
        text-indent:-18.0pt;
        font-family:"Courier New";}
@list l0:level3
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:90.0pt;
        text-indent:-18.0pt;
        font-family:Wingdings;}
@list l0:level4
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:126.0pt;
        text-indent:-18.0pt;
        font-family:Symbol;}
@list l0:level5
        {mso-level-number-format:bullet;
        mso-level-text:o;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:162.0pt;
        text-indent:-18.0pt;
        font-family:"Courier New";}
@list l0:level6
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:198.0pt;
        text-indent:-18.0pt;
        font-family:Wingdings;}
@list l0:level7
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:234.0pt;
        text-indent:-18.0pt;
        font-family:Symbol;}
@list l0:level8
        {mso-level-number-format:bullet;
        mso-level-text:o;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:270.0pt;
        text-indent:-18.0pt;
        font-family:"Courier New";}
@list l0:level9
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:306.0pt;
        text-indent:-18.0pt;
        font-family:Wingdings;}
ol
        {margin-bottom:0cm;}
ul
        {margin-bottom:0cm;}
--></style>
      <div class="WordSection1"><span style="font-size:11.0pt">Proposed
          change:<o:p></o:p></span>
        <p class="MsoNormal"><span style="font-size:11.0pt">----------------------------<o:p></o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt">In this RFC
            I propose changing the intrinsics for
            llvm.experimental.vector.reduce.fadd and
            llvm.experimental.vector.reduce.fmul (see Options A and B).
            I also propose renaming the 'accumulator' operand to 'start
            value' because, for fmul, this is the start value of the
            reduction rather than a value into which the fmul reduction
            is accumulated.<o:p></o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt">[Option A]
            Always using the start value operand in the reduction (<a
              href="https://reviews.llvm.org/D60261"
              moz-do-not-send="true">https://reviews.llvm.org/D60261</a>)<o:p></o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt">  declare
            float
            @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float
            %start_value, <4 x float> %vec)<o:p></o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt">This means
            that if the start value is 'undef', the result will be undef,
            so all code creating such a reduction will need to ensure
            it has a sensible start value (e.g. 0.0 for fadd, 1.0 for
            fmul). When 'fast' or 'reassoc' is set on the call, the
            intrinsic will be implemented as an unordered reduction;
            otherwise it will be implemented as an ordered reduction.
            Note that a new intrinsic is required to capture the new
            semantics. In this proposal the intrinsic is prefixed with
            'v2' for the time being, with the expectation that the
            prefix will be dropped when we remove 'experimental' from
            the reduction intrinsics in the future.</span><span
            style="font-size:11.0pt;font-family:&quot;MS Gothic&quot;"><o:p></o:p></span></p>
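        <p>To make Option A concrete, here is a minimal C++ sketch (not LLVM code) of the intended semantics. The name <code>reduce_fadd_v2</code> and its <code>bool reassoc</code> parameter are hypothetical stand-ins for the intrinsic and for the 'reassoc'/'fast' flag on the call. The unordered path uses a pairwise tree as one permitted evaluation order and, for simplicity, assumes a power-of-two lane count.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of Option A's single "v2" fadd reduction: the start
// value is always used, and 'reassoc' (standing in for the fast-math flag
// on the call) only selects how the reduction may be evaluated.
float reduce_fadd_v2(float start_value, std::vector<float> vec, bool reassoc) {
  if (!reassoc) {
    // Ordered: (((start + v0) + v1) + v2) + v3, strictly left to right.
    float acc = start_value;
    for (float x : vec) acc += x;
    return acc;
  }
  // Unordered: pairwise tree over the lanes, then add the start value.
  // This sketch only handles power-of-two lane counts.
  assert(!vec.empty() && (vec.size() & (vec.size() - 1)) == 0);
  for (std::size_t width = vec.size() / 2; width > 0; width /= 2)
    for (std::size_t i = 0; i < width; ++i) vec[i] += vec[i + width];
  return start_value + vec[0];
}
```

        <p>With the lanes {1e8f, 1.0f, -1e8f, 1.0f} and a start value of 0.0f, the ordered path returns 1.0f and the tree returns 2.0f, because 1e8f + 1.0f rounds back to 1e8f in single precision. With lanes {1, 2, 3, 4} and a start value of 10, both paths return 20, since the start value is always used.</p>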
        <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt">[Option B]
            Having separate ordered and unordered intrinsics (<a
              href="https://reviews.llvm.org/D60262"
              moz-do-not-send="true">https://reviews.llvm.org/D60262</a>).<o:p></o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt">  declare
            float
            @llvm.experimental.vector.reduce.ordered.fadd.f32.v4f32(float
            %start_value, <4 x float> %vec)<o:p></o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt">  declare
            float
            @llvm.experimental.vector.reduce.unordered.fadd.f32.v4f32(<4
            x float> %vec)<o:p></o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt">This makes
            the behaviour explicit in the intrinsic itself, and the use
            of 'fast' or 'reassoc' on the call has no effect on how that
            intrinsic is lowered. The ordered reduction intrinsic takes
            a scalar start-value operand, whereas the unordered
            reduction intrinsic takes only a vector operand.<o:p></o:p></span></p>
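        <p>For comparison, here is a minimal C++ sketch of Option B's split. It is hypothetical and not LLVM API: <code>reduce_ordered_fadd</code> and <code>reduce_unordered_fadd</code> stand in for the two intrinsics, and which function is called, not a flag, determines the semantics. The pairwise tree is only one order the unordered form would permit, and the sketch assumes a power-of-two lane count.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the ordered intrinsic: takes a scalar start value
// and always accumulates strictly left to right.
float reduce_ordered_fadd(float start_value, const std::vector<float>& vec) {
  float acc = start_value;
  for (float x : vec) acc += x;
  return acc;
}

// Hypothetical model of the unordered intrinsic: takes only the vector.
// Any association is allowed, so this uses a pairwise tree
// (power-of-two lane counts only).
float reduce_unordered_fadd(std::vector<float> vec) {
  assert(!vec.empty() && (vec.size() & (vec.size() - 1)) == 0);
  for (std::size_t width = vec.size() / 2; width > 0; width /= 2)
    for (std::size_t i = 0; i < width; ++i) vec[i] += vec[i + width];
  return vec[0];
}
```

        <p>A caller that wants a start value with the unordered form adds it separately, e.g. <code>start + reduce_unordered_fadd(vec)</code>.</p>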
        <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt">Both options
            auto-upgrade existing IR to use the new intrinsics (or new
            versions of them). I'm personally slightly in favour of
            [Option B], because it aligns better with the definition of
            the SelectionDAG nodes and is more explicit in its
            semantics. It also avoids the need for an artificial
            'v2'-like prefix to denote the new behaviour of the
            intrinsic.<o:p></o:p></span></p>
        <span style="font-size:11.0pt"><o:p> </o:p></span></div>
    </blockquote>
    <p>Do we have any targets with instructions that can actually use
      the start value? TBH I'd be tempted to suggest we just make the
      initial extractelement/fadd/insertelement pattern a manual extra
      stage and avoid having that argument entirely. <br>
    </p>
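    <p>Here is a minimal C++ sketch of that manual extra stage, using hypothetical function names. Folding the start value into lane 0 (the extractelement/fadd/insertelement step) and then running an ordered reduction with no start operand performs exactly the same additions in the same order as an ordered reduction that takes the start value. The results are therefore bit-identical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ordered fadd reduction with an explicit start value (for comparison).
float reduce_ordered_with_start(float start_value, const std::vector<float>& vec) {
  float acc = start_value;
  for (float x : vec) acc += x;
  return acc;
}

// Ordered fadd reduction with no start operand: ((v0 + v1) + v2) + ...
float reduce_ordered_no_start(const std::vector<float>& vec) {
  assert(!vec.empty());
  float acc = vec[0];
  for (std::size_t i = 1; i < vec.size(); ++i) acc += vec[i];
  return acc;
}

// The manual prologue: extract lane 0, fadd the start value, insert it
// back, then run the start-free ordered reduction on the modified vector.
float reduce_ordered_folded(float start_value, std::vector<float> vec) {
  assert(!vec.empty());
  vec[0] = start_value + vec[0];  // extractelement / fadd / insertelement
  return reduce_ordered_no_start(vec);
}
```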
    <blockquote type="cite"
      cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com">
      <div class="WordSection1">
        <p class="MsoNormal"><span style="font-size:11.0pt">Further
            efforts:<o:p></o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt">----------------------------<o:p></o:p></span></p>
        <p class="MsoNormal"><span style="font-size:11.0pt">Here is a
            non-exhaustive list of items that I think would help make the
            intrinsics non-experimental:</span><span
            style="font-size:11.0pt;font-family:&quot;MS Gothic&quot;"
            lang="EN-US">
</span><span style="font-size:11.0pt"><o:p></o:p></span></p>
        <ul style="margin-top:0cm" type="disc">
          <li class="MsoListParagraph"
            style="margin-left:-18.0pt;mso-list:l0 level1 lfo1">
            <span style="font-size:11.0pt">Adding SelectionDAG
              legalization for the _STRICT reduction SDNodes. After
              some great work from Nikita in D58015, unordered
              reductions are now legalized/expanded in SelectionDAG, so
              adding expansion in SelectionDAG for strict reductions
              would make the ExpandReductionsPass redundant.<o:p></o:p></span></li>
          <li class="MsoListParagraph"
            style="margin-left:-18.0pt;mso-list:l0 level1 lfo1">
            <span style="font-size:11.0pt">Better enforcing the
              constraints of the intrinsics (see
              <a href="https://reviews.llvm.org/D60260"
                moz-do-not-send="true">https://reviews.llvm.org/D60260</a>).</span><span
              style="font-size:11.0pt;font-family:&quot;MS Gothic&quot;"
              lang="EN-US">
</span><span style="font-size:11.0pt"><o:p></o:p></span></li>
          <li class="MsoListParagraph"
            style="margin-left:-18.0pt;mso-list:l0 level1 lfo1">
            <span style="font-size:11.0pt">For the intrinsics whose
              result type must match the vector element type, I think
              we'll also want to derive the overloaded result type from
              the vector element type, e.g. dropping the redundant
              'i32' in:</span><span
              style="font-size:11.0pt;font-family:&quot;MS Gothic&quot;"><br>
                </span><span style="font-size:11.0pt">i32
              @llvm.experimental.vector.reduce.and.i32.v4i32(<4 x
              i32> %a) => i32
              @llvm.experimental.vector.reduce.and.v4i32(<4 x i32>
              %a)<o:p></o:p></span></li>
        </ul>
        <p class="MsoListParagraph" style="margin-left:18.0pt"><span
            style="font-size:11.0pt">since i32 is implied by <4 x
            i32>. This would have the added benefit that LLVM would
            automatically check that the result type matches the vector
            element type.</span><span
            style="font-size:11.0pt;font-family:&quot;MS Gothic&quot;"
            lang="EN-US">
</span></p>
      </div>
    </blockquote>
    <p>Won't this cause issues with overflow? Isn't the point of an add
      (or mul...) reduction of, say, <64 x i8> to give a larger
      (i32 or i64) result so we don't lose anything? I agree that for
      bitop reductions it doesn't make sense, though.<br>
    </p>
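    <p>Here is a minimal C++ sketch of the overflow concern. It assumes unsigned (zero-extended) lanes, and <code>reduce_add</code> and <code>reduce_add_widening</code> are hypothetical names. When the result type is implied by the element type, 64 lanes of 200 sum to 12800 = 50 * 256, which wraps to exactly 0 in 8 bits. A 32-bit result keeps the full 12800.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Result type implied by the element type, as in the proposed overload.
// For unsigned 8-bit lanes the sum wraps modulo 256.
template <typename T>
T reduce_add(const std::vector<T>& vec) {
  static_assert(std::is_unsigned<T>::value, "sketch assumes unsigned lanes");
  T acc = 0;
  for (T x : vec) acc = static_cast<T>(acc + x);
  return acc;
}

// Widening variant: each lane is zero-extended to the wider result type R
// before accumulating, e.g. R = uint32_t cannot overflow for 64 x u8.
template <typename R, typename T>
R reduce_add_widening(const std::vector<T>& vec) {
  static_assert(std::is_unsigned<T>::value && std::is_unsigned<R>::value,
                "sketch assumes unsigned lanes and result");
  R acc = 0;
  for (T x : vec) acc = static_cast<R>(acc + static_cast<R>(x));
  return acc;
}
```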
    <p>Simon.<br>
    </p>
  </body>
</html>