<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 05/04/2019 09:37, Simon Pilgrim via
      llvm-dev wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:d306cf98-1225-732d-8016-7e882b5136b1@redking.me.uk">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div class="moz-cite-prefix">On 04/04/2019 14:11, Sander De Smalen
        wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com">
        <meta http-equiv="Content-Type" content="text/html;
          charset=UTF-8">
        <meta name="Generator" content="Microsoft Word 15 (filtered
          medium)">
        <div class="WordSection1"><span style="font-size:11.0pt">Proposed
            change:<o:p></o:p></span>
          <p class="MsoNormal"><span style="font-size:11.0pt">----------------------------<o:p></o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt">In this
              RFC I propose changing the
              llvm.experimental.vector.reduce.fadd and
              llvm.experimental.vector.reduce.fmul intrinsics (see options
              A and B). I also propose renaming the 'accumulator' operand to
              'start value' because for fmul this is the start value of
              the reduction, rather than a value into which the fmul
              reduction is accumulated.<o:p></o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt">[Option A]
              Always using the start value operand in the reduction (<a
                href="https://reviews.llvm.org/D60261"
                moz-do-not-send="true">https://reviews.llvm.org/D60261</a>)<o:p></o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt">  declare
              float
              @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float
              %start_value, &lt;4 x float&gt; %vec)<o:p></o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt">This means
              that if the start value is 'undef', the result will be
              undef, so all code creating such a reduction will need to
              ensure it has a sensible start value (e.g. 0.0 for fadd,
              1.0 for fmul). When 'fast' or 'reassoc' is set on the call,
              the reduction will be implemented as an unordered reduction;
              otherwise it will be implemented as an ordered
              reduction. Note that a new intrinsic is required to
              capture the new semantics. In this proposal the intrinsic
              is prefixed with a 'v2' for the time being, with the
              expectation that this will be dropped when we remove
              'experimental' from the reduction intrinsics in the
              future.</span><span
              style="font-size:11.0pt;font-family:'MS Gothic'"><o:p></o:p></span></p>
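          <p class="MsoNormal"><span style="font-size:11.0pt">As a rough
              illustration of the Option A semantics described above, here
              is a small Python model. It is only a sketch under stated
              assumptions, not LLVM code: the name reduce_fadd and its
              reassoc parameter are hypothetical stand-ins for the
              intrinsic and the fast-math flag, Python floats are doubles
              rather than f32, and the pairwise tree is just one of the
              associations an unordered reduction would be allowed to
              use.</span></p>
<pre>
```python
from functools import reduce
import operator


def reduce_fadd(start_value, vec, reassoc=False):
    """Toy model of an fadd reduction that always uses the start value.

    Hypothetical helper for illustration; not an LLVM API.
    """
    if not reassoc:
        # Ordered: (((start + v0) + v1) + v2) + v3, strictly left to right.
        return reduce(operator.add, vec, start_value)
    # Unordered: any association is permitted. A pairwise tree is one choice.
    vals = [start_value] + list(vec)
    while len(vals) != 1:
        pairs = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            # Odd element count: carry the last value to the next round.
            pairs.append(vals[-1])
        vals = pairs
    return vals[0]
```
</pre>
          <p class="MsoNormal"><span style="font-size:11.0pt">With a
              start value of 0.0 and the lanes [1.0, 1e17, -1e17, 1.0], the
              ordered form returns 1.0 because the first 1.0 is absorbed
              by 1e17, while this particular pairwise order returns 2.0.
              That difference is why the choice between ordered and
              unordered has to be visible somewhere, either in the flags
              or in the intrinsic name.</span></p>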
          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt">[Option B]
              Having separate ordered and unordered intrinsics (<a
                href="https://reviews.llvm.org/D60262"
                moz-do-not-send="true">https://reviews.llvm.org/D60262</a>).<o:p></o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt">  declare
              float
              @llvm.experimental.vector.reduce.ordered.fadd.f32.v4f32(float
              %start_value, &lt;4 x float&gt; %vec)<o:p></o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt">  declare
              float
              @llvm.experimental.vector.reduce.unordered.fadd.f32.v4f32(&lt;4
              x float&gt; %vec)<o:p></o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt">This will
              mean that the behaviour is explicit from the intrinsic and
              the use of 'fast' or 'reassoc' on the call has no effect
              on how that intrinsic is lowered. The ordered reduction
              intrinsic will take a scalar start-value operand, whereas
              the unordered reduction intrinsic will only take a vector
              operand.<o:p></o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt">Both
              options auto-upgrade the IR to use the new (version of
              the) intrinsics. I'm personally slightly in favour of
              [Option B], because it better aligns with the definition
              of the SelectionDAG nodes and is more explicit in its
              semantics. We also avoid having to use an artificial
              'v2'-like prefix to denote the new behaviour of the intrinsic.<o:p></o:p></span></p>
          <span style="font-size:11.0pt"><o:p> </o:p></span></div>
      </blockquote>
      <p>Do we have any targets with instructions that can actually use
        the start value? TBH I'd be tempted to suggest we just make the
        initial extractelement/fadd/insertelement pattern a manual extra
        stage and avoid having that argument entirely. <br>
      </p>
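      <p>One possible reading of that suggestion, sketched as a small
        Python model rather than IR: fold the start value into lane 0
        first, then run an ordered reduction that takes only a vector.
        The helper names fold_start_into_lane0 and ordered_fadd_no_start
        are hypothetical, and Python floats are doubles rather than
        f32.</p>
<pre>
```python
from functools import reduce
import operator


def fold_start_into_lane0(start_value, vec):
    """Mimic an extractelement / fadd / insertelement sequence on lane 0."""
    lane0 = vec[0]                    # extractelement
    folded = start_value + lane0      # scalar fadd with the start value
    return [folded] + list(vec[1:])   # insertelement back into lane 0


def ordered_fadd_no_start(vec):
    """Ordered fadd reduction that takes only a vector operand."""
    return reduce(operator.add, vec[1:], vec[0])
```
</pre>
      <p>Because the additions happen in the same order either way,
        ordered_fadd_no_start(fold_start_into_lane0(s, v)) performs
        exactly the same sequence of operations as an ordered reduction
        of v that starts from s.</p>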
      <blockquote type="cite"
        cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com">
        <div class="WordSection1">
          <p class="MsoNormal"><span style="font-size:11.0pt">Further
              efforts:<o:p></o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt">----------------------------<o:p></o:p></span></p>
          <p class="MsoNormal"><span style="font-size:11.0pt">Here is a
              non-exhaustive list of items that I think work towards making
              the intrinsics non-experimental:</span><span
              style="font-size:11.0pt;font-family:'MS Gothic'"
              lang="EN-US">
</span><span style="font-size:11.0pt"><o:p></o:p></span></p>
          <ul style="margin-top:0cm" type="disc">
            <li class="MsoListParagraph"
              style="margin-left:-18.0pt;mso-list:l0 level1 lfo1"> <span
                style="font-size:11.0pt">Adding SelectionDAG
                legalization for the _STRICT reduction SDNodes. After
                some great work from Nikita in D58015, unordered
                reductions are now legalized/expanded in SelectionDAG,
                so adding expansion in SelectionDAG for strict
                reductions would make the ExpandReductionsPass
                redundant.<o:p></o:p></span></li>
            <li class="MsoListParagraph"
              style="margin-left:-18.0pt;mso-list:l0 level1 lfo1"> <span
                style="font-size:11.0pt">Better enforcing the
                constraints of the intrinsics (see <a
                  href="https://reviews.llvm.org/D60260"
                  moz-do-not-send="true">https://reviews.llvm.org/D60260</a>
                ).</span><span
                style="font-size:11.0pt;font-family:'MS Gothic'" lang="EN-US">
</span><span
                style="font-size:11.0pt"><o:p></o:p></span></li>
            <li class="MsoListParagraph"
              style="margin-left:-18.0pt;mso-list:l0 level1 lfo1"> <span
                style="font-size:11.0pt">I think we'll also want to be
                able to overload the result type based on the vector
                element type for the intrinsics that have the constraint
                that the result type must match the vector element type,
                e.g. dropping the redundant 'i32' in:</span><span
                style="font-size:11.0pt;font-family:'MS Gothic'"><br>
                  </span><span style="font-size:11.0pt">i32
                @llvm.experimental.vector.reduce.and.i32.v4i32(&lt;4 x
                i32&gt; %a) =&gt; i32
                @llvm.experimental.vector.reduce.and.v4i32(&lt;4 x
                i32&gt; %a)<o:p></o:p></span></li>
          </ul>
          <p class="MsoListParagraph" style="margin-left:18.0pt"><span
              style="font-size:11.0pt">since i32 is implied by &lt;4 x
              i32&gt;. This would have the added benefit that LLVM would
              automatically check that the types match.</span><span
              style="font-size:11.0pt;font-family:'MS Gothic'"
              lang="EN-US">
</span></p>
        </div>
      </blockquote>
      <p>Won't this cause issues with overflow? Isn't the point of an
        add (or mul) reduction of, say, &lt;64 x i8&gt; to give a
        larger (i32 or i64) result so that we don't lose anything? I agree
        that for bitop reductions a wider result doesn't make sense, though.<br>
      </p>
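      <p>To make the overflow concern concrete, here is a minimal Python
        sketch. It assumes unsigned lanes and a result that wraps modulo
        2**result_bits; the helper name reduce_add is hypothetical and
        only models the arithmetic, not any LLVM intrinsic.</p>
<pre>
```python
def reduce_add(vec, result_bits):
    """Integer add reduction whose result wraps modulo 2**result_bits.

    Lanes are treated as unsigned values (an assumption of this sketch).
    """
    return sum(vec) % (2 ** result_bits)
```
</pre>
      <p>For 64 lanes that each hold 255, an 8-bit result wraps to 192,
        whereas a 32-bit result keeps the full sum of 16320.</p>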
    </blockquote>
    Sorry, I forgot to add: this raises the question of whether we
    should be considering signed/unsigned add/mul and possibly
    saturating reductions.<br>
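    <p>A small Python sketch of why those variants would differ. All
      helper names here (to_signed, widening_add, saturating_uadd) are
      hypothetical and only model the arithmetic; none of them
      corresponds to an existing or proposed intrinsic.</p>
<pre>
```python
def to_signed(value, bits):
    """Reinterpret an unsigned bit pattern as a two's-complement integer."""
    half = 2 ** (bits - 1)
    return (value + half) % (2 ** bits) - half


def widening_add(vec, elem_bits, signed):
    """Sign- or zero-extend each lane, then add without wrapping."""
    lanes = [to_signed(v, elem_bits) if signed else v for v in vec]
    return sum(lanes)


def saturating_uadd(vec, bits):
    """Unsigned add reduction that clamps at the maximum instead of wrapping."""
    limit = 2 ** bits - 1
    acc = 0
    for v in vec:
        acc = min(acc + v, limit)
    return acc
```
</pre>
    <p>For 64 lanes holding the bit pattern 0xFF, the zero-extended sum
      is 16320, the sign-extended sum is -64, and the 8-bit unsigned
      saturating sum is 255, so a widening reduction would need to know
      which interpretation is intended.</p>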
  </body>
</html>