<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">Hi,</div>
    <div class="moz-cite-prefix"><br>
    </div>
    <div class="moz-cite-prefix">On 4/5/19 10:47 AM, Simon Pilgrim via
      llvm-dev wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:b7aae9ec-a423-5d43-9990-6b353feb153b@redking.me.uk">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div class="moz-cite-prefix">On 05/04/2019 09:37, Simon Pilgrim
        via llvm-dev wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:d306cf98-1225-732d-8016-7e882b5136b1@redking.me.uk">
        <meta http-equiv="Content-Type" content="text/html;
          charset=UTF-8">
        <div class="moz-cite-prefix">On 04/04/2019 14:11, Sander De
          Smalen wrote:<br>
        </div>
        <blockquote type="cite"
          cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com">
          <meta http-equiv="Content-Type" content="text/html;
            charset=UTF-8">
          <meta name="Generator" content="Microsoft Word 15 (filtered
            medium)">
          <style><!--
/* Font Definitions */
@font-face
        {font-family:Wingdings;
        panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
        {font-family:"MS Gothic";
        panose-1:2 11 6 9 7 2 5 8 2 4;}
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:"\@MS Gothic";
        panose-1:2 11 6 9 7 2 5 8 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:#0563C1;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:#954F72;
        text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0cm;
        margin-right:0cm;
        margin-bottom:0cm;
        margin-left:36.0pt;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Calibri",sans-serif;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
/* List Definitions */
@list l0
        {mso-list-id:2092389295;
        mso-list-type:hybrid;
        mso-list-template-ids:1156977324 1390607260 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l0:level1
        {mso-level-start-at:2;
        mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:18.0pt;
        text-indent:-18.0pt;
        font-family:Symbol;
        mso-fareast-font-family:Calibri;
        mso-bidi-font-family:"Times New Roman";}
@list l0:level2
        {mso-level-number-format:bullet;
        mso-level-text:o;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:54.0pt;
        text-indent:-18.0pt;
        font-family:"Courier New";}
@list l0:level3
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:90.0pt;
        text-indent:-18.0pt;
        font-family:Wingdings;}
@list l0:level4
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:126.0pt;
        text-indent:-18.0pt;
        font-family:Symbol;}
@list l0:level5
        {mso-level-number-format:bullet;
        mso-level-text:o;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:162.0pt;
        text-indent:-18.0pt;
        font-family:"Courier New";}
@list l0:level6
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:198.0pt;
        text-indent:-18.0pt;
        font-family:Wingdings;}
@list l0:level7
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:234.0pt;
        text-indent:-18.0pt;
        font-family:Symbol;}
@list l0:level8
        {mso-level-number-format:bullet;
        mso-level-text:o;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:270.0pt;
        text-indent:-18.0pt;
        font-family:"Courier New";}
@list l0:level9
        {mso-level-number-format:bullet;
        mso-level-text:;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        margin-left:306.0pt;
        text-indent:-18.0pt;
        font-family:Wingdings;}
ol
        {margin-bottom:0cm;}
ul
        {margin-bottom:0cm;}
--></style>
          <div class="WordSection1"><span style="font-size:11.0pt">Proposed
              change:<o:p></o:p></span>
            <p class="MsoNormal"><span style="font-size:11.0pt">----------------------------<o:p></o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt">In this
                RFC I propose changing the intrinsics for
                llvm.experimental.vector.reduce.fadd and
                llvm.experimental.vector.reduce.fmul (see options A and
                B). I also propose renaming the 'accumulator' operand to
                'start value', because for fmul this is the start value
                of the reduction rather than a value into which the
                fmul reduction is accumulated.</span></p>
          </div>
        </blockquote>
      </blockquote>
    </blockquote>
    <p>Note that the LLVM-VP proposal also changes the way reductions
      are handled in IR (<a class="moz-txt-link-freetext" href="https://reviews.llvm.org/D57504">https://reviews.llvm.org/D57504</a>). This could be
      an opportunity to avoid the "v2" suffix issue: LLVM-VP moves the
      intrinsics to the "llvm.vp.*" namespace, and we can fix the
      reduction semantics in the process.</p>
    <p>Btw, if you are at EuroLLVM, there is a BoF on LLVM-VP at 2pm
      today.<br>
    </p>
    <blockquote type="cite"
      cite="mid:b7aae9ec-a423-5d43-9990-6b353feb153b@redking.me.uk">
      <blockquote type="cite"
        cite="mid:d306cf98-1225-732d-8016-7e882b5136b1@redking.me.uk">
        <blockquote type="cite"
          cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com">
          <div class="WordSection1">
            <p class="MsoNormal"><span style="font-size:11.0pt"><o:p></o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt">[Option
                A] Always using the start value operand in the reduction
                (<a href="https://reviews.llvm.org/D60261"
                  moz-do-not-send="true">https://reviews.llvm.org/D60261</a>)<o:p></o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt">&nbsp;&nbsp;
                declare float
                @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float
                %start_value, &lt;4 x float&gt; %vec)<o:p></o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt">This
                means that if the start value is 'undef', the result
                will be undef, so all code creating such a reduction
                will need to ensure it has a sensible start value (e.g.
                0.0 for fadd, 1.0 for fmul). When using 'fast' or
                'reassoc' on the call it will be implemented using an
                unordered reduction, otherwise it will be implemented
                with an ordered reduction. Note that a new intrinsic is
                required to capture the new semantics. In this proposal
                the intrinsic is prefixed with a 'v2' for the time
                being, with the expectation that this will be dropped when we
                remove 'experimental' from the reduction intrinsics in
                the future.<o:p></o:p></span></p>
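            <p class="MsoNormal"><span style="font-size:11.0pt">To
                illustrate why the start value has to be chosen per
                operation, here is a minimal Python sketch of an ordered
                reduction that always uses its start value. The function
                names reduce_fadd and reduce_fmul are hypothetical
                stand-ins for the intrinsics, and Python doubles stand in
                for f32; this only models the arithmetic, not the IR.</span></p>
<pre>
```python
def reduce_fadd(start_value, vec):
    """Ordered fadd reduction: ((start_value + v0) + v1) + ..."""
    acc = start_value
    for x in vec:
        acc = acc + x
    return acc


def reduce_fmul(start_value, vec):
    """Ordered fmul reduction: ((start_value * v0) * v1) * ..."""
    acc = start_value
    for x in vec:
        acc = acc * x
    return acc


vec = [2.0, 3.0, 4.0, 5.0]
fadd_neutral = reduce_fadd(0.0, vec)  # 0.0 is the neutral element of fadd
fmul_neutral = reduce_fmul(1.0, vec)  # 1.0 is the neutral element of fmul
fmul_zero = reduce_fmul(0.0, vec)     # a non-neutral start value changes the result
```
</pre>
            <p class="MsoNormal"><span style="font-size:11.0pt">With a
                start value of 0.0 the fmul reduction collapses to 0.0,
                which is why code creating the reduction has to supply
                1.0 for fmul and 0.0 for fadd.</span></p>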
            <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt">[Option
                B] Having separate ordered and unordered intrinsics (<a
                  href="https://reviews.llvm.org/D60262"
                  moz-do-not-send="true">https://reviews.llvm.org/D60262</a>).<o:p></o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt">&nbsp;&nbsp;
                declare float
                @llvm.experimental.vector.reduce.ordered.fadd.f32.v4f32(float
                %start_value, &lt;4 x float&gt; %vec)<o:p></o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt">&nbsp;&nbsp;
                declare float
                @llvm.experimental.vector.reduce.unordered.fadd.f32.v4f32(&lt;4
                x float&gt; %vec)<o:p></o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt">This
                will mean that the behaviour is explicit from the
                intrinsic and the use of 'fast' or 'reassoc' on the call
                has no effect on how that intrinsic is lowered. The
                ordered reduction intrinsic will take a scalar
                start-value operand, whereas the unordered reduction
                intrinsic will only take a vector operand.<o:p></o:p></span></p>
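            <p class="MsoNormal"><span style="font-size:11.0pt">The
                difference between the two flavours can be sketched in
                Python as follows. The function names are hypothetical,
                Python doubles stand in for f32, and the pairwise tree is
                only one of many orders an unordered reduction may use
                (it assumes a power-of-two vector length).</span></p>
<pre>
```python
def reduce_fadd_ordered(start_value, vec):
    """Sequential reduction: ((start_value + v0) + v1) + ..."""
    acc = start_value
    for x in vec:
        acc = acc + x
    return acc


def reduce_fadd_unordered(vec):
    """One possible unordered lowering: a pairwise tree reduction.

    Assumes len(vec) is a power of two.
    """
    vals = list(vec)
    while len(vals) != 1:
        half = len(vals) // 2
        # Add the lower half to the upper half, element by element.
        vals = [vals[i] + vals[i + half] for i in range(half)]
    return vals[0]


# Floating-point addition is not associative, so the order matters.
vec = [1e16, 1.0, -1e16, 1.0]
ordered = reduce_fadd_ordered(0.0, vec)
unordered = reduce_fadd_unordered(vec)
```
</pre>
            <p class="MsoNormal"><span style="font-size:11.0pt">For this
                input the ordered sum loses the first 1.0 when it is
                added to 1e16 and ends at 1.0, while the tree order first
                cancels 1e16 against -1e16 and ends at 2.0.</span></p>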
            <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt">Both
                options auto-upgrade the IR to use the new (version of
                the) intrinsics. I'm personally slightly in favour of
                [Option B], because it better aligns with the definition
                of the SelectionDAG nodes and is more explicit in its
                semantics. We also avoid having to use an artificial
                'v2'-like prefix to denote the new behaviour of the
                intrinsic.<o:p></o:p></span></p>
            <span style="font-size:11.0pt"><o:p> </o:p></span></div>
        </blockquote>
        <p>Do we have any targets with instructions that can actually
          use the start value? TBH I'd be tempted to suggest we just
          make the initial extractelement/fadd/insertelement pattern a
          manual extra stage and avoid having that argument
          entirely. <br>
        </p>
      </blockquote>
    </blockquote>
    NEC SX-Aurora has reduction instructions that take a start value
    in a scalar register. We are hoping to upstream the backend:
    <a class="moz-txt-link-freetext" href="http://lists.llvm.org/pipermail/llvm-dev/2019-April/131580.html">http://lists.llvm.org/pipermail/llvm-dev/2019-April/131580.html</a><br>
    <blockquote type="cite"
      cite="mid:b7aae9ec-a423-5d43-9990-6b353feb153b@redking.me.uk">
      <blockquote type="cite"
        cite="mid:d306cf98-1225-732d-8016-7e882b5136b1@redking.me.uk">
        <p> </p>
        <blockquote type="cite"
          cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com">
          <div class="WordSection1">
            <p class="MsoNormal"><span style="font-size:11.0pt">Further
                efforts:<o:p></o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt">----------------------------<o:p></o:p></span></p>
            <p class="MsoNormal"><span style="font-size:11.0pt">Here is a
                non-exhaustive list of items that I think work towards making
                the intrinsics non-experimental:<o:p></o:p></span></p>
            <ul style="margin-top:0cm" type="disc">
              <li class="MsoListParagraph"
                style="margin-left:-18.0pt;mso-list:l0 level1 lfo1"> <span
                  style="font-size:11.0pt">Adding SelectionDAG
                  legalization for the _STRICT reduction SDNodes. After
                  some great work from Nikita in D58015, unordered
                  reductions are now legalized/expanded in SelectionDAG,
                  so if we add expansion in SelectionDAG for strict
                  reductions this would make the ExpandReductionsPass
                  redundant.<o:p></o:p></span></li>
              <li class="MsoListParagraph"
                style="margin-left:-18.0pt;mso-list:l0 level1 lfo1"> <span
                  style="font-size:11.0pt">Better enforcing the
                  constraints of the intrinsics (see <a
                    href="https://reviews.llvm.org/D60260"
                    moz-do-not-send="true">https://reviews.llvm.org/D60260</a>).<o:p></o:p></span></li>
              <li class="MsoListParagraph"
                style="margin-left:-18.0pt;mso-list:l0 level1 lfo1"> <span
                  style="font-size:11.0pt">I think we'll also want to be
                  able to overload the result type based on the
                  vector element type for the intrinsics that have the
                  constraint that the result type must match the vector
                  element type, e.g. dropping the redundant 'i32' in:<br>
                  i32
                  @llvm.experimental.vector.reduce.and.i32.v4i32(&lt;4 x
                  i32&gt; %a) =&gt; i32
                  @llvm.experimental.vector.reduce.and.v4i32(&lt;4 x
                  i32&gt; %a)<o:p></o:p></span></li>
            </ul>
            <p class="MsoListParagraph" style="margin-left:18.0pt"><span
                style="font-size:11.0pt">since i32 is implied by &lt;4 x
                i32&gt;. This would have the added benefit that LLVM
                would automatically check that the operand types match.</span></p>
          </div>
        </blockquote>
        <p>Won't this cause issues with overflow? Isn't the point of an
          add (or mul) reduction of, say, &lt;64 x i8&gt; to give a
          larger (i32 or i64) result so we don't lose anything? I agree
          that it doesn't make sense for bitop reductions, though.<br>
        </p>
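        <p>The overflow concern can be made concrete with a small Python
          sketch. The helper names are hypothetical, and wrap_i8 models
          two's-complement i8 wraparound as an assumption about how a
          same-width reduction would behave; the widened variant simply
          accumulates in an unbounded integer.</p>
<pre>
```python
def wrap_i8(x):
    """Wrap an integer into the signed 8-bit range [-128, 127]."""
    return (x + 128) % 256 - 128


def reduce_add_i8(vec):
    """Add reduction whose result type matches the i8 element type."""
    acc = 0
    for x in vec:
        acc = wrap_i8(acc + x)
    return acc


def reduce_add_widened(vec):
    """Add reduction that accumulates in a wider type (modelled as a Python int)."""
    return sum(vec)


vec = [100] * 64  # 64 lanes of i8, each holding 100
narrow = reduce_add_i8(vec)
wide = reduce_add_widened(vec)
```
</pre>
        <p>The true sum is 6400, which the widened reduction keeps,
          whereas the i8 result wraps around to 0 because 6400 is a
          multiple of 256.</p>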
      </blockquote>
      Sorry - I forgot to add: this raises the question of whether we
      should be considering signed/unsigned add/mul and possibly
      saturating reductions.<br>
      <br>
    </blockquote>
    <pre class="moz-signature" cols="72">-- 

Simon Moll
Researcher / PhD Student

Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31

Tel. +49 (0)681 302-57521 : <a class="moz-txt-link-abbreviated" href="mailto:moll@cs.uni-saarland.de">moll@cs.uni-saarland.de</a>
Fax. +49 (0)681 302-3065  : <a class="moz-txt-link-freetext" href="http://compilers.cs.uni-saarland.de/people/moll">http://compilers.cs.uni-saarland.de/people/moll</a></pre>
  </body>
</html>