<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <div class="moz-cite-prefix">On 04/04/2019 14:11, Sander De Smalen

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com">


      <style><!--

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0cm;

        margin-bottom:.0001pt;

        font-size:12.0pt;

        font-family:"Calibri",sans-serif;}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:#0563C1;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:#954F72;

        text-decoration:underline;}

p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph

        {mso-style-priority:34;

        margin-top:0cm;

        margin-right:0cm;

        margin-bottom:0cm;

        margin-left:36.0pt;

        margin-bottom:.0001pt;

        font-size:12.0pt;

        font-family:"Calibri",sans-serif;}

span.EmailStyle17

        {mso-style-type:personal-compose;

        font-family:"Calibri",sans-serif;

        color:windowtext;}

.MsoChpDefault

        {mso-style-type:export-only;}

@page WordSection1

        {size:612.0pt 792.0pt;

        margin:72.0pt 72.0pt 72.0pt 72.0pt;}

div.WordSection1

        {page:WordSection1;}

ol

        {margin-bottom:0cm;}

ul

        {margin-bottom:0cm;}

--></style>

      <div class="WordSection1"><span style="font-size:11.0pt">Proposed

          change:<o:p></o:p></span>

        <p class="MsoNormal"><span style="font-size:11.0pt">----------------------------<o:p></o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt">In this RFC

            I propose changing the intrinsics for

            llvm.experimental.vector.reduce.fadd and

            llvm.experimental.vector.reduce.fmul (see options A and B).

            I also propose renaming the 'accumulator' operand to 'start

            value' because for fmul this is the start value of the
            reduction, rather than a value into which the fmul reduction
            is accumulated.<o:p></o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt">[Option A]

            Always using the start value operand in the reduction (<a

              href="https://reviews.llvm.org/D60261"

              moz-do-not-send="true">https://reviews.llvm.org/D60261</a>)<o:p></o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt">  declare

            float

            @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float

            %start_value, &lt;4 x float&gt; %vec)<o:p></o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt">This means
            that if the start value is 'undef', the result will be undef,
            and all code creating such a reduction will need to ensure
            it has a sensible start value (e.g. 0.0 for fadd, 1.0 for
            fmul). When using 'fast' or 'reassoc' on the call it will be
            implemented using an unordered reduction; otherwise it will
            be implemented with an ordered reduction. Note that a new
            intrinsic is required to capture the new semantics. In this
            proposal the intrinsic is prefixed with a 'v2' for the time
            being, with the expectation that this will be dropped when we
            remove 'experimental' from the reduction intrinsics in the
            future.</span><span
            style="font-size:11.0pt;font-family:'MS Gothic'"><o:p></o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt">[Option B]

            Having separate ordered and unordered intrinsics (<a

              href="https://reviews.llvm.org/D60262"

              moz-do-not-send="true">https://reviews.llvm.org/D60262</a>).<o:p></o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt">  declare

            float

            @llvm.experimental.vector.reduce.ordered.fadd.f32.v4f32(float

            %start_value, &lt;4 x float&gt; %vec)<o:p></o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt">  declare

            float

            @llvm.experimental.vector.reduce.unordered.fadd.f32.v4f32(&lt;4
            x float&gt; %vec)<o:p></o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt">This will
            mean that the behaviour is explicit from the intrinsic and
            the use of 'fast' or 'reassoc' on the call has no effect on
            how that intrinsic is lowered. The ordered reduction
            intrinsic will take a scalar start-value operand, whereas the
            unordered reduction intrinsic will only take a vector
            operand.<o:p></o:p></span></p>
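        <p class="MsoNormal"><span style="font-size:11.0pt">As a rough
            illustration of why the two forms are not interchangeable, the
            sketch below models both reductions in Python rather than LLVM IR.
            The helper names (f32, reduce_ordered_fadd, reduce_unordered_fadd)
            are hypothetical, the pairwise tree is only one of many orders an
            unordered reduction might use, and the vector length is assumed to
            be a power of two.</span></p>
<pre>
```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def reduce_ordered_fadd(start_value, vec):
    """Strictly sequential: ((((start + v0) + v1) + v2) + v3)."""
    acc = f32(start_value)
    for element in vec:
        acc = f32(acc + f32(element))
    return acc


def reduce_unordered_fadd(vec):
    """One possible reassociation: a pairwise tree with no start value.

    Assumes len(vec) is a power of two.
    """
    lanes = [f32(element) for element in vec]
    while len(lanes) > 1:
        half = len(lanes) // 2
        lanes = [f32(lanes[i] + lanes[i + half]) for i in range(half)]
    return lanes[0]


vec = [1e8, 1.0, -1e8, 1.0]
ordered = reduce_ordered_fadd(0.0, vec)   # 1.0: the first 1.0 is absorbed by 1e8
unordered = reduce_unordered_fadd(vec)    # 2.0: 1e8 and -1e8 cancel first
print(ordered, unordered)
```
</pre>
        <p class="MsoNormal"><span style="font-size:11.0pt">The two
            results differ for the same input, which is why the choice between
            ordered and unordered lowering has to be visible somewhere, either
            in the fast-math flags (Option A) or in the intrinsic name
            (Option B).</span></p>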

        <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt">Both options

            auto-upgrade the IR to use the new (version of the)

            intrinsics. I'm personally slightly in favour of [Option B],

            because it better aligns with the definition of the

            SelectionDAG nodes and is more explicit in its semantics. We

            also avoid having to use an artificial 'v2' like prefix to

            denote the new behaviour of the intrinsic.<o:p></o:p></span></p>

        <span style="font-size:11.0pt"><o:p> </o:p></span></div>

    </blockquote>

    <p>Do we have any targets with instructions that can actually use
      the start value? TBH I'd be tempted to suggest we just make the
      initial extractelement/fadd/insertelement pattern a manual extra
      stage and avoid having that argument entirely. <br>

    </p>
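    <p>A minimal Python sketch of that idea, assuming an ordered fadd
      reduction and using Python floats in place of IR values. The function
      names are hypothetical and only model the
      extractelement/fadd/insertelement step, not any existing LLVM API:</p>
<pre>
```python
def reduce_ordered_with_start(start_value, vec):
    """Ordered reduction that takes a start value: (((s + v0) + v1) + v2)."""
    acc = start_value
    for element in vec:
        acc = acc + element
    return acc


def reduce_ordered(vec):
    """Ordered reduction with no start operand: ((v0 + v1) + v2)."""
    acc = vec[0]
    for element in vec[1:]:
        acc = acc + element
    return acc


def fold_start_into_lane0(start_value, vec):
    """Model of the manual extra stage performed before the reduction."""
    lane0 = vec[0]                    # extractelement
    lane0 = start_value + lane0       # fadd
    return [lane0] + list(vec[1:])    # insertelement


vec = [0.1, 0.2, 0.3]
with_start = reduce_ordered_with_start(0.5, vec)
manual = reduce_ordered(fold_start_into_lane0(0.5, vec))
print(with_start == manual)
```
</pre>
    <p>Both paths perform the same additions in the same order, so under
      these assumptions the results are bit-identical and the start-value
      operand adds no expressive power.</p>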

    <blockquote type="cite"

      cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com">

      <div class="WordSection1">

        <p class="MsoNormal"><span style="font-size:11.0pt">Further

            efforts:<o:p></o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt">----------------------------<o:p></o:p></span></p>

        <p class="MsoNormal"><span style="font-size:11.0pt">Here is a
            non-exhaustive list of items I think work towards making the
            intrinsics non-experimental:</span><span
            style="font-size:11.0pt;font-family:'MS Gothic'"
            lang="EN-US"> </span><span style="font-size:11.0pt"><o:p></o:p></span></p>

        <ul style="margin-top:0cm" type="disc">

          <li class="MsoListParagraph"

            style="margin-left:-18.0pt;mso-list:l0 level1 lfo1">

            <span style="font-size:11.0pt">Adding SelectionDAG
              legalization for the _STRICT reduction SDNodes. After
              some great work from Nikita in D58015, unordered
              reductions are now legalized/expanded in SelectionDAG, so
              if we add expansion in SelectionDAG for strict reductions,
              this would make the ExpandReductionsPass redundant.<o:p></o:p></span></li>

          <li class="MsoListParagraph"

            style="margin-left:-18.0pt;mso-list:l0 level1 lfo1">

            <span style="font-size:11.0pt">Better enforcing the

              constraints of the intrinsics (see

              <a href="https://reviews.llvm.org/D60260"
                moz-do-not-send="true">https://reviews.llvm.org/D60260</a>).</span><span
              style="font-size:11.0pt;font-family:'MS Gothic'"
              lang="EN-US"> </span><span style="font-size:11.0pt"><o:p></o:p></span></li>

          <li class="MsoListParagraph"

            style="margin-left:-18.0pt;mso-list:l0 level1 lfo1">

            <span style="font-size:11.0pt">I think we'll also want to be
              able to overload the result operand based on the vector
              element type for the intrinsics having the constraint that
              the result type must match the vector element type, e.g.
              dropping the redundant 'i32' in:</span><span
              style="font-size:11.0pt;font-family:'MS Gothic'"><br>
                </span><span style="font-size:11.0pt">i32
              @llvm.experimental.vector.reduce.and.i32.v4i32(&lt;4 x
              i32&gt; %a) =&gt; i32
              @llvm.experimental.vector.reduce.and.v4i32(&lt;4 x i32&gt;
              %a)<o:p></o:p></span></li>

        </ul>

        <p class="MsoListParagraph" style="margin-left:18.0pt"><span
            style="font-size:11.0pt">since i32 is implied by &lt;4 x
            i32&gt;. This would have the added benefit that LLVM would
            automatically check that the operands match.</span><span
            style="font-size:11.0pt;font-family:'MS Gothic'"
            lang="EN-US"> </span></p>

      </div>

    </blockquote>

    <p>Won't this cause issues with overflow? Isn't the point of an add
      (or mul...) reduction of, say, &lt;64 x i8&gt; to give a larger
      (i32 or i64) result so we don't lose anything? I agree that for bitop
      reductions it doesn't make sense though.<br>

    </p>
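    <p>To make the overflow concern concrete, here is a small Python model
      of a 64-lane i8 add reduction, assuming two's-complement wrapping for
      the narrow result. The helper names are hypothetical and the lane
      values are picked only for illustration:</p>
<pre>
```python
import functools
import operator


def wrap_i8(value):
    """Wrap an integer to a signed 8-bit value (two's complement)."""
    return (value + 128) % 256 - 128


def reduce_add_i8(vec):
    """Add reduction whose result type matches the i8 element type."""
    acc = 0
    for element in vec:
        acc = wrap_i8(acc + element)
    return acc


def reduce_add_widened(vec):
    """Add reduction with a result type wide enough (e.g. i32) not to wrap."""
    return sum(vec)


def reduce_and(vec):
    """Bitwise AND reduction: the result always fits in the element type."""
    return functools.reduce(operator.and_, vec)


lanes = [127] * 64
print(reduce_add_i8(lanes))       # -64: the i8 result has wrapped
print(reduce_add_widened(lanes))  # 8128: the wider result keeps the full sum
print(reduce_and(lanes))          # 127: no widening needed for a bitop
```
</pre>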

    <p>Simon.<br>

    </p>

  </body>

</html>