<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 04/04/2019 14:11, Sander De Smalen
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com">
<div class="WordSection1"><span style="font-size:11.0pt">Proposed
change:<o:p></o:p></span>
<p class="MsoNormal"><span style="font-size:11.0pt">----------------------------<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">In this RFC
I propose changing the intrinsics for
llvm.experimental.vector.reduce.fadd and
llvm.experimental.vector.reduce.fmul (see options A and B).
I also propose renaming the 'accumulator' operand to 'start
value' because for fmul this is the start value of the
reduction, rather than a value into which the fmul reduction
is accumulated.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[Option A]
Always using the start value operand in the reduction (<a
href="https://reviews.llvm.org/D60261"
moz-do-not-send="true">https://reviews.llvm.org/D60261</a>)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> declare
float
@llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float
%start_value, &lt;4 x float&gt; %vec)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">This means
that if the start value is 'undef', the result will be undef,
so all code creating such a reduction will need to ensure
it has a sensible start value (e.g. 0.0 for fadd, 1.0 for
fmul). When 'fast' or 'reassoc' is set on the call, it will be
implemented using an unordered reduction; otherwise it will
be implemented with an ordered reduction. Note that a new
intrinsic is required to capture the new semantics. In this
proposal the intrinsic is prefixed with a 'v2' for the time
being, with the expectation that this will be dropped when we
remove 'experimental' from the reduction intrinsics in the
future.<o:p></o:p></span></p>
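<p class="MsoNormal"><span style="font-size:11.0pt">To illustrate
why the choice between an ordered and an unordered lowering is
observable, here is a minimal Python sketch (not LLVM code). The
helpers reduce_ordered and reduce_pairwise are hypothetical names,
Python floats are doubles rather than f32, and the halving tree is
only one of the orders a reassociating reduction might use; it
assumes a power-of-two vector length.</span></p>
<pre>
```python
import operator


def reduce_ordered(start, vec, op):
    """Strict left-to-right reduction: ((start op v0) op v1) op ..."""
    acc = start
    for x in vec:
        acc = op(acc, x)
    return acc


def reduce_pairwise(start, vec, op):
    """One possible reassociated order: fold the upper half onto the
    lower half until a single lane is left, then apply the start value."""
    vals = list(vec)
    # Assumption of this sketch: the length is a non-zero power of two.
    if bin(len(vals)).count("1") != 1:
        raise ValueError("vector length must be a power of two")
    while len(vals) != 1:
        half = len(vals) // 2
        vals = [op(vals[i], vals[i + half]) for i in range(half)]
    return op(start, vals[0])


vec = [1e17, 1.0, -1e17, 1.0]
print(reduce_ordered(0.0, vec, operator.add))   # 1.0
print(reduce_pairwise(0.0, vec, operator.add))  # 2.0
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt">The two results
differ because 1e17 + 1.0 rounds back to 1e17, so the ordered form
loses one of the small addends while this particular tree does
not.</span></p>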
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[Option B]
Having separate ordered and unordered intrinsics (<a
href="https://reviews.llvm.org/D60262"
moz-do-not-send="true">https://reviews.llvm.org/D60262</a>).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> declare
float
@llvm.experimental.vector.reduce.ordered.fadd.f32.v4f32(float
%start_value, &lt;4 x float&gt; %vec)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> declare
float
@llvm.experimental.vector.reduce.unordered.fadd.f32.v4f32(&lt;4
x float&gt; %vec)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">This will
mean that the behaviour is explicit from the intrinsic and
the use of 'fast' or 'reassoc' on the call has no effect on
how that intrinsic is lowered. The ordered reduction
intrinsic will take a scalar start-value operand, whereas the
unordered reduction intrinsic will only take a vector
operand.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Both options
auto-upgrade the IR to use the new (version of the)
intrinsics. I'm personally slightly in favour of [Option B],
because it better aligns with the definition of the
SelectionDAG nodes and is more explicit in its semantics. We
also avoid having to use an artificial 'v2'-like prefix to
denote the new behaviour of the intrinsic.<o:p></o:p></span></p>
<span style="font-size:11.0pt"><o:p> </o:p></span></div>
</blockquote>
<p>Do we have any targets with instructions that can actually use
the start value? TBH I'd be tempted to suggest we just make the
initial extractelement/fadd/insertelement pattern a manual extra
stage and avoid having that argument entirely. <br>
</p>
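<p>A rough sketch of what that manual stage would look like, written
in Python rather than IR. The function names are made up for
illustration, and the reduction is assumed to be an ordered fadd over
a non-empty vector; folding the start value into lane 0 first gives
the same sequence of additions as passing it as an operand.</p>
<pre>
```python
def reduce_fadd_with_start(start, vec):
    """Ordered reduction with a start operand: ((start + v0) + v1) + ..."""
    acc = start
    for x in vec:
        acc = acc + x
    return acc


def reduce_fadd_no_start(vec):
    """Ordered reduction with no start operand: (v0 + v1) + v2 + ..."""
    acc = vec[0]
    for x in vec[1:]:
        acc = acc + x
    return acc


def fold_start_into_lane0(start, vec):
    """The manual extra stage: extractelement / fadd / insertelement."""
    lane0 = vec[0]                  # extractelement
    lane0 = start + lane0           # fadd
    return [lane0] + list(vec[1:])  # insertelement (on a copy)


vec = [1e17, 1.0, -1e17, 1.0]
a = reduce_fadd_with_start(0.5, vec)
b = reduce_fadd_no_start(fold_start_into_lane0(0.5, vec))
print(a == b)
```
</pre>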
<blockquote type="cite"
cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Further
efforts:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">----------------------------<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Here is a
non-exhaustive list of items that I think work towards making the
intrinsics non-experimental:<o:p></o:p></span></p>
<ul style="margin-top:0cm" type="disc">
<li class="MsoListParagraph"
style="margin-left:-18.0pt;mso-list:l0 level1 lfo1">
<span style="font-size:11.0pt">Adding SelectionDAG
legalization for the _STRICT reduction SDNodes. After
some great work from Nikita in D58015, unordered
reductions are now legalized/expanded in SelectionDAG, so
if we add expansion in SelectionDAG for strict reductions
this would make the ExpandReductionsPass redundant.<o:p></o:p></span></li>
<li class="MsoListParagraph"
style="margin-left:-18.0pt;mso-list:l0 level1 lfo1">
<span style="font-size:11.0pt">Better enforcing the
constraints of the intrinsics (see
<a href="https://reviews.llvm.org/D60260"
moz-do-not-send="true">https://reviews.llvm.org/D60260</a>).<o:p></o:p></span></li>
<li class="MsoListParagraph"
style="margin-left:-18.0pt;mso-list:l0 level1 lfo1">
<span style="font-size:11.0pt">I think we'll also want to be
able to overload the result type based on the vector
element type for the intrinsics that have the constraint that
the result type must match the vector element type, e.g.
dropping the redundant 'i32' in:<br>
i32 @llvm.experimental.vector.reduce.and.i32.v4i32(&lt;4 x
i32&gt; %a) =&gt; i32
@llvm.experimental.vector.reduce.and.v4i32(&lt;4 x i32&gt;
%a)<o:p></o:p></span></li>
</ul>
<p class="MsoListParagraph" style="margin-left:18.0pt"><span
style="font-size:11.0pt">since i32 is implied by &lt;4 x
i32&gt;. This would have the added benefit that LLVM would
automatically check that the result type matches the vector
element type.</span></p>
</div>
</blockquote>
<p>Won't this cause issues with overflow? Isn't the point of an add
(or mul) reduction of, say, &lt;64 x i8&gt; that it gives a larger
(i32 or i64) result so we don't lose anything? I agree that a wider
result makes no sense for bitop reductions, though.<br>
</p>
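<p>A small sketch of the concern, in Python rather than IR. The helper
names wrap_signed and reduce_add are made up for illustration, and the
32-bit case models a hypothetical reduction whose result is wider than
the element type; it is not meant as an existing intrinsic
signature.</p>
<pre>
```python
def wrap_signed(value, bits):
    """Wrap an integer to a signed two's-complement value of the given width."""
    half = 2 ** (bits - 1)
    return (value + half) % (2 ** bits) - half


def reduce_add(vec, result_bits):
    """Add reduction whose accumulator wraps at result_bits after every step."""
    acc = 0
    for x in vec:
        acc = wrap_signed(acc + x, result_bits)
    return acc


lanes = [127] * 64            # 64 i8 lanes, each holding the maximum i8 value
print(reduce_add(lanes, 8))   # -64: the i8 result has wrapped
print(reduce_add(lanes, 32))  # 8128: an i32 result keeps the full sum
```
</pre>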
<p>Simon.<br>
</p>
</body>
</html>