<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Hi,</div>
<div class="moz-cite-prefix"><br>
</div>
<div class="moz-cite-prefix">On 4/5/19 10:47 AM, Simon Pilgrim via
llvm-dev wrote:<br>
</div>
<blockquote type="cite"
cite="mid:b7aae9ec-a423-5d43-9990-6b353feb153b@redking.me.uk">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<div class="moz-cite-prefix">On 05/04/2019 09:37, Simon Pilgrim
via llvm-dev wrote:<br>
</div>
<blockquote type="cite"
cite="mid:d306cf98-1225-732d-8016-7e882b5136b1@redking.me.uk">
<meta http-equiv="Content-Type" content="text/html;
charset=UTF-8">
<div class="moz-cite-prefix">On 04/04/2019 14:11, Sander De
Smalen wrote:<br>
</div>
<blockquote type="cite"
cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com">
<meta http-equiv="Content-Type" content="text/html;
charset=UTF-8">
<div class="WordSection1"><span style="font-size:11.0pt">Proposed
change:<o:p></o:p></span>
<p class="MsoNormal"><span style="font-size:11.0pt">----------------------------<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">In this
RFC I propose changing the intrinsics for
llvm.experimental.vector.reduce.fadd and
llvm.experimental.vector.reduce.fmul (see options A and
B). I also propose renaming the 'accumulator' operand to
'start value' because for fmul this is the start value
of the reduction, rather than a value into which the fmul
reduction is accumulated.</span></p>
</div>
</blockquote>
</blockquote>
</blockquote>
<p>Note that the LLVM-VP proposal also changes the way reductions
are handled in IR (<a class="moz-txt-link-freetext" href="https://reviews.llvm.org/D57504">https://reviews.llvm.org/D57504</a>). This could be
an opportunity to avoid the "v2" suffix issue: LLVM-VP moves the
intrinsic to the "llvm.vp.*" namespace and we can fix the
reduction semantics in the process.</p>
<p>Btw, if you are at EuroLLVM, there is a BoF at 2pm today on
LLVM-VP.<br>
</p>
<blockquote type="cite"
cite="mid:b7aae9ec-a423-5d43-9990-6b353feb153b@redking.me.uk">
<blockquote type="cite"
cite="mid:d306cf98-1225-732d-8016-7e882b5136b1@redking.me.uk">
<blockquote type="cite"
cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[Option
A] Always using the start value operand in the reduction
(<a href="https://reviews.llvm.org/D60261"
moz-do-not-send="true">https://reviews.llvm.org/D60261</a>)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">
declare float
@llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float
%start_value, <4 x float> %vec)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">This
means that if the start value is 'undef', the result
will be undef and all code creating such a reduction
will need to ensure it has a sensible start value (e.g.
0.0 for fadd, 1.0 for fmul). When using 'fast' or
‘reassoc’ on the call it will be implemented using an
unordered reduction, otherwise it will be implemented
with an ordered reduction. Note that a new intrinsic is
required to capture the new semantics. In this proposal
the intrinsic is prefixed with a 'v2' for the time
being, with the expectation this will be dropped when we
remove 'experimental' from the reduction intrinsics in
the future.</span><span
style="font-size:11.0pt;font-family:'MS Gothic'"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">[Option
B] Having separate ordered and unordered intrinsics (<a
href="https://reviews.llvm.org/D60262"
moz-do-not-send="true">https://reviews.llvm.org/D60262</a>).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">
declare float
@llvm.experimental.vector.reduce.ordered.fadd.f32.v4f32(float
%start_value, <4 x float> %vec)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">
declare float
@llvm.experimental.vector.reduce.unordered.fadd.f32.v4f32(<4
x float> %vec)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">This
will mean that the behaviour is explicit from the
intrinsic and the use of 'fast' or ‘reassoc’ on the call
has no effect on how that intrinsic is lowered. The
ordered reduction intrinsic will take a scalar
start-value operand, whereas the unordered reduction
intrinsic will only take a vector operand.<o:p></o:p></span></p>
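<p class="MsoNormal"><span style="font-size:11.0pt">To make the difference between the two Option B intrinsics concrete, here is a minimal Python sketch, not LLVM code. The helper names (f32, reduce_fadd_ordered, reduce_fadd_unordered) are made up for illustration, and the pairwise tree is only one of the evaluation orders an unordered reduction might be lowered to.</span></p>
<pre>
```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def reduce_fadd_ordered(start_value, vec):
    """Strict left-to-right sum: (((start + v0) + v1) + v2) + v3."""
    acc = f32(start_value)
    for v in vec:
        acc = f32(acc + f32(v))
    return acc


def reduce_fadd_unordered(vec):
    """Pairwise tree sum without a start value.

    Assumption: len(vec) is a power of two, as for a 4 x float vector.
    """
    vals = [f32(v) for v in vec]
    while len(vals) != 1:
        half = len(vals) // 2
        vals = [f32(vals[i] + vals[i + half]) for i in range(half)]
    return vals[0]


vec = [1.0e8, 1.0, -1.0e8, 1.0]   # stand-in for a 4 x float operand
ordered = reduce_fadd_ordered(0.0, vec)
unordered = reduce_fadd_unordered(vec)
print(ordered, unordered)
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt">For this input the ordered sum is 1.0, because the first 1.0 is absorbed when it is added to 1.0e8 in single precision, while the pairwise sum is 2.0. That difference is why the evaluation order has to be part of the semantics.</span></p>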
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Both
options auto-upgrade the IR to use the new (version of
the) intrinsics. I'm personally slightly in favour of
[Option B], because it better aligns with the definition
of the SelectionDAG nodes and is more explicit in its
semantics. We also avoid having to use an artificial
'v2'-like prefix to denote the new behaviour of the
intrinsic.<o:p></o:p></span></p>
<span style="font-size:11.0pt"><o:p> </o:p></span></div>
</blockquote>
<p>Do we have any targets with instructions that can actually
use the start value? TBH I'd be tempted to suggest we just
make the initial extractelement/fadd/insertelement pattern a
manual extra stage and avoid having that argument
entirely. <br>
</p>
</blockquote>
</blockquote>
NEC SX-Aurora has reduction instructions that take in a start value
in a scalar register. We are hoping to upstream the backend:
<a class="moz-txt-link-freetext" href="http://lists.llvm.org/pipermail/llvm-dev/2019-April/131580.html">http://lists.llvm.org/pipermail/llvm-dev/2019-April/131580.html</a><br>
<blockquote type="cite"
cite="mid:b7aae9ec-a423-5d43-9990-6b353feb153b@redking.me.uk">
<blockquote type="cite"
cite="mid:d306cf98-1225-732d-8016-7e882b5136b1@redking.me.uk">
<p> </p>
<blockquote type="cite"
cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Further
efforts:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">----------------------------<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Here is a
non-exhaustive list of items I think work towards making
the intrinsics non-experimental:</span><span
style="font-size:11.0pt;font-family:'MS Gothic'" lang="EN-US">
</span><span
style="font-size:11.0pt"><o:p></o:p></span></p>
<ul style="margin-top:0cm" type="disc">
<li class="MsoListParagraph"
style="margin-left:-18.0pt;mso-list:l0 level1 lfo1"> <span
style="font-size:11.0pt">Adding SelectionDAG
legalization for the _STRICT reduction SDNodes. After
some great work from Nikita in D58015, unordered
reductions are now legalized/expanded in SelectionDAG,
so if we add expansion in SelectionDAG for strict
reductions this would make the ExpandReductionsPass
redundant.<o:p></o:p></span></li>
<li class="MsoListParagraph"
style="margin-left:-18.0pt;mso-list:l0 level1 lfo1"> <span
style="font-size:11.0pt">Better enforcing the
constraints of the intrinsics (see <a
href="https://reviews.llvm.org/D60260"
moz-do-not-send="true">https://reviews.llvm.org/D60260</a>
).</span><span
style="font-size:11.0pt;font-family:'MS Gothic'" lang="EN-US">
</span><span
style="font-size:11.0pt"><o:p></o:p></span></li>
<li class="MsoListParagraph"
style="margin-left:-18.0pt;mso-list:l0 level1 lfo1"> <span
style="font-size:11.0pt">I think we'll also want to be
able to overload the result operand based on the
vector element type for the intrinsics having the
constraint that the result type must match the vector
element type, e.g. dropping the redundant 'i32' in:</span><span
style="font-size:11.0pt;font-family:'MS Gothic'"><br>
</span><span style="font-size:11.0pt">i32
@llvm.experimental.vector.reduce.and.i32.v4i32(<4 x
i32> %a) => i32
@llvm.experimental.vector.reduce.and.v4i32(<4 x
i32> %a)<o:p></o:p></span></li>
</ul>
<p class="MsoListParagraph" style="margin-left:18.0pt"><span
style="font-size:11.0pt">since i32 is implied by <4 x
i32>. This would have the added benefit that LLVM
would automatically check that the operands match.</span><span
style="font-size:11.0pt;font-family:'MS Gothic'" lang="EN-US">
</span></p>
</div>
</blockquote>
<p>Won't this cause issues with overflow? Isn't the point of an
add (or mul...) reduction of, say, <64 x i8> to give a
larger (i32 or i64) result so we don't lose anything? I agree
that for bitop reductions it doesn't make sense though.<br>
</p>
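<p>As a rough illustration of the overflow concern, here is a small Python sketch, not LLVM code. The helper names (wrap_to_i8, reduce_add_i8, reduce_add_widened) are hypothetical. It assumes the narrow reduction wraps around like an ordinary i8 add, and that the wide variant accumulates into a type large enough for all 64 elements.</p>
<pre>
```python
def wrap_to_i8(x):
    """Wrap an integer into the signed 8-bit range, like an i8 add would."""
    return (x + 128) % 256 - 128


def reduce_add_i8(vec):
    """Add reduction whose result type equals the i8 element type."""
    acc = 0
    for v in vec:
        acc = wrap_to_i8(acc + v)
    return acc


def reduce_add_widened(vec):
    """Add reduction into a wider accumulator (Python ints never wrap)."""
    return sum(vec)


vec = [100] * 64                  # stand-in for a 64 x i8 operand
narrow = reduce_add_i8(vec)
wide = reduce_add_widened(vec)
print(narrow, wide)
```
</pre>
<p>Here the i8 result wraps to 0 while the widened result is 6400, so whether the result type may be implied by the element type depends on which of the two behaviours the intrinsic is meant to have.</p>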
</blockquote>
Sorry - I forgot to add: this raises the question - should we be
considering signed/unsigned add/mul and possibly saturation
reductions?<br>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<pre class="moz-quote-pre" wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
</blockquote>
<pre class="moz-signature" cols="72">--
Simon Moll
Researcher / PhD Student
Compiler Design Lab (Prof. Hack)
Saarland University, Computer Science
Building E1.3, Room 4.31
Tel. +49 (0)681 302-57521 : <a class="moz-txt-link-abbreviated" href="mailto:moll@cs.uni-saarland.de">moll@cs.uni-saarland.de</a>
Fax. +49 (0)681 302-3065 : <a class="moz-txt-link-freetext" href="http://compilers.cs.uni-saarland.de/people/moll">http://compilers.cs.uni-saarland.de/people/moll</a></pre>
</body>
</html>