<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">On 05/04/2019 16:26, Sander De Smalen
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:BB39E23A-CC39-4638-97E7-42EDC563E311@arm.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Hi Simon,
<div class=""><br class="">
</div>
<div class="">Thanks for your feedback! See my comments inline.
<div class=""><br class="">
<div>
<blockquote type="cite" class="">
<div class="">On 5 Apr 2019, at 09:47, Simon Pilgrim via
llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org"
class="" moz-do-not-send="true">llvm-dev@lists.llvm.org</a>>
wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="moz-cite-prefix" style="caret-color: rgb(0,
0, 0); font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform:
none; white-space: normal; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color:
rgb(255, 255, 255); text-decoration: none;">
<br class="Apple-interchange-newline">
On 05/04/2019 09:37, Simon Pilgrim via llvm-dev wrote:<br
class="">
</div>
<blockquote type="cite"
cite="mid:d306cf98-1225-732d-8016-7e882b5136b1@redking.me.uk"
style="font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans:
auto; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows:
auto; word-spacing: 0px; -webkit-text-size-adjust:
auto; -webkit-text-stroke-width: 0px;
background-color: rgb(255, 255, 255); text-decoration:
none;" class="">
<div class="moz-cite-prefix">On 04/04/2019 14:11,
Sander De Smalen wrote:<br class="">
</div>
<blockquote type="cite"
cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com"
class="">
<div class="WordSection1" style="page:
WordSection1;"><span style="font-size: 11pt;"
class="">Proposed change:<o:p class=""></o:p></span>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class="">----------------------------<o:p
class=""></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class="">In this
RFC I propose changing the intrinsics for
llvm.experimental.vector.reduce.fadd and
llvm.experimental.vector.reduce.fmul (see
options A and B). I also propose renaming the
'accumulator' operand to 'start value' because
for fmul this is the start value of the
reduction, rather than a value into which the
fmul reduction is accumulated.<o:p
class=""></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class=""><o:p
class=""> </o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class="">[Option
A] Always using the start value operand in the
reduction (<a
href="https://reviews.llvm.org/D60261"
moz-do-not-send="true" style="color:
rgb(149, 79, 114); text-decoration:
underline;" class="">https://reviews.llvm.org/D60261</a>)<o:p
class=""></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class=""><o:p
class=""> </o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class="">
declare float
@llvm.experimental.vector.reduce.v2.fadd.f32.v4f32(float
%start_value, <4 x float> %vec)<o:p
class=""></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class=""><o:p
class=""> </o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class="">This
means that if the start value is 'undef', the
result will be undef and all code creating
such a reduction will need to ensure it has a
sensible start value (e.g. 0.0 for fadd, 1.0
for fmul). When using 'fast' or 'reassoc' on
the call it will be implemented using an
unordered reduction, otherwise it will be
implemented with an ordered reduction. Note
that a new intrinsic is required to capture
the new semantics. In this proposal the
intrinsic is prefixed with a 'v2' for the time
being, with the expectation this will be
dropped when we remove 'experimental' from the
reduction intrinsics in the future.</span><span
style="font-size: 11pt; font-family: "MS
Gothic";" class=""><o:p class=""></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class=""><o:p
class=""> </o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class="">[Option
B] Having separate ordered and unordered
intrinsics (<a
href="https://reviews.llvm.org/D60262"
moz-do-not-send="true" style="color:
rgb(149, 79, 114); text-decoration:
underline;" class="">https://reviews.llvm.org/D60262</a>).<o:p
class=""></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class=""><o:p
class=""> </o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class="">
declare float
@llvm.experimental.vector.reduce.ordered.fadd.f32.v4f32(float
%start_value, <4 x float> %vec)<o:p
class=""></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class="">
declare float
@llvm.experimental.vector.reduce.unordered.fadd.f32.v4f32(<4
x float> %vec)<o:p class=""></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class=""><o:p
class=""> </o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class="">This
will mean that the behaviour is explicit from
the intrinsic and the use of 'fast' or
'reassoc' on the call has no effect on how
that intrinsic is lowered. The ordered
reduction intrinsic will take a scalar
start-value operand, whereas the unordered
reduction intrinsic will only take a vector
operand.<o:p class=""></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class=""><o:p
class=""> </o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class="">Both
options auto-upgrade the IR to use the new
(version of the) intrinsics. I'm personally
slightly in favour of [Option B], because it
better aligns with the definition of the
SelectionDAG nodes and is more explicit in its
semantics. We also avoid having to use an
artificial 'v2' like prefix to denote the new
behaviour of the intrinsic.<o:p class=""></o:p></span></div>
<span style="font-size: 11pt;" class=""><o:p
class=""></o:p></span></div>
</blockquote>
<p class="">Do we have any targets with instructions
that can actually use the start value? TBH I'd be
tempted to suggest we just make the initial
extractelement/fadd/insertelement pattern a manual
extra stage and avoid having that argument
entirely.<span class="Apple-converted-space"> </span><br
class="">
</p>
</blockquote>
</div>
</blockquote>
<div style="margin: 0px; font-stretch: normal; line-height:
normal; font-family: "Helvetica Neue";" class="">
ARM SVE has the FADDA instruction for strict fadd
reductions (see for example test/MC/AArch64/SVE/fadda.s).
This instruction takes an explicit start-value operand.
The reduction intrinsics were originally introduced for
SVE where we modelled the fadd/fmul reductions with this
instruction in mind.</div>
<div style="margin: 0px; font-stretch: normal; line-height:
normal; font-family: "Helvetica Neue";" class="">
<br class="">
</div>
<div style="margin: 0px; font-stretch: normal; line-height:
normal;" class=""><font class="" face="Helvetica Neue">Just
to clarify, is this what you are suggesting regarding
extract/fadd/insert?<br class="">
<br class="">
%first = extractelement <4 x float> %input, i32
0<br class="">
%first.new = fadd float %start, %first<br class="">
%input.new = insertelement <4 x float> %input,
float %first.new, i32 0<br class="">
%red = call float
@llvm.experimental.vector.reduce.ordered.fadd.f32.v4f32(<4
x float> %input.new)<br class="">
<br class="">
My only reservation here is that LLVM might obfuscate
this code so that CodeGen couldn't easily match the
extract/fadd/insert pattern, leaving an extra fadd
instruction. This could happen, for example, if the loop
were rotated/pipelined so that the load and the first 'fadd'
for the next iteration are done ahead of that iteration. </font><span
style="font-family: "Helvetica Neue";"
class="">In such a case, having the extra operand would be
more descriptive.</span></div>
</div>
</div>
</div>
</blockquote>
<p>Yes, that was the IR I had in mind, but you're right in that it's
probably useful for chained fadd reductions as well as the
SVE-specific instruction. If we're getting rid of the fast math
'undef' special case and we expect an 'identity' start value (fadd
= 0.0f, fmul = 1.0f) that we can optimize away then I've no
objections.</p>
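<p>To make the difference between the two lowerings concrete, here is
a minimal sketch in Python. It is not LLVM code: the helper names
f32, reduce_fadd_ordered and reduce_fadd_reassoc are made up for
illustration, and the pairwise tree is only one of the orders a
'reassoc' reduction might use. Every intermediate result is rounded
to single precision to mimic float arithmetic.</p>
<pre>
```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def reduce_fadd_ordered(start, vec):
    """Strict in-order reduction: (((start + v0) + v1) + v2) + ..."""
    acc = f32(start)
    for v in vec:
        acc = f32(acc + f32(v))
    return acc


def reduce_fadd_reassoc(start, vec):
    """One possible reassociated order: pairwise tree, then add the start value.

    Assumes len(vec) is a power of two (hypothetical simplification).
    """
    vals = [f32(v) for v in vec]
    while len(vals) != 1:
        half = len(vals) // 2
        vals = [f32(vals[i] + vals[i + half]) for i in range(half)]
    return f32(f32(start) + vals[0])
```
</pre>
<p>With these definitions, reduce_fadd_ordered(0.0, [1e8, 1.0, -1e8,
1.0]) gives 1.0 while reduce_fadd_reassoc on the same input gives 2.0.
Chaining two ordered reductions through the start value, e.g.
reduce_fadd_ordered(reduce_fadd_ordered(0.0, [1e8, 1.0]), [-1e8, 1.0]),
matches a single ordered reduction over the concatenated input.</p>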
<blockquote type="cite"
cite="mid:BB39E23A-CC39-4638-97E7-42EDC563E311@arm.com">
<div class="">
<div class="">
<div>
<blockquote type="cite" class="">
<div class="">
<blockquote type="cite"
cite="mid:d306cf98-1225-732d-8016-7e882b5136b1@redking.me.uk"
style="font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal; orphans:
auto; text-align: start; text-indent: 0px;
text-transform: none; white-space: normal; widows:
auto; word-spacing: 0px; -webkit-text-size-adjust:
auto; -webkit-text-stroke-width: 0px;
background-color: rgb(255, 255, 255); text-decoration:
none;" class="">
<blockquote type="cite"
cite="mid:67D8F282-9E37-473F-9973-AA981D992711@arm.com"
class="">
<div class="WordSection1" style="page:
WordSection1;">
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class="">Further
efforts:<o:p class=""></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class="">----------------------------<o:p
class=""></o:p></span></div>
<div style="margin: 0cm 0cm 0.0001pt; font-size:
12pt; font-family: Calibri, sans-serif;"
class="">
<span style="font-size: 11pt;" class="">Here is a
non-exhaustive list of items I think work
towards making the intrinsics
non-experimental:</span><span
style="font-size: 11pt; font-family: "MS
Gothic";" class="" lang="EN-US">
</span><span
style="font-size: 11pt;" class=""><o:p
class=""></o:p></span></div>
<ul style="margin-bottom: 0cm; margin-top: 0cm;"
class="" type="disc">
<li class="MsoListParagraph" style="margin: 0cm
0cm 0.0001pt -18pt; font-size: 12pt;
font-family: Calibri, sans-serif;">
<span style="font-size: 11pt;" class="">Adding
SelectionDAG legalization for the _STRICT
reduction SDNodes. After some great work
from Nikita in D58015, unordered reductions
are now legalized/expanded in SelectionDAG,
so if we add expansion in SelectionDAG for
strict reductions this would make the
ExpandReductionsPass redundant.<o:p class=""></o:p></span></li>
<li class="MsoListParagraph" style="margin: 0cm
0cm 0.0001pt -18pt; font-size: 12pt;
font-family: Calibri, sans-serif;">
<span style="font-size: 11pt;" class="">Better
enforcing the constraints of the intrinsics
(see<span class="Apple-converted-space"> </span><a
href="https://reviews.llvm.org/D60260"
moz-do-not-send="true" style="color:
rgb(149, 79, 114); text-decoration:
underline;" class="">https://reviews.llvm.org/D60260</a><span
class="Apple-converted-space"> </span>).</span><span
style="font-size: 11pt;" class=""
lang="EN-US">
</span><span style="font-size:
11pt;" class=""><o:p class=""></o:p></span></li>
<li class="MsoListParagraph" style="margin: 0cm
0cm 0.0001pt -18pt; font-size: 12pt;
font-family: Calibri, sans-serif;">
<span style="font-size: 11pt;" class="">I
think we'll also want to be able to overload
the result operand based on the vector
element type for the intrinsics having the
constraint that the result type must match
the vector element type. e.g. dropping the
redundant 'i32' in:</span><span
style="font-size: 11pt;" class=""><br
class="">
<span class="Apple-converted-space"> </span></span><span
style="font-size: 11pt;" class="">i32
@llvm.experimental.vector.reduce.and.i32.v4i32(<4
x i32> %a) => i32
@llvm.experimental.vector.reduce.and.v4i32(<4
x i32> %a)<o:p class=""></o:p></span></li>
</ul>
<div style="margin: 0cm 0cm 0.0001pt 18pt;
font-size: 12pt; font-family: Calibri,
sans-serif;" class="">
<span style="font-size: 11pt;" class="">since
i32 is implied by <4 x i32>. This would
have the added benefit that LLVM would
automatically check for the operands to match.</span><span
style="font-size: 11pt; font-family: "MS
Gothic";" class="" lang="EN-US">
</span></div>
</div>
</blockquote>
<p class="">Won't this cause issues with overflow?
Isn't the point of an add (or mul...) reduction of,
say, <64 x i8> to give a larger (i32 or i64)
result so we don't lose anything? I agree that a wider result
doesn't make sense for bitop reductions, though.<br class="">
</p>
</blockquote>
<span style="caret-color: rgb(0, 0, 0); font-family:
Helvetica; font-size: 12px; font-style: normal;
font-variant-caps: normal; font-weight: normal;
letter-spacing: normal; text-align: start;
text-indent: 0px; text-transform: none; white-space:
normal; word-spacing: 0px; -webkit-text-stroke-width:
0px; background-color: rgb(255, 255, 255);
text-decoration: none; float: none; display: inline
!important;" class="">Sorry - I forgot to add: which
raises the question - should we be considering
signed/unsigned add/mul and possibly saturation
reductions?</span><br style="caret-color: rgb(0, 0,
0); font-family: Helvetica; font-size: 12px;
font-style: normal; font-variant-caps: normal;
font-weight: normal; letter-spacing: normal;
text-align: start; text-indent: 0px; text-transform:
none; white-space: normal; word-spacing: 0px;
-webkit-text-stroke-width: 0px; background-color:
rgb(255, 255, 255); text-decoration: none;" class="">
</div>
</blockquote>
<div style="margin: 0px; font-stretch: normal; line-height:
normal; font-family: "Helvetica Neue";" class="">
The current intrinsics explicitly specify that:</div>
<div style="margin: 0px; font-stretch: normal; line-height:
normal; font-family: "Helvetica Neue";" class="">
"The return type matches the element-type of the vector
input"</div>
<div style="margin: 0px; font-stretch: normal; line-height:
normal; font-family: "Helvetica Neue";" class="">
<br class="">
</div>
<div style="margin: 0px; font-stretch: normal; line-height:
normal; font-family: "Helvetica Neue";" class="">
This was done to avoid having explicit signed/unsigned add
reductions, reasoning that zero- and sign-extension can be
done on the input values to the reduction. We had a bit of
debate on this internally, and it would come down to
similar reasons as for the extra 'start value' operand to
fadd reductions. I think we'd welcome the signed/unsigned
variants as they would be more descriptive and would
safeguard the code from transformations that make it
difficult to fold the sign/zero extend into the operation
during CodeGen. The downside, however, is that the
signed and unsigned add reductions would be identical
operations when the result type equals the
element type.</div>
</div>
</div>
</div>
</blockquote>
<p>An alternative would be that we limit the existing add/mul cases
to the same result type (along with
and/or/xor/smax/smin/umax/umin) and we add sadd/uadd/smul/umul
extending reductions as well.</p>
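<p>As a rough illustration of why the extending variants would need
separate signed and unsigned forms, the sketch below models the idea
in Python. The function names and the way widths are passed are
assumptions made for this example only; sadd/uadd here stand for the
hypothetical extending reductions mentioned above, not for existing
intrinsics.</p>
<pre>
```python
def to_int(value, bits, signed):
    """Reinterpret the low bits of value as a signed or unsigned integer.

    bits must be a multiple of 8 in this simplified model.
    """
    raw = (value % 2 ** bits).to_bytes(bits // 8, "little")
    return int.from_bytes(raw, "little", signed=signed)


def reduce_add_same_type(vec, bits):
    """Add reduction whose result has the element width, so it wraps."""
    return to_int(sum(vec), bits, signed=False)


def reduce_add_extending(vec, elt_bits, res_bits, signed):
    """Hypothetical sadd/uadd: extend each element, then add at res_bits."""
    extended = [to_int(v, elt_bits, signed) for v in vec]
    return to_int(sum(extended), res_bits, signed)
```
</pre>
<p>For 64 elements that all hold the i8 bit pattern 0xFF, the
same-type reduction wraps to 192, the unsigned extending reduction to
32 bits gives 16320, and the signed one gives -64. When the result
width equals the element width, the signed and unsigned forms produce
the same bit pattern (0xC0).</p>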
<blockquote type="cite"
cite="mid:BB39E23A-CC39-4638-97E7-42EDC563E311@arm.com">
<div class="">
<div class="">
<div>
<div style="margin: 0px; font-stretch: normal; line-height:
normal; font-family: "Helvetica Neue";" class="">
<div style="margin: 0px; font-stretch: normal;
line-height: normal;" class="">Saturating vector
reductions sound sensible, but are there any targets
that implement these at the moment?</div>
</div>
</div>
</div>
</div>
</blockquote>
X86/SSE has the v8i16 HADDS/HSUBS horizontal signed saturation
instructions, and X86/XOP has extend+horizontal-add/sub instructions
(<a class="moz-txt-link-freetext" href="https://en.wikipedia.org/wiki/XOP_instruction_set">https://en.wikipedia.org/wiki/XOP_instruction_set</a>).<br>
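<p>For reference, a simplified Python model of a pairwise horizontal
add with signed saturation on i16 elements is sketched below. It is
only meant to show the arithmetic: it works on a single list rather
than two source registers, so it is not an exact model of the X86
instructions, and all names are made up for this example.</p>
<pre>
```python
I16_MIN, I16_MAX = -32768, 32767


def sat_add_i16(a, b):
    """Signed 16-bit add that clamps to the i16 range instead of wrapping."""
    return max(I16_MIN, min(I16_MAX, a + b))


def hadd_sat_i16(vec):
    """Add adjacent pairs with signed saturation, halving the vector length."""
    return [sat_add_i16(vec[i], vec[i + 1]) for i in range(0, len(vec), 2)]


def reduce_hadd_sat_i16(vec):
    """Reduce a power-of-two-length vector by repeated pairwise steps."""
    while len(vec) != 1:
        vec = hadd_sat_i16(vec)
    return vec[0]
```
</pre>
<p>Here hadd_sat_i16([30000, 10000, -5, 5, 1, 2, 3, 4]) gives
[32767, 0, 3, 7], and the full reduction saturates at 32767. Note that
saturating addition is not associative: [32767, 1, -1, 0] reduces to
32766 while [32767, 0, 1, -1] reduces to 32767, so a saturating
reduction would presumably need its evaluation order pinned down, much
like the fadd case.</p>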
</body>
</html>