<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hi,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I abandoned that approach and followed Eli's suggestion from earlier in this thread: I now emit an intrinsic that represents/calculates the active mask. I've just uploaded a new revision of D79100 that implements this.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Cheers.<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Simon Moll <Simon.Moll@EMEA.NEC.COM><br>
<b>Sent:</b> 18 May 2020 13:32<br>
<b>To:</b> Sjoerd Meijer <Sjoerd.Meijer@arm.com><br>
<b>Cc:</b> Roger Ferrer Ibáñez <rofirrim@gmail.com>; Eli Friedman <efriedma@quicinc.com>; listmail@philipreames.com <listmail@philipreames.com>; llvm-dev <llvm-dev@lists.llvm.org>; Sander De Smalen <Sander.DeSmalen@arm.com>; hanna.kruppe@gmail.com <hanna.kruppe@gmail.com><br>
<b>Subject:</b> Re: [llvm-dev] LV: predication</font>
<div> </div>
</div>
<div>
<div class="x_moz-cite-prefix">On 5/5/20 12:07 AM, Sjoerd Meijer via llvm-dev wrote:<br>
</div>
<blockquote type="cite"><style type="text/css" style="display:none">
<!--
p
{margin-top:0;
margin-bottom:0}
-->
</style>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">
what we would like to generate is a vector loop with implicit predication, which works by setting up the number of elements processed by the loop:<br>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt">hwloop 10<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"> [i:4] = b[i:4] + c[i:4]</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt"></div>
</div>
</blockquote>
Why couldn't you use VP intrinsics and scalable types for this?<br>
<br>
<tt> %bval = &lt;vscale x 4 x double&gt; call @vp.load(..., /* %evl */ 10)</tt><tt><br>
</tt><tt> %cval = &lt;vscale x 4 x double&gt; call @vp.load(..., /* %evl */ 10)</tt><tt><br>
</tt><tt> %sum = &lt;vscale x 4 x double&gt; fadd %bval, %cval</tt><tt><br>
</tt><tt> store [..]</tt><br>
<br>
I see three issues with the llvm.set.loop.elements approach:<br>
1) It is conceptually broken: as others have pointed out, optimization can move the intrinsic around since the intrinsic doesn't have any dependencies that would naturally keep it in place.<br>
2) The whole proposed set of intrinsics is vendor-specific: this causes fragmentation, and I don't see why we would want to emit vendor-specific intrinsics in a generic optimization pass. Soon, we would see reports a la "your optimization caused regressions
for MVE - add a check that the transformation must not touch llvm.set.loop.* or llvm.active.mask intrinsics when compiling for MVE". I doubt that you would tolerate it if that intrinsic were somehow removed in performance-critical code, which would then remain
scalar as a result. So I do not see the "beauty of the approach".<br>
3) We need a reliable solution to properly support vector ISAs such as the RISC-V V extension, SX-Aurora, and also MVE. I don't see that reliability in this proposal.<br>
<br>
If, for whatever reason, the above does not work and seems too far away from your proposal, here is another idea to make more explicit hwloops work with the VP intrinsics, in a way that does not break under optimizations:<br>
<br>
<tt>vector.preheader:</tt><tt><br>
</tt><tt> %evl = i32 llvm.hwloop.set.elements(%n)</tt><tt><br>
</tt><br>
<tt>vector.body:</tt><tt><br>
</tt><tt> %lastevl = phi i32 [ %evl, %vector.preheader ], [ %next.evl, %vector.body ]</tt><tt><br>
</tt><tt> %aval = call @llvm.vp.load(Aptr, .., %evl)</tt><tt><br>
</tt><tt> call @llvm.vp.store(Bptr, %aval, ..., %evl)</tt><tt><br>
</tt><tt> %next.evl = call i32 @llvm.hwloop.decrement(%evl)</tt><br>
<br>
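To make the intended semantics concrete, here is a minimal scalar sketch in Python of a loop driven by an explicit vector length. It only models the behaviour I have in mind; the names evl_loop_add and vf are hypothetical and are not part of any LLVM API, and I am assuming that the set.elements/decrement pair yields min(remaining, vf) active lanes per iteration.<br>
<pre>
```python
def evl_loop_add(b, c, vf=4):
    """Model of an EVL-driven vector loop computing b[i] + c[i].

    Hypothetical sketch: `vf` stands for the hardware vector width, and the
    per-iteration `evl` is assumed to be min(remaining elements, vf).
    Returns the result list and the list of evl values used.
    """
    assert len(b) == len(c)
    out = [0] * len(b)
    evls = []
    remaining = len(b)            # role of the "set elements" step in the preheader
    i = 0
    while remaining > 0:
        evl = min(remaining, vf)  # number of active lanes in this iteration
        evls.append(evl)
        for lane in range(evl):   # lanes at or beyond evl stay inactive
            out[i + lane] = b[i + lane] + c[i + lane]
        i += evl
        remaining -= evl          # role of the "decrement" step at the loop latch
    return out, evls
```
</pre>
With 10 elements and vf=4, the loop runs three iterations and only the last one is partial, so no separate scalar tail is needed.<br>
<br>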
Note that, given the way VP intrinsics are designed, it is not possible to break this code by hoisting the VP calls out of the loop: passing "%evl >= the operation's vector size" constitutes UB (see
<a class="x_moz-txt-link-freetext" href="https://llvm.org/docs/LangRef.html#vector-predication-intrinsics">
https://llvm.org/docs/LangRef.html#vector-predication-intrinsics</a>). We can use attributes to do the same for sinking (e.g. don't move VP calls across hwloop.decrement).<br>
<br>
- Simon<br>
</div>
</body>
</html>