<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 5/4/20 3:04 AM, Sjoerd Meijer via
llvm-dev wrote:<br>
</div>
<blockquote type="cite"
cite="mid:VI1PR08MB2640697713F2F00B320B6EACFCA60@VI1PR08MB2640.eurprd08.prod.outlook.com">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
> The harm comes if the intrinsic ends up with the wrong
value, or attached to the wrong loop.<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
The intrinsic is marked as IntrNoDuplicate, so I wasn't worried
about it ending up somewhere else. Also, it is a property of a
specific loop, a tail-folded vector loop, that holds even after
it is transformed I think. I.e. unrolling a vector loop is
probably not what you want, but even if you do the element count
would remain the same. But yes, I agree that a future wacky
optimisation on vector loops could invalidate this, which you
can then skip but then you lose out on it.... So, I really like
this:</div>
</blockquote>
<p>This approach really doesn't work. Not unless you're willing to
impose legality restrictions on optimization passes to preserve
the information. <br>
</p>
<p><br>
</p>
<p>It's helpful to think of the optimizer as being adversarial. The
question is not "will the optimizer break this?"; it's "can a
malicious optimizer break this?". Unless you can reason from the
spec (LangRef) that the answer is no, then the answer is yes.</p>
<p><br>
</p>
<p>In your particular example, consider what might happen if loop
fission runs on your vectorized loop, then we recognize that
iterations N through M of the first loop (after fission) were nops
and split it into two loops over narrow ranges. You'd have real
trouble matching your intrinsic to anything meaningful in the
backend, and getting it wrong would be a correctness bug.<br>
</p>
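<p>To make that failure mode concrete, here is a deliberately simplified
toy model in Python (not LLVM code; <code>VF</code>,
<code>lanes_from_mask</code> and <code>lanes_from_recorded_count</code>
are hypothetical names made up for illustration). It models only the
range-splitting step, and it assumes a backend that trusts an element
count recorded once before the loop. That is compared with a predicate
recomputed from the index in every iteration, in the spirit of the
whilelo-style mask suggested in this thread.</p>
<pre>
```python
VF = 4  # hypothetical vector width (lanes per vector iteration)


def lanes_from_mask(start, end, n):
    """Lanes touched when every iteration recomputes its predicate as `i < n`."""
    return [i for base in range(start, end, VF)
            for i in range(base, base + VF) if i < n]


def lanes_from_recorded_count(start, end, recorded_count):
    """Lanes touched when a backend trusts a count recorded before the loop
    and assumes the loop it is looking at processes exactly that many elements."""
    lanes = []
    remaining = recorded_count
    for base in range(start, end, VF):
        active = max(0, min(VF, remaining))  # tail predicate derived from the count
        lanes.extend(range(base, base + active))
        remaining -= VF
    return lanes


N = 10  # elements in the original scalar loop

# Untransformed vector loop over [0, 12): both schemes agree.
whole_mask = lanes_from_mask(0, 12, N)
whole_count = lanes_from_recorded_count(0, 12, N)

# Suppose an optimizer splits the loop into [0, 4) and [4, 12), and the
# recorded count of 10 ends up associated with the second loop.
split_mask = lanes_from_mask(4, 12, N)
split_count = lanes_from_recorded_count(4, 12, N)
out_of_bounds = [i for i in split_count if i >= N]
print(whole_mask == whole_count, split_mask, split_count, out_of_bounds)
```
</pre>
<p>In this model the per-iteration mask stays within bounds after the
split, while the stale count activates lanes 10 and 11, which lie past
the end of the data.</p>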
<blockquote type="cite"
cite="mid:VI1PR08MB2640697713F2F00B320B6EACFCA60@VI1PR08MB2640.eurprd08.prod.outlook.com">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
> If the problem is specifically figuring out the underlying
element count given a predicate, maybe we could attack it from
that angle? For example, introduce a special intrinsic for
deriving the mask (sort of like the SVE whilelo).</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
That would be an excellent way of doing it and it would also map
very well to MVE too, where we have a VCTP intrinsic/instruction
that creates the mask/predicate (Vector Create Tail-Predicate).
So I will go for this approach. Such an intrinsic was actually
also proposed in Sam's original RFC (see <a
href="https://lists.llvm.org/pipermail/llvm-dev/2019-May/132512.html"
id="LPlnk982545" moz-do-not-send="true">
https://lists.llvm.org/pipermail/llvm-dev/2019-May/132512.html</a>),
but we hadn't implemented it yet. This intrinsic will probably
look something like this:</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
&lt;N x i1&gt; @llvm.loop.get.active.mask(AnyInt, AnyInt)<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
It produces a &lt;N x i1&gt; predicate based on its two
arguments, the number of elements and the vector trip count, and
it will be used by the masked load/store
instructions in the vector body. I will start drafting an
implementation for this and continue with this in D79100.<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
Thanks,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
Sjoerd.<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt"
face="Calibri, sans-serif" color="#000000"><b>From:</b> Eli
Friedman <a class="moz-txt-link-rfc2396E" href="mailto:efriedma@quicinc.com">&lt;efriedma@quicinc.com&gt;</a><br>
<b>Sent:</b> 01 May 2020 21:11<br>
<b>To:</b> Sjoerd Meijer <a class="moz-txt-link-rfc2396E" href="mailto:Sjoerd.Meijer@arm.com">&lt;Sjoerd.Meijer@arm.com&gt;</a>;
llvm-dev <a class="moz-txt-link-rfc2396E" href="mailto:llvm-dev@lists.llvm.org">&lt;llvm-dev@lists.llvm.org&gt;</a><br>
<b>Subject:</b> RE: [llvm-dev] LV: predication</font>
<div> </div>
</div>
<div lang="EN-US">
<div class="x_WordSection1">
<div>
<div style="border:none; border-top:solid #E1E1E1 1.0pt;
padding:3.0pt 0in 0in 0in">
<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;margin-left:.5in">
<b>From:</b> Sjoerd Meijer <a class="moz-txt-link-rfc2396E" href="mailto:Sjoerd.Meijer@arm.com">&lt;Sjoerd.Meijer@arm.com&gt;</a>
<br>
<b>Sent:</b> Friday, May 1, 2020 11:54 AM<br>
<b>To:</b> Eli Friedman <a class="moz-txt-link-rfc2396E" href="mailto:efriedma@quicinc.com">&lt;efriedma@quicinc.com&gt;</a>;
llvm-dev <a class="moz-txt-link-rfc2396E" href="mailto:llvm-dev@lists.llvm.org">&lt;llvm-dev@lists.llvm.org&gt;</a><br>
<b>Subject:</b> [EXT] Re: [llvm-dev] LV: predication</p>
</div>
</div>
<div>
<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;margin-left:.5in">
<span style="font-size:12.0pt; color:black">Hi Eli,</span></p>
</div>
<div>
<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;margin-left:.5in">
<span style="font-size:12.0pt; color:black"> </span></p>
</div>
<div>
<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;margin-left:.5in">
<span style="font-size:12.0pt; color:black">> The
problem with your proposal, as written, is that the
vectorizer is producing the intrinsic. Because we don’t
impose any ordering on optimizations before codegen,
every optimization pass in LLVM would have to be taught
to preserve any @llvm.set.loop.elements.i32 whenever it
makes any change. This is completely impractical
because the intrinsic isn’t related to anything
optimizations would normally look for: it’s a random
intrinsic in the middle of nowhere.</span></p>
</div>
<div>
<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;margin-left:.5in">
<span style="font-size:12.0pt; color:black"> </span></p>
</div>
<div>
<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;margin-left:.5in">
<span style="font-size:12.0pt; color:black">I do see that
point. But is that also not the beauty of it? It just
sits in the preheader; if it gets removed, then so be it.
And if it is not recognised, then also no harm done?</span></p>
<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;">
The harm comes if the intrinsic ends up with the wrong
value, or attached to the wrong loop.
</p>
</div>
<div>
<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;margin-left:.5in">
<span style="font-size:12.0pt; color:black"> </span></p>
</div>
<div>
<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;margin-left:.5in">
<span style="font-size:12.0pt; color:black">> Probably
the simplest path to get this working is to derive the
number of elements in the backend (in HardwareLoops, or
your tail predication pass). You should be able to
figure it from the masks used in the
llvm.masked.load/store instructions in the loop.</span></p>
</div>
<div>
<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;margin-left:.5in">
<span style="font-size:12.0pt; color:black"> </span></p>
</div>
<div>
<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;margin-left:.5in">
<span style="font-size:12.0pt; color:black">This is what
we are currently doing, and it works very well for simpler
cases. For the more complicated cases that we now want
to handle as well, the pattern matching just becomes a
bit too horrible, and it is fragile too. All we need is
the information that the vectoriser already has, passed
on somehow.</span></p>
</div>
<div>
<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;margin-left:.5in">
<span style="font-size:12.0pt; color:black"> </span></p>
</div>
<div>
<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;margin-left:.5in">
<span style="font-size:12.0pt; color:black">As I am really
keen to simplify our backend pass, would there be another
way to pass this information on? If emitting an
intrinsic is a blocker, could this be done with a loop
annotation?</span></p>
<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;">
If the problem is specifically figuring out the underlying
element count given a predicate, maybe we could attack it
from that angle? For example, introduce a special
intrinsic for deriving the mask (sort of like the SVE
whilelo).</p>
<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;">
-Eli<span style="font-size:12.0pt; color:black"> </span></p>
</div>
<div>
<div>
<div>
<p class="x_xmsonormal" style="margin: 0in 0in 0.0001pt;
font-size: 11pt; font-family: &quot;Calibri&quot;,
sans-serif;margin-left:.5in">
<span style="font-size:12.0pt; color:black"> </span></p>
</div>
</div>
</div>
</div>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<pre class="moz-quote-pre" wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
</blockquote>
</body>
</html>