<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">

<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>

</head>

<body dir="ltr">

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

> The harm comes if the intrinsic ends up with the wrong value, or attached to the wrong loop.<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

The intrinsic is marked as IntrNoDuplicate, so I wasn't worried about it ending up somewhere else. Also, it is a property of a specific loop, a tail-folded vector loop, and I think it holds even after that loop is transformed. For example, unrolling a vector loop is probably not what you want, but even if you do, the element count would remain the same. But yes, I agree that a future wacky optimisation on vector loops could invalidate this, in which case you can skip it, but then you lose out. So, I really like this:</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

> If the problem is specifically figuring out the underlying element count given a predicate, maybe we could attack it from that angle?  For example, introduce a special intrinsic for deriving the mask (sort of like the SVE whilelo).</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

That would be an excellent way of doing it, and it would also map very well to MVE, where we have a VCTP intrinsic/instruction (Vector Create Tail-Predicate) that creates the mask/predicate. So I will go for this approach. Such an intrinsic was actually

 also proposed in Sam's original RFC (see <a href="https://lists.llvm.org/pipermail/llvm-dev/2019-May/132512.html" id="LPlnk982545">

https://lists.llvm.org/pipermail/llvm-dev/2019-May/132512.html</a>), but we hadn't implemented it yet. This intrinsic will probably look something like this:</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

    <N x i1> @llvm.loop.get.active.mask(AnyInt, AnyInt)<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

It produces a <N x i1> predicate based on its two arguments, the number of elements and the vector trip count, and it will be used by the masked load/store instructions in the vector body. I will start drafting an implementation and continue with this in D79100.<br>

</div>
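<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

A minimal scalar sketch of what such a mask could compute, assuming whilelo-style semantics: lane i is active when the index of the first lane plus i is below the total number of elements. The names activeMask and activeLanes and the (base, numElements, vf) parameters are hypothetical and only for illustration; the actual arguments of the intrinsic are the ones described above and are still to be settled in D79100.<br>

</div>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a whilelo-style mask. This is an assumption
// made for illustration, not the definition of the proposed intrinsic.
// Lane i is active iff (base + i) < numElements.
std::vector<bool> activeMask(uint64_t base, uint64_t numElements, unsigned vf) {
  assert(vf > 0 && "vector width must be non-zero");
  std::vector<bool> mask(vf);
  for (unsigned i = 0; i < vf; ++i) {
    // Written as a subtraction so that base + i cannot wrap around.
    mask[i] = base < numElements && i < numElements - base;
  }
  return mask;
}

// Number of active lanes, i.e. the per-iteration element count that a
// backend pass would otherwise have to pattern-match out of the mask.
unsigned activeLanes(const std::vector<bool> &mask) {
  unsigned n = 0;
  for (bool b : mask)
    n += b;
  return n;
}
```

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Under these assumptions, a loop over 10 elements with a vector width of 4 runs three vector iterations starting at elements 0, 4 and 8, with 4, 4 and 2 active lanes respectively.<br>

</div>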

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Thanks,</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Sjoerd.<br>

</div>

<div id="appendonsend"></div>


<hr tabindex="-1" style="display:inline-block; width:98%">

<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Eli Friedman <efriedma@quicinc.com><br>

<b>Sent:</b> 01 May 2020 21:11<br>

<b>To:</b> Sjoerd Meijer <Sjoerd.Meijer@arm.com>; llvm-dev <llvm-dev@lists.llvm.org><br>

<b>Subject:</b> RE: [llvm-dev] LV: predication</font>

<div> </div>

</div>

<div lang="EN-US">

<div class="x_WordSection1">

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;">

 </p>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;">

 </p>

<div>

<div style="border:none; border-top:solid #E1E1E1 1.0pt; padding:3.0pt 0in 0in 0in">

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;margin-left:.5in">

<b>From:</b> Sjoerd Meijer <Sjoerd.Meijer@arm.com> <br>

<b>Sent:</b> Friday, May 1, 2020 11:54 AM<br>

<b>To:</b> Eli Friedman <efriedma@quicinc.com>; llvm-dev <llvm-dev@lists.llvm.org><br>

<b>Subject:</b> [EXT] Re: [llvm-dev] LV: predication</p>

</div>

</div>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;margin-left:.5in">

 </p>

<div>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;margin-left:.5in">

<span style="font-size:12.0pt; color:black">Hi Eli,</span></p>

</div>

<div>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;margin-left:.5in">

<span style="font-size:12.0pt; color:black"> </span></p>

</div>

<div>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;margin-left:.5in">

<span style="font-size:12.0pt; color:black">> The problem with your proposal, as written, is that the vectorizer is producing the intrinsic.  Because we don’t impose any ordering on optimizations before codegen, every optimization pass in LLVM would have to

 be taught to preserve any @llvm.set.loop.elements.i32 whenever it makes any change.  This is completely impractical because the intrinsic isn’t related to anything optimizations would normally look for: it’s a random intrinsic in the middle of nowhere.</span></p>

</div>

<div>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;margin-left:.5in">

<span style="font-size:12.0pt; color:black"> </span></p>

</div>

<div>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;margin-left:.5in">

<span style="font-size:12.0pt; color:black">I do see that point. But is that not also the beauty of it? It just sits in the preheader; if it gets removed, then so be it. And if it is not recognised, then also no harm done?</span></p>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;">

 </p>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;">

The harm comes if the intrinsic ends up with the wrong value, or attached to the wrong loop.

</p>

</div>

<div>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;margin-left:.5in">

<span style="font-size:12.0pt; color:black"> </span></p>

</div>

<div>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;margin-left:.5in">

<span style="font-size:12.0pt; color:black">> Probably the simplest path to get this working is to derive the number of elements in the backend (in HardwareLoops, or your tail predication pass). You should be able to figure it from the masks used in the llvm.masked.load/store

 instructions in the loop.</span></p>

</div>

<div>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;margin-left:.5in">

<span style="font-size:12.0pt; color:black"> </span></p>

</div>

<div>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;margin-left:.5in">

<span style="font-size:12.0pt; color:black">This is what we are currently doing, and it works excellently for simpler cases. For the more complicated cases that we now want to handle as well, the pattern matching just becomes a bit too horrible, and it is fragile too. All we need is the information that the vectoriser already has, passed on somehow.</span></p>

</div>

<div>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;margin-left:.5in">

<span style="font-size:12.0pt; color:black"> </span></p>

</div>

<div>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;margin-left:.5in">

<span style="font-size:12.0pt; color:black">As I am really keen to simplify our backend pass, would there be another way to pass this information on? If emitting an intrinsic is a blocker, could this be done with a loop annotation?</span></p>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;">

 </p>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;">

If the problem is specifically figuring out the underlying element count given a predicate, maybe we could attack it from that angle?  For example, introduce a special intrinsic for deriving the mask (sort of like the SVE whilelo).</p>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;">

 </p>

<p class="x_MsoNormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;">

-Eli<span style="font-size:12.0pt; color:black"> </span></p>

</div>

<div>

<div>

<div>

<p class="x_xmsonormal" style="margin: 0in 0in 0.0001pt; font-size: 11pt; font-family: "Calibri", sans-serif;margin-left:.5in">

<span style="font-size:12.0pt; color:black"> </span></p>

</div>

</div>

</div>

</div>

</div>

</body>

</html>