<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

</head>

<body>

<div class="moz-cite-prefix">On 5/18/20 2:53 PM, Sjoerd Meijer wrote:<br>

</div>

<blockquote type="cite" cite="mid:DB6PR0801MB1990EED8D6C0C33EAE46E5A8FCB80@DB6PR0801MB1990.eurprd08.prod.outlook.com">

<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

Hi,</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

I abandoned that approach and followed Eli's suggestion (see earlier in this thread) to emit an intrinsic that represents/calculates the active mask. I've just uploaded a new revision for D79100 that implements this.</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

Cheers.<br>

</div>

</blockquote>

You have similar problems with <a class="moz-txt-link-freetext" href="https://reviews.llvm.org/D79100">

https://reviews.llvm.org/D79100</a><br>

<br>

Since there are no masked operations except for load/store, how are LLVM optimizations supposed to know that they must not hoist/sink operations with side-effects out of the hwloop? These operations have an implicit dependence on the iteration variable.<br>

<br>

What will you do if there are no masked intrinsics in the hwloop body? This can happen once you generate vector code beyond trivial loops or have a vector IR generator other than LV.<br>

<br>

I am also curious: why couldn't you use the %evl parameter of the VP intrinsics to get the tail predication you are interested in?<br>

<br>

- Simon<br>

<br>

<blockquote type="cite" cite="mid:DB6PR0801MB1990EED8D6C0C33EAE46E5A8FCB80@DB6PR0801MB1990.eurprd08.prod.outlook.com">

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<hr style="display:inline-block;width:98%" tabindex="-1">

<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Simon Moll

<a class="moz-txt-link-rfc2396E" href="mailto:Simon.Moll@EMEA.NEC.COM">&lt;Simon.Moll@EMEA.NEC.COM&gt;</a><br>

<b>Sent:</b> 18 May 2020 13:32<br>

<b>To:</b> Sjoerd Meijer <a class="moz-txt-link-rfc2396E" href="mailto:Sjoerd.Meijer@arm.com">&lt;Sjoerd.Meijer@arm.com&gt;</a><br>

<b>Cc:</b> Roger Ferrer Ibáñez <a class="moz-txt-link-rfc2396E" href="mailto:rofirrim@gmail.com">&lt;rofirrim@gmail.com&gt;</a>; Eli Friedman <a class="moz-txt-link-rfc2396E" href="mailto:efriedma@quicinc.com">&lt;efriedma@quicinc.com&gt;</a>; <a class="moz-txt-link-abbreviated" href="mailto:listmail@philipreames.com">listmail@philipreames.com</a> <a class="moz-txt-link-rfc2396E" href="mailto:listmail@philipreames.com">&lt;listmail@philipreames.com&gt;</a>; llvm-dev <a class="moz-txt-link-rfc2396E" href="mailto:llvm-dev@lists.llvm.org">&lt;llvm-dev@lists.llvm.org&gt;</a>; Sander De Smalen <a class="moz-txt-link-rfc2396E" href="mailto:Sander.DeSmalen@arm.com">&lt;Sander.DeSmalen@arm.com&gt;</a>; <a class="moz-txt-link-abbreviated" href="mailto:hanna.kruppe@gmail.com">hanna.kruppe@gmail.com</a> <a class="moz-txt-link-rfc2396E" href="mailto:hanna.kruppe@gmail.com">&lt;hanna.kruppe@gmail.com&gt;</a><br>

<b>Subject:</b> Re: [llvm-dev] LV: predication</font>

<div> </div>

</div>

<div>

<div class="x_moz-cite-prefix">On 5/5/20 12:07 AM, Sjoerd Meijer via llvm-dev wrote:<br>

</div>

<blockquote type="cite"><style type="text/css" style="display:none">

<!--

p

        {margin-top:0;

        margin-bottom:0}

-->

</style>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif;

            font-size:12pt; color:rgb(0,0,0)">

what we would like to generate is a vector loop with implicit predication, which works by setting up the number of elements processed by the loop:<br>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif;

              font-size:12pt">

<br>

</div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif;

              font-size:12pt">

hwloop 10<br>

</div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif;

              font-size:12pt">

  [i:4] = b[i:4] + c[i:4]</div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif;

              font-size:12pt">

<br>

</div>

</div>

</blockquote>

Why couldn't you use VP intrinsics and scalable types for this?<br>

<br>

<tt>   %bval = call &lt;vscale x 4 x double&gt; @llvm.vp.load(..., /* %evl */ 10)</tt><tt><br>

</tt><tt>   %cval = call &lt;vscale x 4 x double&gt; @llvm.vp.load(..., /* %evl */ 10)</tt><tt><br>

</tt><tt>   %sum = fadd &lt;vscale x 4 x double&gt; %bval, %cval</tt><tt><br>

</tt><tt>   store [..]</tt><br>

<br>
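To make the %evl idea concrete, here is a minimal scalar model in Python. It is only a sketch: vp_load, vp_fadd, vp_store and the width W are hypothetical stand-ins (not LLVM APIs), and the only rule they mimic is that the first %evl lanes are active while the rest are left untouched.<br>

```python
# Scalar model of explicit-vector-length (EVL) predication.
# All helpers are hypothetical; they only mimic "the first evl lanes are active".

W = 16  # assumed hardware vector width in lanes (illustrative value)


def vp_load(mem, base, evl):
    """Load the first `evl` lanes from mem[base:]; inactive lanes are None."""
    assert 0 <= evl <= W, "evl beyond the vector width is not modelled here"
    return [mem[base + i] if i < evl else None for i in range(W)]


def vp_fadd(a, b, evl):
    """Add the first `evl` lanes; inactive lanes are None."""
    return [a[i] + b[i] if i < evl else None for i in range(W)]


def vp_store(mem, base, vec, evl):
    """Store only the first `evl` lanes; memory past them is not touched."""
    for i in range(evl):
        mem[base + i] = vec[i]


b = [float(i) for i in range(10)]
c = [10.0 * i for i in range(10)]
a = [0.0] * 10

bval = vp_load(b, 0, 10)  # 10 active lanes out of W
cval = vp_load(c, 0, 10)
vp_store(a, 0, vp_fadd(bval, cval, 10), 10)
```

With an %evl of 10, only ten elements are read and written even though the modelled vector is wider, which is the tail predication asked for.<br>

<br>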

I see three issues with the llvm.set.loop.elements approach:<br>

1) It is conceptually broken: as others have pointed out, optimizations can move the intrinsic around since the intrinsic doesn't have any dependencies that would naturally keep it in place.<br>

2) The whole proposed set of intrinsics is vendor-specific: this causes fragmentation, and I don't see why we would want to emit vendor-specific intrinsics in a generic optimization pass. Soon, we would see reports a la "your optimization caused regressions for MVE - add a check that the transformation must not touch llvm.set.loop.* or llvm.active.mask intrinsics when compiling for MVE..". I doubt that you would tolerate it if that intrinsic were somehow removed in performance-critical code, which would then remain scalar as a result. So I do not see the "beauty of the approach".<br>

3) We need a reliable solution to properly support vector ISAs such as the RISC-V V extension, SX-Aurora, and also MVE. I don't see that reliability in this proposal.<br>

<br>

If, for whatever reason, the above does not work or seems too far away from your proposal, here is another idea to make more explicit hwloops work with the VP intrinsics, in a way that does not break with optimizations:<br>

 <br>

<tt>vector.preheader:</tt><tt><br>

</tt><tt>  %evl = call i32 @llvm.hwloop.set.elements(%n)</tt><tt><br>

</tt><br>

<tt>vector.body:</tt><tt><br>

</tt><tt>  %lastevl = phi i32 [ %evl, %vector.preheader ], [ %next.evl, %vector.body ]</tt><tt><br>

</tt><tt>  %aval = call @llvm.vp.load(Aptr, .., %lastevl)</tt><tt><br>

</tt><tt>  call @llvm.vp.store(Bptr, %aval, ..., %lastevl)</tt><tt><br>

</tt><tt>  %next.evl = call i32 @llvm.hwloop.decrement(%lastevl)</tt><br>

<br>

Note that the way VP intrinsics are designed, it is not possible to break this code by hoisting the VP calls out of the loop: passing "%evl >= the operation's vector size" constitutes UB (see

<a class="x_moz-txt-link-freetext" href="https://llvm.org/docs/LangRef.html#vector-predication-intrinsics" moz-do-not-send="true">

https://llvm.org/docs/LangRef.html#vector-predication-intrinsics</a>). We can use attributes to do the same for sinking (e.g. don't move VP intrinsics across hwloop.decrement).<br>

<br>
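The loop structure above can be modelled in a few lines of Python. This is a rough sketch under assumptions: the HwLoop class, its methods and VF are hypothetical names that only mirror the proposed llvm.hwloop.* intrinsics, and the "remaining elements" counter stands in for whatever state the hardware would keep.<br>

```python
# Scalar model of a hardware loop whose active lane count is an explicit value.
# All names are hypothetical and only mirror the IR sketch in this email.

VF = 4  # assumed number of lanes per vector iteration


class HwLoop:
    def __init__(self):
        self.remaining = 0

    def set_elements(self, n):
        """Model of llvm.hwloop.set.elements: record n, return the first evl."""
        self.remaining = n
        return min(n, VF)

    def decrement(self, evl):
        """Model of llvm.hwloop.decrement: retire evl elements, return the next evl."""
        self.remaining -= evl
        return min(self.remaining, VF)


def copy_loop(src, n):
    dst = [None] * n
    loop = HwLoop()
    evl = loop.set_elements(n)  # vector.preheader
    base = 0
    trace = []
    while evl > 0:  # vector.body
        trace.append(evl)
        # vp.load + vp.store: only the first evl lanes are read and written
        dst[base:base + evl] = src[base:base + evl]
        base += evl
        evl = loop.decrement(evl)  # loop-carried value, like %next.evl
    return dst, trace


dst, trace = copy_loop(list(range(10)), 10)
```

For n = 10 and VF = 4 the recorded lane counts are 4, 4 and 2: the tail iteration is predicated purely through the loop-carried evl value, so every VP operation has an explicit data dependence on it.<br>

<br>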

- Simon<br>

</div>

<br>

<br>

</blockquote>

<br>

</body>

</html>