<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

</head>

<body>

<div class="moz-cite-prefix">On 5/5/20 12:07 AM, Sjoerd Meijer via llvm-dev wrote:<br>

</div>

<blockquote type="cite" cite="mid:DB6PR0801MB1990A5E50AA52D649F2B6ACBFCA60@DB6PR0801MB1990.eurprd08.prod.outlook.com">

<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;}</style>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

        font-size: 12pt; color: rgb(0, 0, 0);">

what we would like to generate is a vector loop with implicit predication, which works by setting up the number of elements processed by the loop:<br>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

          font-size: 12pt">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

          font-size: 12pt">

hwloop 10<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

          font-size: 12pt">

  [i:4] = b[i:4] + c[i:4]</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

          font-size: 12pt">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif;

          font-size: 12pt">

</div>

</div>

</blockquote>

Why couldn't you use VP intrinsics and scalable types for this?<br>

<br>

<tt>   %bval = call &lt;vscale x 4 x double&gt; @llvm.vp.load(..., /* %evl */ 10)</tt><tt><br>

</tt><tt>   %cval = call &lt;vscale x 4 x double&gt; @llvm.vp.load(..., /* %evl */ 10)</tt><tt><br>

</tt><tt>   %sum = fadd &lt;vscale x 4 x double&gt; %bval, %cval</tt><tt><br>

</tt><tt>   store [..]</tt><br>

<br>

I see three issues with the llvm.set.loop.elements approach:<br>

1) It is conceptually broken: as others have pointed out, optimizations can move the intrinsic around, since it has no dependencies that would naturally keep it in place.<br>

2) The whole proposed set of intrinsics is vendor-specific: this causes fragmentation, and I don't see why we would want to emit vendor-specific intrinsics in a generic optimization pass. Soon, we would see reports à la "your optimization caused regressions

 for MVE - add a check that the transformation must not touch llvm.set.loop.* or llvm.active.mask intrinsics when compiling for MVE". I doubt that you would tolerate it if that intrinsic were somehow removed in performance-critical code, which would then remain

 scalar as a result. So, I do not see the "beauty of the approach".<br>

3) We need a reliable solution to properly support vector ISAs such as the RISC-V V extension, SX-Aurora, and also MVE. I don't see that reliability in this proposal.<br>

<br>

If, for whatever reason, the above does not work or seems too far away from your proposal, here is another idea to make more explicit hwloops work with the VP intrinsics, in a way that does not break under optimizations:<br>

 <br>

<tt>vector.preheader:</tt><tt><br>

</tt><tt>  %evl = call i32 @llvm.hwloop.set.elements(%n)</tt><tt><br>

</tt><br>

<tt>vector.body:</tt><tt><br>

</tt><tt>  %lastevl = phi i32 [%evl, %vector.preheader], [%next.evl, %vector.body]</tt><tt><br>

</tt><tt>  %aval = call @llvm.vp.load(Aptr, .., %lastevl)</tt><tt><br>

</tt><tt>  call @llvm.vp.store(Bptr, %aval, ..., %lastevl)</tt><tt><br>

</tt><tt>  %next.evl = call i32 @llvm.hwloop.decrement(%lastevl)</tt><br>

<br>

Note that, the way VP intrinsics are designed, it is not possible to break this code by hoisting the VP calls out of the loop: passing "%evl &gt; the operation's vector size" constitutes UB (see

<a class="moz-txt-link-freetext" href="https://llvm.org/docs/LangRef.html#vector-predication-intrinsics">

https://llvm.org/docs/LangRef.html#vector-predication-intrinsics</a>). We can use attributes to do the same for sinking (e.g. don't move VP calls across hwloop.decrement).<br>

<br>
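For illustration only, here is a minimal Python sketch of the semantics such a loop is meant to have, using the "hwloop 10" example with 4 lanes. It is a model, not LLVM IR: the names hwloop_add and vp_add are hypothetical, and the mapping of "remaining" onto llvm.hwloop.set.elements / llvm.hwloop.decrement is an assumption about how those intrinsics would behave.<br>

<pre>
```python
def vp_add(b, c, start, width, evl):
    """Model of one predicated vector add: only the first `evl` lanes are active.

    Hypothetical helper; mirrors the VP rule that an explicit vector length
    larger than the vector width is not allowed.
    """
    if evl > width:
        raise ValueError("evl must not exceed the vector width")
    return [b[start + lane] + c[start + lane] for lane in range(evl)]


def hwloop_add(b, c, n, width=4):
    """Compute a[i] = b[i] + c[i] for n elements, `width` lanes at a time."""
    a = [0.0] * n
    remaining = n  # assumption: models llvm.hwloop.set.elements(n)
    start = 0
    evls = []      # active lane count of each iteration, kept for inspection
    while remaining > 0:
        evl = min(remaining, width)  # the last iteration may be partial
        a[start:start + evl] = vp_add(b, c, start, width, evl)
        evls.append(evl)
        start += evl
        remaining -= evl  # assumption: models llvm.hwloop.decrement
    return a, evls
```
</pre>

With n = 10 and 4 lanes, the loop runs three times with 4, 4 and 2 active lanes, so no scalar epilogue is needed.<br>

<br>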

- Simon<br>

</body>

</html>