<div dir="ltr"><div dir="ltr"><div>Hi Sjoerd,<br></div><div><br></div><div></div></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><br><div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

Trying to remember how everything fits together here, but could get.active.lane.mask not create the %mask of the VP intrinsics? Or in other words, in the vectoriser, who's producing the %mask and %evl that is consumed by the VP intrinsics?</div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">

<br></div></div></blockquote><div><div dir="ltr"><div>I'm not sure what would be the best way here. I think about the Loop Vectorizer. I imagine at some point we can teach LV to emit VPred for the widening. VPred IR needs two additional operands, as you mentioned, %evl and %mask.<br></div><div><br></div><div>One option is make %evl the max-vector-length of the type being operated and %mask (that is the "outer block mask" in this context) be  get.active.lane.mask. This maps well for SVE and MVE not so much for VE and RISC-V (I don't think it is incorrect but it is not an efficient thing to do).  Perhaps VE and RISC-V can work in this scenario if at some point they replace the %evl with something like "%n - %base" operands of get.active.lane.mask, and %mask (the outer block mask) is replaced with a splat of "i1 1".<br></div><div><br></div><div>Another option here is make "%n - %base" be the %evl (or at least an operand of some target hook because "computing" the %evl is target-specific, targets without evl could compute the identity here) and %mask (the outer block mask) be a splat of "i1 1". This maps well VE and RISC-V but makes life harder for AVX-512, SVE and MVE (in general any target where TargetTransformInfo::hasActiveVectorLength returns false). Those targets could replace the %evl with the max-vector-length of the operated type and then use get.active.lane.mask(0, %evl) as the outer block mask. My understanding is that Simon used this approach in <a href="https://reviews.llvm.org/D78203">https://reviews.llvm.org/D78203</a> but in a more general setting, that would be independent of what Loop Vectorizer does.</div><div><br></div><div>Looks to me the second option makes a more effective use of vpred and D78203 shows that we can always soften vpred into a shape that is reasonable for lowering in targets without active vector length.<br></div><div><br></div><div>Thoughts?<br></div><div><br></div><div>Kind regards,</div></div></div></div>-- <br><div dir="ltr" class="gmail_signature">Roger Ferrer Ibáñez<br></div></div>