<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>

</head>

<body dir="ltr">

<blockquote class="quotableTextTraining" style="border-color: rgb(200, 200, 200); border-left: 3px solid rgb(200, 200, 200); padding-left: 1ex; margin-left: 0.8ex; color: rgb(102, 102, 102);">

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

For RISC-V V and VE, being explicit about %evl is important for performance and correctness, and that is what VP does. The get.active.lane.mask intrinsic is used as a hint for the MVE and SVE backends to use hardware tail-predication (the backends reverse-engineer that hint by pattern matching for get.active.lane.mask in the mask parameter of "some" masked intrinsics). IMHO, it's more of a hot fix to get some tail-predication working quickly with the existing infrastructure. It is still useful by itself, e.g. the ExpandVPIntrinsic pass uses it to expand the %evl parameter in VP intrinsics for scalable vector types.</div>

</blockquote>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

So I don't think that makes it a hot fix, but agreed with the general picture here.</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<blockquote class="quotableTextTraining" style="border-color: rgb(200, 200, 200); border-left: 3px solid rgb(200, 200, 200); padding-left: 1ex; margin-left: 0.8ex; color: rgb(102, 102, 102);">

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

VE uses VP-style SDNodes in the isel layer (upstream patch on Phabricator to follow soon-ish). We simply translate both VP and regular SIMD SDNodes into these custom SDNodes as an intermediate layer. Even the VE machine instructions still have an explicit %evl

 operand. We have a machine function pass that inserts code to re-configure the VL register in-between vector instructions that have a different %evl value (we had a poster on that at the LLVM US DevMtg '19). This isel strategy has been working well for us.<br>

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

The goal is to teach LV, VPlan to emit VP intrinsics with a convenient builder class (VPBuilder in the reference patch).</div>

</blockquote>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Trying to remember how everything fits together here, but could get.active.lane.mask not create the %mask of the VP intrinsics? Or, in other words, in the vectoriser, what produces the %mask and %evl that are consumed by the VP intrinsics?</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Cheers,</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

Sjoerd.<br>

</div>

<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">

<br>

</div>

<div id="appendonsend"></div>

<hr style="display:inline-block;width:98%" tabindex="-1">

<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Simon Moll &lt;Simon.Moll@EMEA.NEC.COM&gt;<br>

<b>Sent:</b> 05 November 2020 11:07<br>

<b>To:</b> Roger Ferrer Ibáñez &lt;rofirrim@gmail.com&gt;; Sjoerd Meijer &lt;Sjoerd.Meijer@arm.com&gt;<br>

<b>Cc:</b> Renato Golin &lt;rengolin@gmail.com&gt;; Vineet Kumar &lt;vineet.kumar@bsc.es&gt;; LLVM Dev &lt;llvm-dev@lists.llvm.org&gt;; ROGER FERRER IBANEZ &lt;roger.ferrer@bsc.es&gt;; Arai, Masaki &lt;arai.masaki@jp.fujitsu.com&gt;<br>

<b>Subject:</b> Re: [llvm-dev] Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)</font>

<div> </div>

</div>

<div>

<div class="x_moz-cite-prefix">Hi all,<br>

<br>

On 11/5/20 10:32 AM, Roger Ferrer Ibáñez wrote:<br>

</div>

<blockquote type="cite">

<div dir="ltr">

<div>Hi Sjoerd,</div>

<div><br>

</div>

<div>thanks for pointing us to this intrinsic.</div>

<div><br>

</div>

<div>I see it returns a mask/predicate type. My understanding is that VPred intrinsics have both a vector length operand and a mask operand. It looks to me that a "popcount" of get.active.lane.mask would correspond to the vector length operand. The additional "control flow" mask of predicated code would then correspond to the mask operand.</div>

<div><br>

</div>

<div>My interpretation was that get.active.lane.mask allowed targets that do not have a concept of vector length (such as SVE or MVE) to represent it as a mask. For those targets, the vector length operand can be given a value that means "use the whole register" and then only the mask operand is relevant to them.</div>
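<div>A minimal Python sketch of the relationship described above, for illustration only. It assumes the LangRef definition of get.active.lane.mask (lane i is active when base + i is below the trip count) and models a VP operation as acting on lanes that pass both the control-flow mask and the %evl bound. The function names are hypothetical helpers, not LLVM APIs.</div>

```python
def get_active_lane_mask(base, n, vf):
    """Model of llvm.get.active.lane.mask: lane i is on iff base + i is below n."""
    return [n > base + i for i in range(vf)]


def evl_from_mask(mask):
    """Popcount of a tail mask; meaningful because the active lanes form a prefix."""
    return sum(mask)


def vp_effective_mask(control_mask, evl):
    """Lanes a VP operation acts on: control-flow mask AND (lane index below evl)."""
    return [m and evl > i for i, m in enumerate(control_mask)]


VF = 4
tail = get_active_lane_mask(base=8, n=10, vf=VF)   # last trip of a 10-element loop
evl = evl_from_mask(tail)                          # vector length for that trip
control = [True, False, True, True]                # hypothetical if-converted condition
print(tail, evl, vp_effective_mask(control, evl))
```

<div>On a target without a vector length register, the same effect would be obtained by passing the full width as %evl and AND-ing the tail mask into the control-flow mask.</div>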

</div>

</blockquote>

For RISC-V V and VE, being explicit about %evl is important for performance and correctness, and that is what VP does. The get.active.lane.mask intrinsic is used as a hint for the MVE and SVE backends to use hardware tail-predication (the backends reverse-engineer that hint by pattern matching for get.active.lane.mask in the mask parameter of "some" masked intrinsics). IMHO, it's more of a hot fix to get some tail-predication working quickly with the existing infrastructure. It is still useful by itself, e.g. the ExpandVPIntrinsic pass uses it to expand the %evl parameter in VP intrinsics for scalable vector types.<br>

<blockquote type="cite">

<div dir="ltr">

<div><br>

</div>

<div>But maybe my interpretation is wrong.</div>

<div><br>

</div>

<div>@Simon: what is VE going to do here?<br>

</div>

</div>

</blockquote>

<br>

VE uses VP-style SDNodes in the isel layer (upstream patch on Phabricator to follow soon-ish). We simply translate both VP and regular SIMD SDNodes into these custom SDNodes as an intermediate layer. Even the VE machine instructions still have an explicit %evl

 operand. We have a machine function pass that inserts code to re-configure the VL register in-between vector instructions that have a different %evl value (we had a poster on that at the LLVM US DevMtg '19). This isel strategy has been working well for us.<br>

<br>
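<div>To illustrate the VL re-configuration idea, here is a rough Python sketch restricted to a single straight-line block. It is not the VE pass itself: the tuple encoding, the "set_vl" marker and the opcode names are placeholders chosen for this example, and a real machine function pass would also have to reason about control flow between blocks.</div>

```python
def insert_vl_updates(instructions):
    """Insert a ("set_vl", evl) marker whenever the next vector op needs a different VL."""
    out = []
    current_vl = None  # contents of the VL register are unknown on entry
    for opcode, evl in instructions:
        if evl != current_vl:
            out.append(("set_vl", evl))  # placeholder for the real VL-setting instruction
            current_vl = evl
        out.append((opcode, evl))
    return out


# Each entry is (placeholder opcode, symbolic %evl operand).
block = [("vld", "%n"), ("vfadd", "%n"), ("vst", "%m"), ("vld", "%m"), ("vfmul", "%n")]
for ins in insert_vl_updates(block):
    print(ins)
```

<div>Consecutive instructions that share an %evl value need only one update, which is why keeping %evl explicit on every instruction does not by itself imply a VL write per instruction.</div>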

The goal is to teach LV, VPlan to emit VP intrinsics with a convenient builder class (VPBuilder in the reference patch).<br>

<br>

- Simon<br>

<br>

<blockquote type="cite">

<div dir="ltr">

<div></div>

<div><br>

</div>

<div>Kind regards,<br>

</div>

</div>

<br>

<div class="x_gmail_quote">

<div dir="ltr" class="x_gmail_attr">Message from Sjoerd Meijer via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>&gt; on Thu, 5 Nov 2020 at 10:00:<br>

</div>

<blockquote class="x_gmail_quote" style="margin:0px 0px 0px

          0.8ex; border-left:1px solid rgb(204,204,204); padding-left:1ex">

<div dir="ltr">

<blockquote style="border-color:rgb(200,200,200); border-left:3px solid

rgb(200,200,200); padding-left:1ex; margin-left:0.8ex; color:rgb(102,102,102)">

<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">

Fold the epilog loop into the vector body.

<ul>

<li>This is done by setting the vector length in each iteration. This induces a predicate/mask over all the vector instructions of the loop (any other predicates/masks in the vector body are needed for control flow).

</li></ul>
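<div>The folding described in the item above can be sketched in plain Python as follows. This is only a model: the vector length request is simplified to min(remaining, vlmax), which is an assumption made for the example (hardware may pick the length differently), and vector_add_tail_folded is a hypothetical name.</div>

```python
def vector_add_tail_folded(a, b, vlmax):
    """Add two lists in chunks; each trip asks for an active vector length (EVL)."""
    n = len(a)
    out = [0] * n
    evls = []
    i = 0
    while i != n:
        evl = min(vlmax, n - i)  # stand-in for a vsetvli-style length request
        # One "vector instruction": only the first evl lanes do any work.
        out[i:i + evl] = [x + y for x, y in zip(a[i:i + evl], b[i:i + evl])]
        evls.append(evl)
        i += evl
    return out, evls


result, evls = vector_add_tail_folded(list(range(10)), [1] * 10, vlmax=4)
print(result, evls)
```

<div>The last trip simply runs with a shorter length, so no separate scalar epilog loop is needed.</div>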

</div>

</blockquote>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">

That's what we do for Arm MVE using the get.active.lane.mask intrinsic (*), which is emitted in the vectoriser. It generates a predicate that is used by the masked loads/stores. That's the current state of the art; long term, this should indeed use the VP intrinsics. Just wanted to point you at get.active.lane.mask, because it would also be nice to get confirmation that this works not only for fixed vectors but also for scalable vectors, which I think should be the case...<br>

</div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">

<br>

</div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">

(*) <a href="https://llvm.org/docs/LangRef.html#llvm-get-active-lane-mask-intrinsics" id="x_gmail-m_-387898839179301389LPlnk533399" target="_blank">

https://llvm.org/docs/LangRef.html#llvm-get-active-lane-mask-intrinsics</a></div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">

<br>

</div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">

Cheers,</div>

<div style="font-family:Calibri,Arial,Helvetica,sans-serif; font-size:12pt; color:rgb(0,0,0)">

Sjoerd.<br>

</div>

<hr style="display:inline-block; width:98%">

<div id="x_gmail-m_-387898839179301389divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> llvm-dev &lt;<a href="mailto:llvm-dev-bounces@lists.llvm.org" target="_blank">llvm-dev-bounces@lists.llvm.org</a>&gt; on behalf of Vineet Kumar via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;<br>

<b>Sent:</b> 05 November 2020 01:36<br>

<b>To:</b> Renato Golin &lt;<a href="mailto:rengolin@gmail.com" target="_blank">rengolin@gmail.com</a>&gt;<br>

<b>Cc:</b> LLVM Dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt;; ROGER FERRER IBANEZ &lt;<a href="mailto:roger.ferrer@bsc.es" target="_blank">roger.ferrer@bsc.es</a>&gt;; Arai, Masaki &lt;<a href="mailto:arai.masaki@jp.fujitsu.com" target="_blank">arai.masaki@jp.fujitsu.com</a>&gt;<br>

<b>Subject:</b> Re: [llvm-dev] Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)</font>

<div> </div>

</div>

<div>

<p>Hi Renato,</p>

<p>Thanks a lot for your comments!</p>

<p>(more inline.)</p>

<p><br>

</p>

<p>Thanks and Regards,</p>

<p>Vineet</p>

<p><br>

</p>

<div>On 2020-11-02 5:43 p.m., Renato Golin wrote:<br>

</div>

<blockquote type="cite">

<div dir="ltr">Hi Vineet,

<div><br>

</div>

<div>Thanks for sharing! I haven't looked at the code yet, just read the README file you have and it has already answered a lot of questions that I initially had. Some general comments...</div>

<div><br>

</div>

<div>I'm very happy to see that Simon's predication changes were useful to your work. It's a nice validation of their work and hopefully will help SVE, too.</div>

</div>

</blockquote>

Simon's vector predication ideas fit really nicely with our approach to predicated vectorization, especially the support for the EVL parameter. We look forward to more discussions around it.<br>

<blockquote type="cite">

<div dir="ltr">

<div><br>

</div>

<div>Your main approach to strip-mine + fuse tail loop is what I was going to propose for now. It matches well with the bite-sized approach VPlan has and could build on existing vector formats. For example, you always try to strip-mine (for scalable and non-scalable) and then, only for scalable, you try to fuse the scalar loops, which would improve the solution and give RVV/SVE an edge over the other extensions on the same hardware.</div>

</div>

</blockquote>

While our implemented approach with tail folding and predication is guided by the research interests of the EPI project, I agree that for a more general implementation your proposed approach for now makes more sense before moving on to better predication support

 and exploring other approaches.<br>

<blockquote type="cite">

<div dir="ltr">

<div><br>

</div>

<div>There were also proposals in the past to vectorise the tail loop, which could be a similar step. For example, in case the main vector body is 8-way or 16-way, the tail loop would be 7-way or 15-way, which is horribly inefficient. The idea was to further vectorise the 7-way as 4+2+1 ways, same for 15. If those loops are then unrolled, you end up with a nice scaling-down pattern. On scalable vectors, this becomes a no-op.</div>
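<div>As a small illustration of the 4+2+1 idea, the following Python sketch splits a tail remainder into decreasing power-of-two widths. The helper name tail_widths is hypothetical, and the sketch only computes the widths; it does not model how the vectoriser would emit the corresponding loops.</div>

```python
def tail_widths(remainder):
    """Split a remainder into decreasing power-of-two vector widths."""
    widths = []
    width = 1
    # Find the largest power of two that still fits in the remainder.
    while remainder >= width * 2:
        width *= 2
    # Peel off one chunk per set bit, widest first.
    while remainder:
        if remainder >= width:
            widths.append(width)
            remainder -= width
        width //= 2
    return widths


print(tail_widths(7), tail_widths(15))
```

<div>A remainder of 7 is covered by three vector steps instead of seven scalar iterations; with a per-iteration vector length the same remainder is handled in one predicated trip.</div>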

<div><br>

</div>

<div>There is a separate thread on the vectorisation cost model [1] which talks about some of the challenges there. I think we need to take scalable vectors into consideration when thinking about it.</div>

</div>

</blockquote>

Agreed. It would be very useful to think about a scalable-vector-aware cost model right from the beginning, now that there is an effort already underway to integrate it into VPlan. There was also a discussion around it in the latest SVE/SVE2 sync-up meeting and I think almost everyone was in agreement.<br>

<blockquote type="cite">

<div dir="ltr">

<div><br>

</div>

<div>The NEON vs RISCV register shadowing is interesting. It is true we mostly ignored 64-bit vectors in the vectoriser, but LLVM can still generate them with the (SLP) region vectoriser. IIRC, support for that kind of aliasing is not trivial (and why GCC's

 description of NEON registers sucked for so long), but the motivation of register pressure inside hot loops is indeed important. I'm adding Arai Masaki in CC as this is something he was working on.</div>

</div>

</blockquote>

<p>Thanks for adding Arai! I will be happy to pick their brain on the topic.<br>

</p>

<p>One specific place where we have to deal with it is when computing a feasible max VF. I am currently experimenting with an approach where the user specifies (via a command line flag) a vector register width multiplier, a factor by which the operating vector register width is a multiple of the minimum vector register width. Based on that, we estimate the highest VF that won't spill registers (this relies on TTI for information about the number of registers in relation to register width). This is definitely not a generic solution and probably not elegant either, but personally it serves as a starting point to think about the broader issue.<br>

</p>

<blockquote type="cite">

<div dir="ltr">

<div><br>

</div>

<div>Otherwise, I think working with the current folks on VPlan and scalable extensions will be a good way to upstream all the ideas you guys had in your work.</div>

</div>

</blockquote>

That's the plan!<br>

<blockquote type="cite">

<div dir="ltr">

<div><br>

</div>

<div>Thanks!</div>

<div>--renato</div>

<div><br>

</div>

<div>[1] <a href="http://lists.llvm.org/pipermail/llvm-dev/2020-October/146236.html" target="_blank">http://lists.llvm.org/pipermail/llvm-dev/2020-October/146236.html</a></div>

<div><br>

</div>

<div><br>

</div>

</div>

<br>

<div>

<div dir="ltr">On Mon, 2 Nov 2020 at 15:52, Vineet Kumar via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>&gt; wrote:<br>

</div>

<blockquote style="margin:0px 0px 0px

                    0.8ex; border-left:1px solid

                    rgb(204,204,204); padding-left:1ex">

<div>

<p>Hi all, </p>

<div>

<p>At the Barcelona Supercomputing Center, we have been working on an end-to-end vectorizer using scalable vectors for the RISC-V Vector extension in the context of the

<a href="https://www.european-processor-initiative.eu/accelerator/" target="_blank">

EPI Project</a>. We earlier shared a demo of our prototype implementation (<a href="https://repo.hca.bsc.es/epic/z/9eYRIF" target="_blank">https://repo.hca.bsc.es/epic/z/9eYRIF</a>, see below) with the folks involved with LLVM SVE/SVE2 development. Since there

 was interest in looking at the source code during the discussions in the subsequent LLVM SVE/SVE2 sync-up meetings, we are also publishing a public copy of our repository.

<br>

</p>

<p>It is available at <a href="https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi" target="_blank">

https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi</a> and will sync with our ongoing development on a weekly basis. Note that this is very much a work in progress and the code in this repository is for reference purposes only. Please see the

<a href="https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/-/blob/EPI/README.md" target="_blank">

README</a> file in the repo for details on our approach, design decisions, and limitations.</p>

<p>We welcome any questions and feedback. <br>

</p>

</div>

<div><br>

<pre>Thanks and Regards,

Vineet Kumar - <a href="mailto:vineet.kumar@bsc.es" target="_blank">vineet.kumar@bsc.es</a>

Barcelona Supercomputing Center - Centro Nacional de Supercomputación

</pre>

<div>On 2020-07-29 3:10 a.m., Vineet Kumar wrote:<br>

</div>

<blockquote type="cite">

<pre>Hi all,

Following up on the discussion in the last meeting about auto-

vectorization for RISC-V Vector extension (scalable vectors) at the

Barcelona Supercomputing Center, here are some additional details. 

We have a working prototype for end-to-end compilation targeting the

RISC-V Vector extension. The auto-vectorizer supports two strategies to

generate LLVM IR using scalable vectors:

1) Generate a vector loop using VF (vscale x k) = whole vector register

width, followed by a scalar tail loop.

2) Generate only a vector loop with active vector length controlled by

the RISC-V `vsetvli` instruction and using Vector Predicated intrinsics

(<a href="https://reviews.llvm.org/D57504" target="_blank">https://reviews.llvm.org/D57504</a>). (Of course, intrinsics come with

their own limitations but we feel it serves as a good proof of concept

for our use case.) We also extend the VPlan to generate VPInstructions

that are expanded using predicated intrinsics.

We also considered a third hybrid approach of having a vector loop with

VF = whole register width, followed by a vector tail loop using

predicated intrinsics. For now though, based on project requirements,

we favoured the second approach.

We have also taken care to not break any fixed-vector implementation.

All the scalable vector IR gen is guarded by conditions set by TTI. 

For shuffles, the most used case is broadcast which is supported by the

current semantics of `shufflevector` instruction. For other cases like

reverse, concat, etc., we have defined our own intrinsics.

Current limitations:

The cost model for scalable vectors doesn't do much other than always

deciding to vectorize with VF based on TargetWidestType/SmallestType.

We also do not support interleaving yet.

Demo:

The current implementation is very much in alpha and eventually, once

it's more polished and thoroughly verified, we will put out patches on

Phabricator. Till then, we have set up a Compiler Explorer server

against our development branch to showcase the generated code.

You can see and experiment with the generated LLVM IR and VPlan for a

set of examples, with a predicated vector loop

(`-mprefer-predicate-over-epilog`) at <a href="https://repo.hca.bsc.es/epic/z/JB4ZoJ" target="_blank">https://repo.hca.bsc.es/epic/z/JB4ZoJ</a>

and with a scalar epilog (`-mno-prefer-predicate-over-epilog`) at 

<a href="https://repo.hca.bsc.es/epic/z/0WoDGt" target="_blank">https://repo.hca.bsc.es/epic/z/0WoDGt</a>. 

Note that you can remove the `-emit-llvm` option to see the generated

RISC-V assembly. 

We welcome any questions and feedback.

Thanks and Regards,

Vineet Kumar - <a href="mailto:vineet.kumar@bsc.es" target="_blank">vineet.kumar@bsc.es</a>

Barcelona Supercomputing Center - Centro Nacional de Supercomputación

</pre>

</blockquote>

</div>

<br>

<br>

WARNING / LEGAL TEXT: This message is intended only for the use of the individual or entity to which it is addressed and may contain information which is privileged, confidential, proprietary, or exempt from disclosure under applicable law. If you are not the

 intended recipient or the person responsible for delivering the message to the intended recipient, you are strictly prohibited from disclosing, distributing, copying, or in any way using this message. If you have received this communication in error, please

 notify the sender and destroy and delete any copies you may have received. <br>

<br>

<a href="http://www.bsc.es/disclaimer" target="_blank">http://www.bsc.es/disclaimer</a>

<br>

</div>

_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>

<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>

</blockquote>

</div>

</blockquote>

<br>

<br>


</div>

</div>


</blockquote>

</div>

<br clear="all">

<br>

-- <br>

<div dir="ltr" class="x_gmail_signature">Roger Ferrer Ibáñez</div>

</blockquote>

<br>

</div>

</body>

</html>