<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-text-html" lang="x-unicode">
<p>Hi All,</p>
<p>As the work on Vector Predication intrinsics (<a
href="https://reviews.llvm.org/D57504" moz-do-not-send="true">https://reviews.llvm.org/D57504</a>,
<a moz-do-not-send="true"
href="https://reviews.llvm.org/project/profile/87/">https://reviews.llvm.org/project/profile/87/</a>)
continues to progress, with significant parts already
upstream, this RFC proposes using them as a target for the Loop
Vectorizer.</p>
<p>We have put up a Proof-of-Concept patch on Phabricator:<br>
</p>
<p><a href="https://reviews.llvm.org/D99750"
moz-do-not-send="true">https://reviews.llvm.org/D99750</a> (<i>[LV,
VP] RFC: VP intrinsics support for the Loop Vectorizer
(Proof-of-Concept)</i>)</p>
<p>*Please see the patch summary for more technical details,
alternative strategies, limitations, and tentative development
roadmap.*<br>
</p>
<p>This patch contains a prototype implementation that
demonstrates the Loop Vectorizer generating VP intrinsics for
simple integer operations on fixed-width vectors.</p>
<p>SIMD ISAs with active vector length predication support, such as
the RISC-V V extension, NEC SX-Aurora and Power VSX, can especially
benefit from this, since LLVM IR currently has no other reasonable
way to model the active vector length in vector
instructions.</p>
<p>ISAs with masked vector predication support, such as AVX512 and
ARM SVE, could benefit by being able to use predicated operations
beyond the memory operations already available via the masked
load/store/gather/scatter intrinsics.</p>
<p>The approach in this patch builds on top of the existing
tail-folding mechanism, but instead of generating masked memory
intrinsics, it generates VP intrinsics for both memory and
arithmetic operations. The patch also extends VPlan with new
recipes for `<font face="monospace">PREDICATED-WIDENING</font>`
to VP intrinsics. This will eventually help to build and compare
VPlans for different strategies.</p>
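<p>As a rough illustration of tail folding (a plain-Python model, not
code from the patch), the sketch below assumes a simple loop
`<font face="monospace">c[i] = a[i] + b[i]</font>`. Every vector
iteration covers `<font face="monospace">vf</font>` lanes, and lanes
that fall past the trip count are predicated off, so no scalar
remainder loop is needed. The function name
`<font face="monospace">tail_folded_add</font>` and the list-based
"vectors" are hypothetical and exist only for this example.</p>
<pre>
```python
def tail_folded_add(a, b, vf):
    """Model c[i] = a[i] + b[i] as a tail-folded vector loop of width vf.

    Hypothetical illustration only: Python lists stand in for vectors.
    """
    n = len(a)
    c = [None] * n
    for index in range(0, n, vf):
        # A lane is active only while index + lane is inside the trip count.
        mask = [n > index + lane for lane in range(vf)]
        for lane in range(vf):
            if mask[lane]:
                # Predicated load, add and store; inactive lanes are
                # never read or written.
                c[index + lane] = a[index + lane] + b[index + lane]
    return c
```
</pre>
<p>In this model the same predicate guards the arithmetic as well as
the memory accesses, which is the property the VP intrinsics make
expressible in IR.</p>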
<p>The patch also demonstrates different ways to compute the
vector length parameter (EVL) for the VP intrinsics. The basic idea
is to compute `<font face="monospace">min(VF, trip_count -
index)</font>` for each vector iteration. For targets with no
vector length predication, `<font face="monospace">VF</font>`
can be used as the EVL, and for targets with custom instructions, an
experimental intrinsic is proposed.</p>
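<p>The basic EVL computation can be sketched in plain Python as
follows. This is a minimal model, assuming the vector induction
variable `<font face="monospace">index</font>` advances by
`<font face="monospace">VF</font>` each iteration; the helper name
`<font face="monospace">evl_sequence</font>` is hypothetical and is
not part of the patch.</p>
<pre>
```python
def evl_sequence(trip_count, vf):
    """Return the EVL for each vector iteration: min(vf, trip_count - index).

    Hypothetical helper; assumes index advances by vf per iteration.
    """
    evls = []
    for index in range(0, trip_count, vf):
        # Full vectors use vf lanes; the final iteration may use fewer.
        evls.append(min(vf, trip_count - index))
    return evls


if __name__ == "__main__":
    print(evl_sequence(10, 4))  # prints [4, 4, 2]
```
</pre>
<p>With a trip count of 10 and a VF of 4, only the last iteration runs
with a shortened vector length, and the EVLs sum to the trip count.</p>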
<p>The patch is only meant to be a proof-of-concept and
intentionally limits itself to very simple cases:
only integer operations, no control flow, no interleaving,
and other restrictions. It also uses a command-line switch to
force VP intrinsic support and needs tail-folding to be explicitly
enabled.</p>
<p>Best, </p>
<pre class="moz-quote-pre" wrap="">Vineet Kumar - <a class="moz-txt-link-abbreviated" href="mailto:vineet.kumar@bsc.es">vineet.kumar@bsc.es</a>
Barcelona Supercomputing Center - Centro Nacional de Supercomputación</pre>
</div>
</body>
</html>