[llvm-dev] [RFC] VP intrinsics support for the Loop Vectorizer

Thu Apr 1 15:50:19 PDT 2021

Hi All,

As the work on Vector Predication intrinsics 
(https://reviews.llvm.org/D57504, 
https://reviews.llvm.org/project/profile/87/) continues to progress, 
with significant parts already upstream, this RFC proposes using them 
as a target for the Loop Vectorizer.

We have put up a Proof-of-Concept patch on Phabricator:

https://reviews.llvm.org/D99750 ([LV, VP] RFC: VP intrinsics support 
for the Loop Vectorizer (Proof-of-Concept))

*Please see the patch summary for more technical details, alternative 
strategies, limitations, and a tentative development roadmap.*

This patch contains a prototype implementation that demonstrates the 
Loop Vectorizer generating VP intrinsics for simple integer operations 
on fixed-width vectors.

SIMD ISAs such as the RISC-V V extension, NEC SX-Aurora and Power VSX, 
which support active vector length predication, can especially benefit 
from this, since there is currently no other reasonable way in LLVM IR 
to model the active vector length of vector instructions.

ISAs such as AVX512 and ARM SVE, which support masked vector 
predication, could also benefit by being able to use predicated 
operations beyond the memory operations already available (via masked 
load/store/gather/scatter intrinsics).

The approach in this patch builds on top of the existing tail-folding 
mechanism, but instead of generating masked memory intrinsics, it 
generates VP intrinsics for both memory and arithmetic operations. The 
patch also extends VPlan with new recipes for `PREDICATED-WIDENING` to 
VP intrinsics. This will eventually help to build and compare VPlans for 
different strategies.
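
To make the predication semantics concrete, here is a minimal Python 
model of a VP-style add. It is only a sketch and not code from the 
patch: the helper name `vp_add` is hypothetical, it assumes a lane is 
active only when its mask bit is set and its index is below EVL, and it 
uses `None` as a stand-in for whatever value a disabled lane holds.

```python
def vp_add(a, b, mask, evl):
    """Scalar model of a predicated vector add (hypothetical helper).

    A lane is computed only if mask[i] is true AND i < evl.
    Disabled lanes are represented as None in this model.
    """
    if not (len(a) == len(b) == len(mask)):
        raise ValueError("operands and mask must have the same lane count")
    if not (0 <= evl <= len(a)):
        raise ValueError("evl must be between 0 and the lane count")
    return [
        x + y if (m and i < evl) else None
        for i, (x, y, m) in enumerate(zip(a, b, mask))
    ]


# Four lanes, all-true mask, EVL = 3: only the first three lanes are active.
tail = vp_add([1, 2, 3, 4], [10, 20, 30, 40], [True] * 4, 3)

# Four lanes, full EVL, mask disables lane 1.
masked = vp_add([1, 2, 3, 4], [10, 20, 30, 40], [True, False, True, True], 4)
```

With an all-true mask, the EVL operand alone is enough to express the 
folded tail, which is what makes it a natural fit for targets with 
active vector length predication.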

The patch also demonstrates different ways to compute the vector length 
parameter (EVL) for the VP intrinsics. The basic idea is to compute 
`min(VF, trip_count - index)` for each vector iteration. For targets 
without vector length predication, `VF` can be used as EVL, and for 
targets with custom instructions, an experimental intrinsic is proposed.
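
As an illustration of the `min(VF, trip_count - index)` computation, the 
following Python sketch models a tail-folded vector loop in scalar code. 
The function names `evl_schedule` and `vector_add` are hypothetical and 
are not taken from the patch; the sketch also assumes the induction 
variable advances by `VF` each vector iteration.

```python
def evl_schedule(trip_count, vf):
    """Return the EVL used by each vector iteration (hypothetical helper)."""
    evls = []
    index = 0
    while index < trip_count:
        # Full VF lanes until the final iteration, then only the remainder.
        evls.append(min(vf, trip_count - index))
        index += vf
    return evls


def vector_add(a, b, vf):
    """Model of a tail-folded loop computing out[i] = a[i] + b[i]."""
    trip_count = len(a)
    out = [0] * trip_count
    index = 0
    while index < trip_count:
        evl = min(vf, trip_count - index)
        # Only the first `evl` lanes of this vector iteration are active,
        # so no scalar remainder loop is needed.
        for lane in range(evl):
            out[index + lane] = a[index + lane] + b[index + lane]
        index += vf
    return out


schedule = evl_schedule(10, 4)
result = vector_add(list(range(10)), [1] * 10, 4)
```

For a trip count of 10 and `VF = 4`, the schedule is two full iterations 
followed by one iteration with an EVL of 2.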

The patch is only meant to be a proof of concept and intentionally 
limits itself to very simple cases: integer operations only, no control 
flow, no interleaving, and other restrictions. It also uses a 
command-line switch to force VP intrinsic support and needs tail-folding 
to be explicitly enabled.

Best,

Vineet Kumar - vineet.kumar at bsc.es
Barcelona Supercomputing Center - Centro Nacional de Supercomputación

http://bsc.es/disclaimer