[llvm-dev] [RFC] VP intrinsics support for the Loop Vectorizer
Vineet Kumar via llvm-dev
llvm-dev at lists.llvm.org
Thu Apr 1 15:50:19 PDT 2021
As the work on Vector Predication intrinsics
(<https://reviews.llvm.org/project/profile/87/>) continues to progress
with significant parts already in upstream, this RFC proposes using them
as a target for the Loop Vectorizer.
We have put up a Proof-of-Concept patch on Phabricator:
https://reviews.llvm.org/D99750 ([LV, VP] RFC: VP intrinsics support for
the Loop Vectorizer (Proof-of-Concept))
*Please see the patch summary for more technical details, alternative
strategies, limitations, and tentative development roadmap.*
This patch contains a prototype implementation that demonstrates the Loop
Vectorizer generating VP intrinsics for simple integer operations. SIMD
ISAs with active vector length predication support, such as the RISC-V
V-extension, NEC SX-Aurora and Power VSX, can especially benefit from
this, since there is currently no other reasonable way in LLVM IR to
model the active vector length of vector instructions.
ISAs such as AVX512 and ARM SVE with masked vector predication support
could benefit by being able to use predicated operations other than just
the memory operations (via masked load/store/gather/scatter intrinsics).
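To make the two kinds of predication concrete, here is a minimal scalar
sketch of how a VP-style integer add treats its lanes. It is only a model,
not code from the patch: `kVF`, `vp_add_model` and the `passthru`
parameter are hypothetical names introduced for illustration. The one
fact taken from LLVM is that VP intrinsics such as `llvm.vp.add` take both
a mask operand and an explicit vector length operand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed vectorization factor, chosen only for this sketch.
constexpr std::size_t kVF = 4;

using VecU32 = std::array<std::uint32_t, kVF>;
using Mask = std::array<bool, kVF>;

// Scalar model of a VP-style add: lane i is computed only when mask[i] is
// set AND i < evl. Disabled lanes keep the value from `passthru` in this
// model; in LLVM IR the disabled lanes of a VP result are not defined, so
// the passthru is an assumption made to keep the sketch deterministic.
VecU32 vp_add_model(const VecU32 &a, const VecU32 &b, const Mask &mask,
                    std::size_t evl, const VecU32 &passthru) {
  assert(evl <= kVF && "EVL must not exceed the vector width");
  VecU32 out = passthru;
  for (std::size_t i = 0; i < kVF; ++i) {
    if (mask[i] && i < evl)
      out[i] = a[i] + b[i]; // unsigned add wraps, like an IR `add`
  }
  return out;
}
```

A target with only masked predication can be modeled by passing
`evl == kVF` and encoding everything in the mask; a target with only
active vector length can pass an all-true mask and vary `evl`.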
The approach in this patch builds on top of the existing tail-folding
mechanism, but instead of generating masked memory intrinsics, it
generates VP intrinsics for both memory and arithmetic operations. The
patch also extends VPlan to add new recipes for `PREDICATED-WIDENING` to
VP intrinsics; this will eventually help to build and compare VPlans for
The patch also demonstrates different ways to compute the vector length
parameter (EVL) for the VP intrinsics. The basic idea is to compute
`min(VF, trip_count - index)` for each vector iteration. For targets with
no vector length predication, `VF` can be used as EVL, and for targets
with custom instructions, an experimental intrinsic is proposed.
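The EVL computation described above can be sketched in scalar C++ as
follows. This is an illustration of the `min(VF, trip_count - index)`
formula only, not the code the patch generates; `compute_evl` and
`add_with_evl` are hypothetical helper names, and the inner loop stands in
for a single VP operation that processes `evl` lanes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// EVL for the vector iteration starting at `index`:
// min(VF, trip_count - index).
std::size_t compute_evl(std::size_t vf, std::size_t trip_count,
                        std::size_t index) {
  assert(index <= trip_count && "index must not run past the trip count");
  return std::min(vf, trip_count - index);
}

// Scalar model of a tail-folded loop computing c[i] = a[i] + b[i].
// Each outer iteration corresponds to one vector iteration; there is no
// scalar remainder loop because the last iteration simply uses a smaller
// EVL.
std::vector<std::uint32_t> add_with_evl(const std::vector<std::uint32_t> &a,
                                        const std::vector<std::uint32_t> &b,
                                        std::size_t vf) {
  assert(a.size() == b.size() && vf > 0);
  const std::size_t trip_count = a.size();
  std::vector<std::uint32_t> c(trip_count);
  for (std::size_t index = 0; index < trip_count; index += vf) {
    const std::size_t evl = compute_evl(vf, trip_count, index);
    // Stand-in for one VP add over lanes [0, evl).
    for (std::size_t lane = 0; lane < evl; ++lane)
      c[index + lane] = a[index + lane] + b[index + lane];
  }
  return c;
}
```

For example, with a trip count of 10 and `VF` of 4, the three vector
iterations use EVL values of 4, 4 and 2.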
The patch is only meant to be a proof-of-concept and intentionally
limits itself to very simple cases: integer operations only, no control
flow, no interleaving, and other restrictions. It also uses a command
line switch to force VP intrinsic support and needs tail-folding to be
explicitly enabled.
Vineet Kumar - vineet.kumar at bsc.es
Barcelona Supercomputing Center - Centro Nacional de Supercomputación