[llvm-dev] Loop-vectorizer prototype for the EPI Project based on the RISC-V Vector Extension (Scalable vectors)
Vineet Kumar via llvm-dev
llvm-dev at lists.llvm.org
Mon Nov 2 07:52:36 PST 2020
At the Barcelona Supercomputing Center, we have been working on an
end-to-end vectorizer using scalable vectors for RISC-V Vector extension
in context of the EPI Project
<https://www.european-processor-initiative.eu/accelerator/>. We earlier
shared a demo of our prototype implementation
(https://repo.hca.bsc.es/epic/z/9eYRIF, see below) with the folks
involved with LLVM SVE/SVE2 development. Since there was an interest in
looking at the source code during the discussions in the subsequent LLVM
SVE/SVE2 sync-up meetings, we are also publishing a public copy of our
It is available at https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi and
will sync with our ongoing development on a weekly basis. Note that this
is very much a work in progress and the code in this repository is only
for reference purpose. Please see the README
file in the repo for details on our approach, design decisions, and
We welcome any questions and feedback.
Thanks and Regards,
Vineet Kumar -vineet.kumar at bsc.es
Barcelona Supercomputing Center - Centro Nacional de Supercomputación
On 2020-07-29 3:10 a.m., Vineet Kumar wrote:
> Hi all,
> Following up on the discussion in the last meeting about auto-
> vectorization for RISC-V Vector extension (scalable vectors) at the
> Barcelona Supercomputing Center, here are some additional details.
> We have a working prototype for end-to-end compilation targeting the
> RISC-V Vector extension. The auto-vectorizer supports two strategies to
> generate LLVM IR using scalable vectors:
> 1) Generate a vector loop using VF (vscale x k) = whole vector register
> width, followed by a scalar tail loop.
> 2) Generate only a vector loop with active vector length controlled by
> the RISC-V `vsetvli` instruction and using Vector Predicated intrinsics
> (https://reviews.llvm.org/D57504). (Of course, intrinsics come with
> their own limitations but we feel it serves as a good proof of concept
> for our use case.) We also extend the VPlan to generate VPInstructions
> that are expanded using predicated intrinsics.
> We also considered a third hybrid approach of having a vector loop with
> VF = whole register width, followed by a vector tail loop using
> predicated intrinsics. For now though, based on project requirements,
> we favoured the second approach.
> We have also taken care to not break any fixed-vector implementation.
> All the scalable vector IR gen is guarded by conditions set by TTI.
> For shuffles, the most used case is broadcast which is supported by the
> current semantics of `shufflevector` instruction. For other cases like
> reverse, concat, etc., we have defined our own intrinsics.
> Current limitaitons:
> The cost model for scalable vectors doesn't do much other than always
> decideing to vectorize with VF based on TargetWidestType/SmallestType.
> We also do not support interleaving yet.
> The current implementation is very much in alpha and eventually, once
> it's more polished and thoroughly verified, we will put out patches on
> Phabricator. Till then, we have set up a Compiler Explorer server
> against our development branch to showcase the generated code.
> You can see and experiment with the generated LLVM IR and VPlan for a
> set of examples, with predicated vector loop (`-mprefer-predicate-over-
> epilog`) athttps://repo.hca.bsc.es/epic/z/JB4ZoJ
> and with a scalar epilog (`-mno-prefer-predicate-over-epilog`) at
> Note that you can remove the `-emit-llvm` option to see the generated
> RISC-V assembly.
> We welcome any questions and feedback.
> Thanks and Regards,
> Vineet Kumar -vineet.kumar at bsc.es
> Barcelona Supercomputing Center - Centro Nacional de Supercomputación
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the llvm-dev