[llvm-dev] [RFC] Named shuffle intrinsics

Tue Nov 24 09:23:45 PST 2020

Seems reasonable to me, though please make sure to place them in the 
experimental namespace to start.  They can be promoted from experimental 
once we have some practical experience with them.

Philip

On 11/24/20 7:58 AM, Joe Ellis via llvm-dev wrote:
> Hi there,
>
> For fixed-length vectors, shufflevector can represent all possible shuffles. However, shufflevector uses an ArrayRef for its mask, a list with one index per result element that is fixed when the IR is built. That cannot work for scalable vectors, whose element count (vscale times a known minimum) is only known at run time. Splats are an exception: shufflevector can represent scalable vector splats, but these are inconsistent with all other shuffles because the result's element count is taken from the first source vector rather than from the mask.
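To make the mask limitation concrete, here is a small C++ model of fixed-length shufflevector semantics. It is a hypothetical sketch, not LLVM's C++ API: vectors are plain std::vector<int>, a mask entry of -1 marks an undefined lane (modeled as std::nullopt), and the name shuffleVector is made up for illustration. What it shows is that the result has exactly one lane per mask entry, so the mask must spell out every lane. For a scalable type such as <vscale x 4 x i32>, the lane count depends on vscale, so no such list can be written down ahead of time.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical model of fixed-length shufflevector (not LLVM's C++ API).
// Mask entries index into the concatenation of v1 and v2; -1 marks an
// undefined lane, modeled as std::nullopt. The result always has
// mask.size() lanes, so the mask must list every result lane explicitly.
std::vector<std::optional<int>> shuffleVector(const std::vector<int>& v1,
                                              const std::vector<int>& v2,
                                              const std::vector<int>& mask) {
  // Both shufflevector operands must have the same vector type.
  assert(v1.size() == v2.size());
  const int n = static_cast<int>(v1.size());
  std::vector<std::optional<int>> result;
  result.reserve(mask.size());
  for (int m : mask) {
    assert(m >= -1 && m < 2 * n && "mask index out of range");
    if (m == -1)
      result.push_back(std::nullopt);  // undefined lane
    else if (m < n)
      result.push_back(v1[m]);         // lane taken from the first operand
    else
      result.push_back(v2[m - n]);     // lane taken from the second operand
  }
  return result;
}
```

For example, the mask {0, 4, 1, 5} interleaves the low halves of two 4-lane vectors, giving {1, 5, 2, 6} for inputs {1, 2, 3, 4} and {5, 6, 7, 8}. No fixed list of this kind can describe the same interleave of two <vscale x 4 x i32> values.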
>
> shufflevector could be extended with more support for scalable types, but it is not clear what advantage this has over using explicitly named shuffles.
>
> We are proposing named shuffle intrinsics in the llvm.vector namespace that work for both fixed-length and scalable vectors. We do not intend to introduce new kinds of shuffles, and we expect all of these intrinsics to simplify to shufflevector when operating on fixed-length vectors.
>
> We would like to start with the following named shuffle intrinsics:
>
>     llvm.vector.extract  (see below)
>     llvm.vector.insert   (see below)
>     llvm.vector.reverse  (as used by LoopVectorize)
>     llvm.vector.splice   (as used by LoopVectorize)
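For fixed-length vectors, llvm.vector.reverse and llvm.vector.splice each reduce to a single shufflevector mask. The sketch below builds those masks in C++. The helper names (reverseMask, spliceMask, applyShuffle) are hypothetical, and the splice definition used here (concatenate the two operands and take n lanes starting at imm, or at n + imm when imm is negative) is an assumption for illustration, since the RFC does not spell it out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helpers (not LLVM APIs). Mask entries index into the
// concatenation of the two operands, as in shufflevector.

// Reverse on n lanes: result lane i takes element n - 1 - i.
std::vector<int> reverseMask(int n) {
  std::vector<int> mask;
  for (int i = n - 1; i >= 0; --i)
    mask.push_back(i);
  return mask;
}

// Splice on n lanes, ASSUMING these semantics: concatenate the operands and
// take n lanes starting at imm (imm >= 0), or starting at n + imm
// (imm < 0, i.e. the last -imm lanes of the first operand come first).
std::vector<int> spliceMask(int n, int imm) {
  assert(imm >= -n && imm < n && "immediate out of range");
  const int start = imm >= 0 ? imm : n + imm;
  std::vector<int> mask;
  for (int i = 0; i < n; ++i)
    mask.push_back(start + i);
  return mask;
}

// Apply a mask to two fixed-length operands of the same length.
std::vector<int> applyShuffle(const std::vector<int>& v1,
                              const std::vector<int>& v2,
                              const std::vector<int>& mask) {
  assert(v1.size() == v2.size());
  const int n = static_cast<int>(v1.size());
  std::vector<int> out;
  for (int m : mask) {
    assert(m >= 0 && m < 2 * n && "mask index out of range");
    out.push_back(m < n ? v1[m] : v2[m - n]);
  }
  return out;
}
```

With n = 4, reverse gives the mask {3, 2, 1, 0}, and splice with imm = 1 (or, under the assumed definition, imm = -3) gives {1, 2, 3, 4}: the last three lanes of the first operand followed by the first lane of the second. For scalable vectors neither mask can be enumerated, which is where the named intrinsics come in.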
>
> Our immediate interest is in the llvm.vector.insert and llvm.vector.extract intrinsics, for which there is a proof of concept on Phabricator[1]. These intrinsics are lowered directly to the INSERT_SUBVECTOR and EXTRACT_SUBVECTOR ISD nodes and have the same semantics. We plan to simplify fixed-length variants of these intrinsics to shufflevector within LLVM IR to preserve existing code paths and optimisations.
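The fixed-length simplification of insert and extract to shufflevector can be sketched as mask construction. This is a hypothetical C++ model, not LLVM code. It assumes our reading of the EXTRACT_SUBVECTOR/INSERT_SUBVECTOR rules: the index is a multiple of the subvector's element count, and the subvector fits inside the wider vector. Because shufflevector requires both operands to have the same type, the insert case assumes the subvector has first been widened by another shuffle, modeled here by the hypothetical widen helper, which zero-fills lanes that would be undefined in IR.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helpers (not LLVM APIs) mapping fixed-length extract/insert
// onto shufflevector masks over the concatenation of two operands.

// Extract m lanes starting at idx from an n-lane vector:
// shufflevector(vec, <undef>, <idx, ..., idx + m - 1>).
std::vector<int> extractMask(int n, int m, int idx) {
  assert(m <= n && idx % m == 0 && idx + m <= n);
  std::vector<int> mask;
  for (int i = 0; i < m; ++i)
    mask.push_back(idx + i);
  return mask;
}

// Insert an m-lane subvector at idx into an n-lane vector. The subvector is
// assumed to be widened to n lanes already, so its lane j is index n + j.
std::vector<int> insertMask(int n, int m, int idx) {
  assert(m <= n && idx % m == 0 && idx + m <= n);
  std::vector<int> mask;
  for (int i = 0; i < n; ++i)
    mask.push_back(i >= idx && i < idx + m ? n + (i - idx) : i);
  return mask;
}

// Stand-in for the widening shuffle; extra lanes are undefined in IR and
// are filled with 0 here.
std::vector<int> widen(const std::vector<int>& sub, int n) {
  assert(static_cast<int>(sub.size()) <= n);
  std::vector<int> out(sub);
  out.resize(n, 0);
  return out;
}

// Apply a mask to two fixed-length operands of the same length.
std::vector<int> applyShuffle(const std::vector<int>& v1,
                              const std::vector<int>& v2,
                              const std::vector<int>& mask) {
  assert(v1.size() == v2.size());
  const int n = static_cast<int>(v1.size());
  std::vector<int> out;
  for (int m : mask) {
    assert(m >= 0 && m < 2 * n && "mask index out of range");
    out.push_back(m < n ? v1[m] : v2[m - n]);
  }
  return out;
}
```

For an 8-lane vector and a 4-lane subvector at index 4, extract uses the mask {4, 5, 6, 7} and insert uses {0, 1, 2, 3, 8, 9, 10, 11}, where indices 8 through 11 select the lanes of the widened subvector.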
>
> We intend to use the llvm.vector.insert and llvm.vector.extract intrinsics to avoid going through memory when generating IR for a C/C++ bitcast from scalable vectors to fixed-length vectors, and vice versa. As an example, see clang/test/CodeGen/attr-arm-sve-vector-bits-cast.c, which shows that these bitcasts are currently done through memory.
>
> Joe
>
> [1]: https://reviews.llvm.org/D91362
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev