[llvm] [Docs][RISCV] Document RISC-V vector codegen (PR #96740)
Fraser Cormack via llvm-commits
llvm-commits at lists.llvm.org
Wed Jun 26 09:25:48 PDT 2024
================
@@ -0,0 +1,289 @@
+=========================
+ RISC-V Vector Extension
+=========================
+
+.. contents::
+ :local:
+
+The RISC-V target readily supports the 1.0 version of the `RISC-V Vector Extension (RVV) <https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc>`_, but requires some tricks to handle its unique design.
+This guide gives an overview of how RVV is modelled in LLVM IR and how the backend generates code for it.
+
+Mapping to LLVM IR types
+========================
+
+RVV adds 32 ``VLEN`` sized registers, where ``VLEN`` is an unknown constant to the compiler. To be able to represent ``VLEN`` sized values, the RISC-V backend takes the same approach as AArch64's SVE and uses `scalable vector types <https://llvm.org/docs/LangRef.html#t-vector>`_.
+
+Scalable vector types are of the form ``<vscale x n x ty>``, which indicate a vector with a multiple of ``n`` elements of type ``ty``. ``n`` and ``ty`` then end up controlling LMUL and SEW respectively.
+
+LLVM supports only ``ELEN=32`` or ``ELEN=64``, so ``vscale`` is defined as ``VLEN/64`` (see ``RISCV::RVVBitsPerBlock``).
+This makes the LLVM IR types stable between the two ``ELEN`` s considered, i.e., every LLVM IR scalable vector type has exactly one corresponding pair of element type and LMUL, and vice-versa.
+
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+| | LMUL=⅛ | LMUL=¼ | LMUL=½ | LMUL=1 | LMUL=2 | LMUL=4 | LMUL=8 |
++===================+===============+================+==================+===================+===================+===================+===================+
+| i64 (ELEN=64) | N/A | N/A | N/A | <v x 1 x i64> | <v x 2 x i64> | <v x 4 x i64> | <v x 8 x i64> |
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+| i32 | N/A | N/A | <v x 1 x i32> | <v x 2 x i32> | <v x 4 x i32> | <v x 8 x i32> | <v x 16 x i32> |
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+| i16 | N/A | <v x 1 x i16> | <v x 2 x i16> | <v x 4 x i16> | <v x 8 x i16> | <v x 16 x i16> | <v x 32 x i16> |
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+| i8 | <v x 1 x i8> | <v x 2 x i8> | <v x 4 x i8> | <v x 8 x i8> | <v x 16 x i8> | <v x 32 x i8> | <v x 64 x i8> |
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+| double (ELEN=64) | N/A | N/A | N/A | <v x 1 x double> | <v x 2 x double> | <v x 4 x double> | <v x 8 x double> |
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+| float | N/A | N/A | <v x 1 x float> | <v x 2 x float> | <v x 4 x float> | <v x 8 x float> | <v x 16 x float> |
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+| half | N/A | <v x 1 x half> | <v x 2 x half> | <v x 4 x half> | <v x 8 x half> | <v x 16 x half> | <v x 32 x half> |
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+
+(Read ``<v x k x ty>`` as ``<vscale x k x ty>``)
+
+
+Mask vector types
+-----------------
+
+As for mask vectors, they are physically represented using a layout of densely packed bits in a vector register.
+They are mapped to the following LLVM IR types:
+
+- <vscale x 1 x i1>
+- <vscale x 2 x i1>
+- <vscale x 4 x i1>
+- <vscale x 8 x i1>
+- <vscale x 16 x i1>
+- <vscale x 32 x i1>
+- <vscale x 64 x i1>
+
+Two types with the same ratio SEW/LMUL will have the same related mask type. For instance, two different comparisons one under SEW=64, LMUL=2 and the other under SEW=32, LMUL=1 will both generate a mask <vscale x 2 x i1>.
+
+Representation in LLVM IR
+=========================
+
+Vector instructions can be represented in three main ways in LLVM IR:
+
+1. Regular instructions on both fixed and scalable vector types
+
+ .. code-block:: llvm
+
+ %c = add <vscale x 4 x i32> %a, %b
+
+2. RISC-V vector intrinsics, which mirror the `C intrinsics specification <https://github.com/riscv-non-isa/rvv-intrinsic-doc>`_
+
+ These come in unmasked variants:
+
+ .. code-block:: llvm
+
+ %c = call @llvm.riscv.vadd.nxv4i32.nxv4i32(
+ <vscale x 4 x i32> %passthru,
+ <vscale x 4 x i32> %a,
+ <vscale x 4 x i32> %b,
+ i64 %avl
+ )
+
+ As well as masked variants:
+
+ .. code-block:: llvm
+
+ %c = call @llvm.riscv.vadd.mask.nxv4i32.nxv4i32(
+ <vscale x 4 x i32> %passthru,
+ <vscale x 4 x i32> %a,
+ <vscale x 4 x i32> %b,
+ i64 %avl
+ )
+
+ Both allow setting the AVL as well as controlling the inactive/tail elements via the passthru operand, but the masked variant also provides operands for the mask and ``vta``/``vma`` policy bits.
+
+ The only valid types are scalable vector types.
+
+3. :doc:`Vector predication (VP) intrinsics </Proposals/VectorPredication>`
+
+ .. code-block:: llvm
+
+ %c = call @llvm.vp.add.nxv4i32(
+ <vscale x 4 x i32> %a,
+ <vscale x 4 x i32> %b,
+ <vscale x 4 x i1> %m
+ i32 %evl
+ )
+
+ Unlike RISC-V intrinsics, VP intrinsics are target agnostic so they can be emitted from other optimisation passes in the middle-end (like the loop vectorizer). They also support fixed length vector types.
+
+SelectionDAG lowering
+=====================
+
+For most regular **scalable** vector LLVM IR instructions, their corresponding SelectionDAG nodes are legal on RISC-V and don't require any custom lowering.
+
+.. code-block::
+
+ t5: nxv4i32 = add t2, t4
+
+This is because the TableGen patterns for RVV are only defined for scalable vector types.
+
+RISC-V vector intrinsics only support scalable vector types, so they are also legal.
+
+.. code-block::
+
+ t12: nxv4i32 = llvm.riscv.vadd TargetConstant:i64<10056>, undef:nxv4i32, t2, t4, t6
+
+Fixed length vectors
+--------------------
+
+Because there are no fixed length vector patterns, fixed length vectors need to be custom lowered and performed in a scalable "container" type:
+
+1. The fixed length vector operands are inserted into scalable containers with ``insert_subvector`` nodes. The container type is chosen such that its minimum size will fit the fixed length vector (see ``getContainerForFixedLengthVector``).
+2. The operation is then performed on the container type via a **VL (vector length) node**. These are custom nodes defined in ``RISCVInstrInfoVVLPatterns.td`` that mirror target agnostic SelectionDAG nodes, as well as some RVV instructions. They contain an AVL operand, which is set to the number of elements in the fixed length vector.
+ Some nodes also have a passthru or mask operand, which will usually be set to undef and all ones when lowering fixed length vectors.
+3. The result is put back into a fixed length vector via ``extract_subvector``.
+
+.. code-block::
+
+ t2: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %0
+ t4: v4i32 = extract_subvector t2, Constant:i64<0>
+ t6: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %1
+ t7: v4i32 = extract_subvector t6, Constant:i64<0>
+ t8: v4i32 = add t4, t7
+
+ // custom lowered to:
+
+ t2: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %0
+ t6: nxv2i32,ch = CopyFromReg t0, Register:nxv2i32 %1
+ t15: nxv2i1 = RISCVISD::VMSET_VL Constant:i64<4>
+ t16: nxv2i32 = RISCVISD::ADD_VL t2, t6, undef:nxv2i32, t15, Constant:i64<4>
+ t17: v4i32 = extract_subvector t16, Constant:i64<0>
+
+VL nodes often have a passthru or mask operand, which are usually set to undef and all ones for fixed length vectors.
+
+The ``insert_subvector`` and ``extract_subvector`` nodes responsible for wrapping and unwrapping will get combined away, and eventually we will lower all fixed vector types to scalable. Note that fixed length vectors at the interface of a function are passed in a scalable vector container.
+
+.. note::
+
+ The only ``insert_subvector`` and ``extract_subvector`` nodes that make it through lowering are those that can be performed as an exact subregister insert or extract. This means that any fixed length vector ``insert_subvector`` and ``extract_subvector`` nodes that aren't legalized must lie on a register group boundary, so the exact ``VLEN`` must be known at compile time (i.e. compiled with ``-mrvv-vector-bits=zvl`` or ``-mllvm -riscv-v-vector-bits-max=VLEN``, or have an exact ``vscale_range`` attribute).
+
+Vector predication intrinsics
+-----------------------------
+
+VP intrinsics also get custom lowered via VL nodes.
+
+.. code-block::
+
+ t12: nxv2i32 = vp_add t2, t4, t6, Constant:i64<8>
+
+ // custom lowered to:
+
+ t18: nxv2i32 = RISCVISD::ADD_VL t2, t4, undef:nxv2i32, t6, Constant:i64<8>
+
+The VP EVL and mask are used for the VL node's AVL and mask respectively, whilst the passthru is set to undef. A passthru can be emulated to get tail/mask undisturbed behaviour by using ``@llvm.vp.merge``. It will get lowered as a ``vmerge``, but will likely be merged back into the underlying instruction's mask via ``RISCVDAGToDAGISel::performCombineVMergeAndVOps``.
----------------
frasercrmck wrote:
back-ticked `undef` here too, if desired.
https://github.com/llvm/llvm-project/pull/96740
More information about the llvm-commits
mailing list