[PATCH] D98169: [IR] Permit load/store/alloca for struct with the same scalable vectors.

Fri Mar 12 09:43:39 PST 2021

On Fri, Mar 12, 2021 at 6:51 AM Sander de Smalen via Phabricator <
reviews at reviews.llvm.org> wrote:

> sdesmalen added a comment.
>
> In D98169#2619874 <https://reviews.llvm.org/D98169#2619874>,
> @craig.topper wrote:
>
> > We want this to support the segment load/store intrinsics defined here
> https://github.com/riscv/rvv-intrinsic-doc/blob/master/intrinsic_funcs/03_vector_load_store_segment_instructions_zvlsseg.md
> These return 2 to 8 vectors that have been loaded into consecutive
> registers. I believe SVE has similar instructions. I believe SVE represents
> these using types wider than their normal scalable vector types and relies
> on the type legalizer to split them up in the backend. This works for SVE
> because there is only one known minimum size for all scalable vector types
> so the type legalizer will always split down to that minimum type.
>
> Thanks for providing the context!
>
> > For RISC-V vectors we already use 7 different sizes of scalable vectors
> to represent the ability of our instructions to operate on 2, 4, or 8
> registers simultaneously. And for 1/2, 1/4, and 1/8 fractional registers.
> The segment load/store instructions add an extra dimension where they can
> produce/consume 2, 3, or 4 pairs of registers or 2 quadruples, for
> example. Following the SVE strategy would give us ambiguous types for the
> type legalizer.
>
> How does that look in terms of IR? Is the number of registers somehow
> represented in the (LLVM IR) vector type? Or are the types the same, but
> the compiler generates different code depending on what mode is set? For
> SVE we know we can split the vector because <vscale x 8 x i32> is twice the
> size of <vscale x 4 x i32>, regardless of the value for vscale. Indeed we
> know SVE vectors are a multiple of 128 bits, and therefore that <vscale x 4 x
> i32> is legal. In order to make any assumptions about
> splitting/legalization, the compiler will need to know which types are
> legal, and so I would expect the compiler to know the mode (2, 4, 8) for RVV
> when generating the code, and therefore have similar knowledge about which
> types are legal and how the vectors are represented/split into registers.
> How does that lead to ambiguous types?
>

The mode can be freely changed at any time by emitting a vsetvli
instruction. Some instructions like zext/sext can take an input in 1
register and output in 2, or take an input in 2 registers and output in 4.
The output automatically uses an LMUL and element width twice those of the
input. The mode for subsequent instructions would need to be changed to
operate on this widened data. To represent these different modes we're
using 7 different known minimum sizes of scalable types, from 8 bits up to
512 bits. LMUL=1/8 uses <vscale x 1 x i8>. LMUL=1/4 uses <vscale x 2 x i8>,
<vscale x 1 x i16>, and <vscale x 1 x half>. LMUL=1/2 uses <vscale x 4 x
i8>, <vscale x 2 x i16>, <vscale x 1 x i32>, <vscale x 2 x half>, and
<vscale x 1 x float>. LMUL=1 uses <vscale x 8 x i8>, <vscale x 4 x i16>,
<vscale x 2 x i32>, <vscale x 1 x i64>, <vscale x 4 x half>, <vscale x 2 x
float>, <vscale x 1 x double>, etc. Altogether there are 22 legal types.
For each instruction we look at the mode it needs for its input and output
types and emit a vsetvli instruction immediately before it. A later MIR
pass goes through and removes redundant vsetvli instructions created for
adjacent instructions.
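
As a rough sketch of the type-to-LMUL mapping described above (this is not
the backend's code; the names lmul and lmul_for_type are made up for
illustration, and the 64-bits-per-LMUL=1 constant is inferred from the types
listed above), the known minimum size of the type determines the LMUL:

```c
#include <assert.h>
#include <stddef.h>

/* LMUL as a fraction num/den, e.g. 1/8 or 4/1. Hypothetical helper type. */
struct lmul { unsigned num, den; };

/* Assumption: LMUL=1 corresponds to a known minimum size of 64 bits,
   as in <vscale x 8 x i8> or <vscale x 1 x i64>. */
enum { BITS_PER_LMUL1 = 64 };

/* Map <vscale x min_elts x T> (T is elt_bits wide) to its LMUL.
   Returns 0 if the element width is not 8/16/32/64 or if the known
   minimum size is not one of the 7 sizes 8, 16, ..., 512 bits.
   Otherwise stores the LMUL in *out and returns 1. */
static int lmul_for_type(unsigned min_elts, unsigned elt_bits,
                         struct lmul *out)
{
    unsigned bits;

    assert(out != NULL);
    if (elt_bits != 8 && elt_bits != 16 && elt_bits != 32 && elt_bits != 64)
        return 0;
    if (min_elts == 0 || min_elts > 64)
        return 0;

    bits = min_elts * elt_bits;
    /* The known minimum size must be a power of two in [8, 512]. */
    if (bits < 8 || bits > 512 || (bits & (bits - 1)) != 0)
        return 0;

    if (bits >= BITS_PER_LMUL1) {
        out->num = bits / BITS_PER_LMUL1;   /* LMUL = 1, 2, 4, 8 */
        out->den = 1;
    } else {
        out->num = 1;                       /* LMUL = 1/2, 1/4, 1/8 */
        out->den = BITS_PER_LMUL1 / bits;
    }
    return 1;
}
```

Under these assumptions lmul_for_type(1, 8, &l) gives 1/8 and
lmul_for_type(4, 32, &l) gives 2/1, while a type like <vscale x 6 x i32> is
rejected because its known minimum size is not a power of two.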

The segment load/store instructions operate on groups of these 22 types
with the caveat that the total size cannot exceed 1/4 of the 32-entry
register file. So there are no segment load/stores for LMUL=8. For LMUL=4
you can only use a 2x segment load. If we were to use scalable types to
represent segment load/store results as well, then <vscale x 4 x i32> could
either be an LMUL=2 register, or it could be a x2 segment load of 2
<vscale x 2 x i32> values, or a x4 segment load of 4 <vscale x 1 x i32>
values, etc. Since <vscale x 4 x i32> is a legal type it would never be
split. Within segment loads, <vscale x 6 x i32> could either be 6 <vscale x
1 x i32> values or 3 <vscale x 2 x i32> values.

>
> > To solve this we would like to use a struct for the segment load/stores
> to separate them in IR. Since clang needs an address for every variable and
> needs to be able to load/store them we need to support load/store/alloca.
>
> These (C/C++-level) intrinsics are probably implemented using
> target-specific intrinsics or perhaps a common LLVM IR intrinsic like
> masked.load, which should be able to take/return a struct with scalable
> members after D94142 <https://reviews.llvm.org/D94142>. If so, it should
> be possible to handle this in Clang by emitting `extractvalue` instructions
> and storing each member individually. That would avoid any changes to LLVM
> IR. Is that something you've considered?
>

They're using target-specific intrinsics which produce an aggregate after
D94142. We've been having some internal conversations about doing something
like this for a masked load of 8 registers.

int data[32] = {0};
vint8mf8_t a0 = vundefined_i8mf8();
vint8mf8_t a1 = vundefined_i8mf8();
vint8mf8_t a2 = vundefined_i8mf8();
vint8mf8_t a3 = vundefined_i8mf8();
vint8mf8_t a4 = vundefined_i8mf8();
vint8mf8_t a5 = vundefined_i8mf8();
vint8mf8_t a6 = vundefined_i8mf8();
vint8mf8_t a7 = vundefined_i8mf8();
vlseg8e8_v_i8mf8x8_m(&a0, &a1, &a2, &a3, &a4, &a5, &a6, &a7, data, 4);

instead of using a x8 struct and vget, vset, vcreate. The main disadvantage
pointed out so far is that the user could pass null or a pointer cast from
another type.
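
The trade-off between the two API shapes can be sketched in plain C with
stand-in types. Nothing here is a real RVV intrinsic: vec_t, vec_x2_t,
seg2_load_ptrs, and seg2_load_struct are hypothetical, a fixed 4-lane
struct stands in for a scalable register, and only a x2 segment is shown
to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one vector register: a fixed 4 lanes. */
typedef struct { int8_t lane[4]; } vec_t;
/* Hypothetical x2 tuple, analogous to the struct-based approach. */
typedef struct { vec_t v[2]; } vec_x2_t;

/* Pointer-output shape: the caller may pass NULL (or a pointer cast
   from another type), so the callee can only check at run time.
   Returns 1 on success, 0 if any pointer is NULL. */
static int seg2_load_ptrs(vec_t *a0, vec_t *a1, const int8_t *data)
{
    size_t i;

    if (a0 == NULL || a1 == NULL || data == NULL)
        return 0;
    for (i = 0; i < 4; ++i) {
        a0->lane[i] = data[2 * i];      /* field 0 of each segment */
        a1->lane[i] = data[2 * i + 1];  /* field 1 of each segment */
    }
    return 1;
}

/* Struct-return shape: there are no output pointers to get wrong. */
static vec_x2_t seg2_load_struct(const int8_t *data)
{
    vec_x2_t r;
    size_t i;

    assert(data != NULL);
    for (i = 0; i < 4; ++i) {
        r.v[0].lane[i] = data[2 * i];
        r.v[1].lane[i] = data[2 * i + 1];
    }
    return r;
}
```

A call such as seg2_load_ptrs(NULL, &a1, data) compiles without complaint
and has to be rejected at run time, which is the disadvantage mentioned
above. The struct-returning form avoids that class of misuse but needs the
tuple type and its accessors.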

>
> If we do need to make this work for scalable vectors, I think it needs a
> message to the mailing list because it's a change to the LangRef and
> capabilities of scalable vectors, given previous discussions on this topic.
> I'd like to avoid giving the impression that we're quietly moving the
> goalpost on what scalable vectors can do in IR.
>

Agreed.

>
>
> Repository:
>   rG LLVM Github Monorepo
>
> CHANGES SINCE LAST ACTION
>   https://reviews.llvm.org/D98169/new/
>
> https://reviews.llvm.org/D98169
>
>