[llvm-dev] [RFC] Permit load/store/alloca for struct containing all scalable vectors.

Thu Mar 18 20:41:50 PDT 2021

Hi all,

We have a proposal to support load/store/alloca for structs that contain
only scalable vectors. Please review it and give us your suggestions.

Thanks a lot.

Introduction
=======

The RISC-V V extension has a sub-extension, Zvlsseg, whose segment
load/store instructions move multiple contiguous fields in memory to and
from consecutively numbered vector registers. In our intrinsic document [1],
we define a set of types for these segment load/store intrinsics. The
Zvlsseg types carry two additional parameters: the number of fields (NF)
and LMUL [2]. NF can be 2 to 8, and LMUL can be 1/8, 1/4, 1/2, 1, 2, 4,
or 8.

We tried to use primitive builtin types to model Zvlsseg types. That is,
we use <vscale x 2 x i32> for the LMUL = 1 int32 vector type, and
<vscale x 4 x i32> for the NF = 2, LMUL = 1 int32 Zvlsseg type. However,
<vscale x 4 x i32> is also the LMUL = 2 int32 vector type. Both are legal
types, so the type legalizer has no way to distinguish them.

To address the issue, our downstream version models Zvlsseg types with
struct types: {<vscale x 2 x i32>, <vscale x 2 x i32>} is the NF = 2,
LMUL = 1 int32 Zvlsseg type. This removes the ambiguity between scalable
vector types for the RISC-V V extension. However, modeling Zvlsseg types
this way requires load/store/alloca support for structs of scalable vectors.
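To make the ambiguity concrete, the following Python sketch spells out the IR type under both models. It assumes the RISC-V backend convention that vscale = VLEN / 64, so an LMUL = 1 register of SEW-bit elements has a known minimum of 64 / SEW elements (which yields <vscale x 2 x i32> for LMUL = 1 int32, as above). The function names are hypothetical and only illustrate the mapping.

```python
from fractions import Fraction


# Assumption: vscale = VLEN / 64, so LMUL = 1 holds (64 / SEW) x vscale elements.
def vector_type(sew: int, lmul: Fraction) -> str:
    """Spell the scalable vector type for one register group."""
    min_elts = Fraction(64, sew) * lmul
    if min_elts.denominator != 1:
        raise ValueError(f"LMUL {lmul} is too small for i{sew}")
    return f"<vscale x {min_elts.numerator} x i{sew}>"


def flat_segment_type(sew: int, nf: int, lmul: Fraction) -> str:
    """Flattened model: NF fields packed into one wider scalable vector."""
    return vector_type(sew, nf * lmul)


def struct_segment_type(sew: int, nf: int, lmul: Fraction) -> str:
    """Struct model: one scalable vector member per field."""
    members = ", ".join([vector_type(sew, lmul)] * nf)
    return "{ " + members + " }"


# NF = 2, LMUL = 1 collides with a plain LMUL = 2 vector when flattened...
print(flat_segment_type(32, 2, Fraction(1)))    # <vscale x 4 x i32>
print(vector_type(32, Fraction(2)))             # <vscale x 4 x i32>
# ...but stays distinct in the struct model.
print(struct_segment_type(32, 2, Fraction(1)))  # { <vscale x 2 x i32>, <vscale x 2 x i32> }
```

The flattened spelling depends only on the product NF x LMUL, which is exactly why it loses information; the struct spelling keeps NF and LMUL separate.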

[1] https://github.com/riscv/rvv-intrinsic-doc/blob/master/intrinsic_funcs/03_vector_load_store_segment_instructions_zvlsseg.md

[2] The vector length multiplier, *LMUL*, when greater than 1, represents
the default number of vector registers that are combined to form a vector
register group.

Implementation
=======

The current StructLayout implementation uses uint64_t to represent the size
of a struct and the offsets of its members. We use TypeSize for the struct
size and StackOffset for the member offsets, so that StructLayout records
correct information when the struct contains scalable elements. However,
TypeSize is a one-dimensional polynomial type: it describes either a fixed
size or a multiple of vscale, never a sum of the two. To minimize the
impact on the current implementation while meeting our requirements, we
only permit load/store/alloca for structs whose members are all scalable
types or all fixed-length types. That way, the struct's TypeSize is either
a scalable size or a fixed size.
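A rough model of this rule, written in Python rather than LLVM's C++: PolySize is a hypothetical stand-in for TypeSize (a known-minimum byte count plus a scalable flag), and struct_layout is not the StructLayout API. It shows that an all-scalable or all-fixed struct fits in one such size, while a mixed struct has no representation and is rejected. The alignment value used in the usage line is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolySize:
    """Hypothetical one-dimensional size: min_bytes, scaled by vscale if scalable."""
    min_bytes: int
    scalable: bool

    def __str__(self) -> str:
        return f"vscale x {self.min_bytes}" if self.scalable else str(self.min_bytes)


def align_to(value: int, align: int) -> int:
    """Round value up to a multiple of align."""
    return (value + align - 1) // align * align


def struct_layout(members):
    """Lay out (size, alignment) members; return (struct size, member offsets).

    Mirrors the proposed rule: members must be all scalable or all fixed, so a
    single PolySize describes the whole struct.
    """
    kinds = {size.scalable for size, _ in members}
    if len(kinds) > 1:
        raise TypeError("struct mixes scalable and fixed-length members")
    scalable = kinds.pop() if kinds else False
    offset, max_align, offsets = 0, 1, []
    for size, align in members:
        offset = align_to(offset, align)  # align the known-minimum value
        offsets.append(PolySize(offset, scalable))
        offset += size.min_bytes
        max_align = max(max_align, align)
    return PolySize(align_to(offset, max_align), scalable), offsets


# <vscale x 2 x i32> is 8 bytes per unit of vscale; alignment 8 is assumed.
nxv2i32 = PolySize(8, scalable=True)
size, offsets = struct_layout([(nxv2i32, 8), (nxv2i32, 8)])
print(size, [str(o) for o in offsets])  # vscale x 16 ['vscale x 0', 'vscale x 8']
```

Aligning only the known-minimum offset is sufficient for scalable members: vscale is a positive integer, so a multiple of the alignment remains a multiple after scaling.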

Impact on other passes
=======

I have reviewed all uses of StructLayout. A large share of them relate to
ConstantStruct, and there should be no use cases for a scalable
ConstantStruct. Another large share relate to GetElementPtrInst. Since we
only need load/store/alloca to meet our requirements, we prefer not to
support getelementptr for scalable structs. We could add an assertion in the
GetElementPtrInst constructor to reject structs containing scalable vectors.
Changing the internal representation of StructLayout is a manageable amount
of work.
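The proposed GetElementPtrInst check boils down to a recursive predicate over the source element type. The sketch below uses a tiny hypothetical type model (these classes are not LLVM's) and raises ValueError where LLVM would assert; it only illustrates the idea and is not the code in D98161.

```python
from dataclasses import dataclass


# A tiny hypothetical type model for illustration; not LLVM's API.
@dataclass(frozen=True)
class ScalarTy:
    name: str


@dataclass(frozen=True)
class VectorTy:
    elt: ScalarTy
    min_count: int
    scalable: bool


@dataclass(frozen=True)
class StructTy:
    members: tuple


def contains_scalable(ty) -> bool:
    """True if ty is a scalable vector or a struct that (recursively) holds one."""
    if isinstance(ty, VectorTy):
        return ty.scalable
    if isinstance(ty, StructTy):
        return any(contains_scalable(m) for m in ty.members)
    return False


def check_gep_source_type(ty) -> None:
    """Reject a GEP whose source element type is a struct containing scalable vectors."""
    if isinstance(ty, StructTy) and contains_scalable(ty):
        raise ValueError("getelementptr on a struct containing scalable vectors")


i32 = ScalarTy("i32")
nxv2i32 = VectorTy(i32, 2, scalable=True)
check_gep_source_type(StructTy((i32, i32)))  # fixed-length struct: accepted by this sketch
check_gep_source_type(nxv2i32)               # plain scalable vector: accepted by this sketch
```

Calling check_gep_source_type(StructTy((nxv2i32, nxv2i32))) raises, as does any struct that nests such a member.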

How to avoid using getelementptr for scalable structs
=======

We can avoid getelementptr by building the aggregate with insertvalue (or
taking it apart with extractvalue) and loading/storing the whole structure
at once. For example, instead of

    %0 = getelementptr %struct.type, %struct.type* %val, i32 0, i32 0
    store <vscale x 2 x i32> %v.coerce0, <vscale x 2 x i32>* %0
    %1 = getelementptr %struct.type, %struct.type* %val, i32 0, i32 1
    store <vscale x 2 x i32> %v.coerce1, <vscale x 2 x i32>* %1

we can write

    %0 = insertvalue %struct.type undef, <vscale x 2 x i32> %v.coerce0, 0
    %1 = insertvalue %struct.type %0, <vscale x 2 x i32> %v.coerce1, 1
    store %struct.type %1, %struct.type* %val

which stores the scalable struct without any getelementptr.

How to deal with multiple return values mixing scalable vectors and
fixed-length objects?
=======

D94142 permits a struct that mixes scalable vectors and fixed-length
objects as the multiple return values of intrinsic calls, but prohibits
load/store/alloca for it. This proposal keeps that restriction for such
mixed structs. So how do we handle an intrinsic whose return value is a
struct mixing scalable vectors and fixed-length objects?

We extract the scalable vectors and repack them into a struct containing
only scalable vectors, and extract the scalar values individually as needed.

For example,

    %struct.type = type { <vscale x 2 x i32>, <vscale x 2 x i32> }

    %3 = call { <vscale x 2 x i32>, <vscale x 2 x i32>, i64 } @llvm.riscv.test(i32* %0)
    %4 = extractvalue { <vscale x 2 x i32>, <vscale x 2 x i32>, i64 } %3, 0
    %5 = insertvalue %struct.type undef, <vscale x 2 x i32> %4, 0
    %6 = extractvalue { <vscale x 2 x i32>, <vscale x 2 x i32>, i64 } %3, 1
    %7 = insertvalue %struct.type %5, <vscale x 2 x i32> %6, 1
    %8 = extractvalue { <vscale x 2 x i32>, <vscale x 2 x i32>, i64 } %3, 2
    store i64 %8, i64* %1, align 8
    ret %struct.type %7
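The repacking in this example is mechanical, so a frontend could generate it. The hypothetical Python helper below emits the same extractvalue/insertvalue sequence, assuming the target struct holds exactly the scalable members in their original order and recognizing a scalable type by its "<vscale x" prefix (a textual shortcut for this sketch only).

```python
def repack_scalable(call, agg_ty, member_tys, struct_ty, next_id):
    """Emit IR lines that repack a mixed aggregate's scalable members.

    Scalable-vector members are re-inserted, in order, into struct_ty; other
    members are only extracted. Returns (ir_lines, repacked_value, scalar_values).
    """
    lines, scalars = [], []
    acc, slot = "undef", 0
    for idx, ty in enumerate(member_tys):
        elt = f"%{next_id}"
        next_id += 1
        lines.append(f"{elt} = extractvalue {agg_ty} {call}, {idx}")
        if ty.startswith("<vscale x"):  # textual test for a scalable vector type
            new = f"%{next_id}"
            next_id += 1
            lines.append(f"{new} = insertvalue {struct_ty} {acc}, {ty} {elt}, {slot}")
            acc, slot = new, slot + 1
        else:
            scalars.append(elt)  # fixed-length value, left for the caller to use
    return lines, acc, scalars


nxv2i32 = "<vscale x 2 x i32>"
members = [nxv2i32, nxv2i32, "i64"]
agg = "{ " + ", ".join(members) + " }"
lines, value, scalars = repack_scalable("%3", agg, members, "%struct.type", 4)
print("\n".join(lines))
print(value, scalars)  # %7 ['%8']
```

Running it reproduces instructions %4 through %8 of the example; the caller then stores the scalar (%8) and returns the repacked struct (%7).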

Related patches
=======

[NFC][IR] Replace isa<ScalableVectorType> with a predicator function.
https://reviews.llvm.org/D98161

[PoC][IR] Permit load/store/alloca for struct with the same scalable vectors.
https://reviews.llvm.org/D98169

- Kai