[llvm-dev] Questions about vscale

Mon Apr 13 16:12:13 PDT 2020

Hi Hanna,

Thanks, Hanna. I got your point.
You mean that even if a type does not exist in the type system, we would
still need to legalize it.
I support the following four i32 scalable vector types. I also do
not know how to reason about vscale x 1 x i32 under this type system.

          LMUL = 1            LMUL = 2            LMUL = 4            LMUL = 8
int32_t | vscale x  2 x i32 | vscale x  4 x i32 | vscale x  8 x i32 | vscale x 16 x i32

Could we support only the types in the table on the RISC-V target? That is,
do not legalize vscale x 1 x i32; just issue an error message for it.

In my latest reply, I did not propose fractional vscale. I proposed that
“vscale x n” be an integer. Under that assumption, I cannot reason about
vscale x 1 x i32. However, I can reason about vscale x 2 x i32 even when
vscale = 1/2. We only care that the product “vscale x n” is an integer.
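
A minimal sketch of this reasoning in Python. The helper name `element_count` is hypothetical and only models the arithmetic; it is not an LLVM API.

```python
from fractions import Fraction

def element_count(vscale, n):
    """Element count of <vscale x n x ty>, or None if it is not a whole number."""
    total = Fraction(vscale) * n
    return int(total) if total.denominator == 1 else None

half = Fraction(1, 2)
print(element_count(half, 2))  # vscale x 2 x i32 with vscale = 1/2
print(element_count(half, 1))  # vscale x 1 x i32 with vscale = 1/2
```

Under this model, vscale x 2 x i32 has a whole number of elements when vscale = 1/2, while vscale x 1 x i32 does not.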

The original problem is that the type system proposed by Hanna under
ELEN = 64 is

          LMUL = 1            LMUL = 2            LMUL = 4            LMUL = 8
int32_t | vscale x  2 x i32 | vscale x  4 x i32 | vscale x  8 x i32 | vscale x 16 x i32

while under ELEN = 32 it is

          LMUL = 1            LMUL = 2            LMUL = 4            LMUL = 8
int32_t | vscale x  1 x i32 | vscale x  2 x i32 | vscale x  4 x i32 | vscale x  8 x i32

The problem is that a RISC-V RVV implementation ends up with multiple type
systems, and they are not compatible across different ELEN
configurations. AFAIK, the GCC implementation has no such compatibility
problem. (GCC reasons about the whole “poly_int”, instead of
“X”.)
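
To make the incompatibility concrete, here is a small Python sketch that regenerates both tables. The formula n = LMUL * ELEN / SEW is my assumption, inferred only from the two tables above, and `scalable_type` is a hypothetical helper, not anything in LLVM.

```python
SEW = 32  # element width of i32 in bits

def scalable_type(elen, lmul, sew=SEW):
    # Assumed mapping, inferred from the two tables: n = LMUL * ELEN / SEW.
    n = lmul * elen // sew
    return f"<vscale x {n} x i{sew}>"

for elen in (64, 32):
    row = [scalable_type(elen, lmul) for lmul in (1, 2, 4, 8)]
    print(f"ELEN = {elen}:", " | ".join(row))
```

The same (int32_t, LMUL) pair maps to a different IR type depending on ELEN, which is the compatibility problem I mean.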

If llvm.vscale(i32 ElementCount) is not the way we want to go, is there any
proposal to solve the compatibility problems in your type system?

On Tue, Apr 14, 2020 at 1:04 AM Hanna Kruppe <hanna.kruppe at gmail.com> wrote:

> On Tue, 7 Apr 2020 at 16:09, Renato Golin <rengolin at gmail.com> wrote:
> >
> > On Tue, 7 Apr 2020 at 12:51, Hanna Kruppe <hanna.kruppe at gmail.com> wrote:
> > > > 1. is LMUL always a multiple of ELEN?
> > > This happens to be true (at least in the current spec, disregarding
> > > some in-progress proposals) just because both are powers of two and
> > > the largest possible LMUL equals the smallest possible ELEN (8), but I
> > > don't think there is any meaning to be found in this observation. The
> > > two values govern unrelated aspects of the vector unit.
> >
> > Sorry, I meant multiple of basic types. But you have answered my
> > question. :)
> >
> > > > 2. Is this fixed on the hardware, depending on the actual lengths, or
> > > > is this dynamically set by software (on a register or status flag)?
> > > > 2a. If dynamic, can it change from program to program? Function to
> > > > function?
> > > It's not clear whether by "this" you mean ELEN, LMUL, or something
> > > else. ELEN is fixed in hardware. LMUL is a property of each individual
> > > instruction.
> >
> > Sorry again, "this" as in both ELEN and LMUL and their relationship. Ack.
> >
> > > I don't know what "vscale wouldn't apply" is supposed to mean.
> >
> > Legalisation-wise, you got it right, like <n x 0.5 x i64> is invalid and
> > gets converted to <n x 1 x i32>, which it is.
> >
> > "Wouldn't apply" as in "what would be the point of having half-scale
> > on a type that needs to be broken in half", and thus making it whole.
> > You explain better below, so ignore it for now.
> >
> > > But how? If we take Kai's table as gospel and look at a VLEN = ELEN =
> > > 32 machine, the vector type <vscale x 2 x i32> is supposed to map to a
> > > single vector register, which is 32b small, and thus <vscale x 2 x
> > > i32> would have just one element in this context (matching the "vscale
> > > = 1/2" intuition). To be consistent with this, <vscale x 1 x i32>
> > > would have to contain just *half* an element. This is not something
> > > any legalization strategy can achieve, because it is a fundamentally
> > > impossible notion. So we end up in a situation where some types are
> > > not just illegal and have to be legalized, but are contradictory and
> > > can't be legalized in any meaningful way.
> >
> > Right, we have faced that problem before on non-scalable vector
> > extensions.
> >
> > For example, vectorising 3 operations in a 4-wide vector and adding an
> > undef in the last lane.
> >
> > It didn't use to be possible to do that, many years ago, as a general
> > case. But if you look at register aliasing (VFP and NEON in ARMv7), we
> > had the idea of different number of elements on the same register,
> > depending on how you look.
> >
> > I'm not proposing to create all combinations of half-vscale shadowing,
> > but perhaps adding half-length types as valid and lowering them in a
> > special way could work much simpler than changing the interpretation
> > of vscale.
>
> [re-sending because I dropped the list -- sorry for the extra copy,
> Renato!]
>
> I don't see how the situation you mention is comparable. Legalization
> for e.g. <3 x i32> was not implemented at first, but as demonstrated
> by the fact that it *was* implemented later, there's no conceptual
> problem with legalizing that kind of type. You don't even have to
> legalize them in vector registers, three scalar registers work fine
> (you can even do that on the IR level).
>
> For <vscale x 1 x i32> with a fractional value of vscale, there are
> several conceivable ways to "legalize" this type, but none of them
> work. Legalization (codegen in general) does not know if the machine
> code will eventually run on a chip with vector registers so small that
> vscale works out to 1/2, but it has to choose some legalization
> strategy. I can imagine several approaches to this, but since the
> actual value of vscale is not known at this time, it will have to map
> the illegal scalable vector types to the vector registers in some way,
> to ensure there's enough space even when vscale is very large in some
> executions of the program.
>
> Depending on how you do that exactly, the generated code might have
> different behavior when running on a vscale == 1/2 machine, e.g. you
> might end up with a vector register holding *one* i32 element or a
> vector register holding *zero* i32 elements (i.e., the sole lane of
> the 32-bit vector register is masked out). There might be other
> approaches that result in yet another behavior, such as a hardware
> fault, but crashes and other immediate problems aside, you're going to
> end up with a certain discrete number of i32 values. That's a problem.
> If <vscale x 1 x i32> ends up having one element, and <vscale x 2 x
> i32> also has one (= 2 * 0.5) element, then that's wrong: the latter
> type must have twice as many elements as the former (one example where
> this matters: split_low / split_high / concat shuffle patterns). The
> second option, a vector with *zero* elements, is just as wrong if not
> worse.
>
> It's not that a correct legalization exists but it's too annoying to
> implement, or that one might exist but I'm too lazy to work it out.
> We're also not running into a limitation or oddity of the RISC-V vector
> ISA in particular. It's simply that, if you set vscale == 0.5, then by
> the way scalable vector types work (vscale * const elements), some
> vector types that can be written in the IR would need to have a
> fractional number of elements to be consistent with the other scalable
> vector types. As that is not possible (not even conceptually),
> whatever code you emit to try to legalize that type will end up being
> wrong in some respect.
>
> So if we'd decide to support fractional vscale, we can't say these
> types are "illegal". In LLVM parlance, illegal types can be used in
> LLVM IR and targets aspire to turn them into something that works
> correctly, even if it's very inefficient. Sometimes a legalization is
> unimplemented or buggy, but these problems can be patched and this has
> often happened in the past. With fractional vscale, the situation is
> quite different: nobody will ever be able to use certain scalable
> vector types on the target in question, because they can't be
> legalized even in principle.
>
> In contrast, scalable vector types that are illegal because they're
> too large (e.g. <vscale x 32 x i64>) can be legalized just fine. For
> example, you could split them across a sufficiently large (fixed)
> number of vector registers and maybe spill them to the stack for
> inserts/extracts/shuffles/etc. that cross lanes or access elements at
> data-dependent positions. Implementing this will probably not be a
> priority for any targets, but it can be implemented whenever it does
> become important to someone.
>
> I hope this lengthy explanation helps you see where I'm coming from.
>
> Thanks,
> Hanna
>
> > Also, I'm acting like devil's advocate, so don't take my comments as a
> > rejection of your proposal, I'm just trying to understand where you
> > are coming from.
> >
> > cheers,
> > --renato
>
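
Hanna's argument in the quoted text about vscale == 1/2 can be checked with a little arithmetic. The Python sketch below is only an illustration: `VSCALE`, `exact`, and `doubling_holds` are hypothetical names, and the two rounding rules stand in for the "one element" and "zero elements" outcomes described in the quote, not for any real legalization strategy.

```python
import math
from fractions import Fraction

VSCALE = Fraction(1, 2)  # hypothetical VLEN = ELEN = 32 machine

def exact(n):
    """Exact (possibly fractional) element count of <vscale x n x i32>."""
    return VSCALE * n

def doubling_holds(round_fn):
    """Check that <vscale x 2 x i32> has twice the elements of <vscale x 1 x i32>."""
    one = round_fn(exact(1))  # discrete count chosen for <vscale x 1 x i32>
    two = round_fn(exact(2))  # discrete count chosen for <vscale x 2 x i32>
    return two == 2 * one

for name, fn in (("round up", math.ceil), ("round down", math.floor)):
    print(name, "->", doubling_holds(fn))
```

Neither rounding rule keeps the doubling relationship, which matches the point that the type cannot be legalized consistently.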