[llvm-dev] Questions about vscale

Mon Apr 13 10:03:39 PDT 2020

On Tue, 7 Apr 2020 at 16:09, Renato Golin <rengolin at gmail.com> wrote:
>
> On Tue, 7 Apr 2020 at 12:51, Hanna Kruppe <hanna.kruppe at gmail.com> wrote:
> > > 1. is LMUL always a multiple of ELEN?
> > This happens to be true (at least in the current spec, disregarding
> > some in-progress proposals) just because both are powers of two and
> > the largest possible LMUL equals the smallest possible ELEN (8), but I
> > don't think there is any meaning to be found in this observation. The
> > two values govern unrelated aspects of the vector unit.
>
> Sorry, I meant multiple of basic types. But you have answered my question. :)
>
> > > 2. Is this fixed on the hardware, depending on the actual lengths, or
> > > is this dynamically set by software (on a register or status flag)?
> > > 2a. If dynamic, can it change from program to program? Function to function?
> > It's not clear whether by "this" you mean ELEN, LMUL, or something
> > else. ELEN is fixed in hardware. LMUL is a property of each individual
> > instruction.
>
> Sorry again, "this" as in both ELEN and LMUL and their relationship. Ack.
>
> > I don't know what "vscale wouldn't apply" is supposed to mean.
>
> Legalisation-wise, you got it right, like <n x 0.5 x i64> is invalid and
> gets converted to <n x 1 x i32>, which it is.
>
> "Wouldn't apply" as in "what would be the point of having half-scale
> on a type that needs to be broken in half", and thus making it whole.
> You explain better below, so ignore it for now.
>
> > But how? If we take Kai's table as gospel and look at a VLEN = ELEN =
> > 32 machine, the vector type <vscale x 2 x i32> is supposed to map to a
> > single vector register, which is 32b small, and thus <vscale x 2 x
> > i32> would have just one element in this context (matching the "vscale
> > = 1/2" intuition). To be consistent with this, <vscale x 1 x i32>
> > > would have to contain just *half* an element. This is not something
> > any legalization strategy can achieve, because it is a fundamentally
> > impossible notion. So we end up in a situation where some types are
> > not just illegal and have to be legalized, but are contradictory and
> > can't be legalized in any meaningful way.
>
> Right, we have faced that problem before on non-scalable vector extensions.
>
> For example, vectorising 3 operations in a 4-wide vector and adding an
> undef in the last lane.
>
> It didn't use to be possible to do that, many years ago, as a general
> case. But if you look at register aliasing (VFP and NEON in ARMv7), we
> had the idea of different number of elements on the same register,
> depending on how you look.
>
> I'm not proposing to create all combinations of half-vscale shadowing,
> but perhaps adding half-length types as valid and lowering them in a
> special way could work much simpler than changing the interpretation
> of vscale.

[re-sending because I dropped the list -- sorry for the extra copy, Renato!]

I don't see how the situation you mention is comparable. Legalization
for e.g. <3 x i32> was not implemented at first, but as demonstrated
by the fact that it *was* implemented later, there's no conceptual
problem with legalizing that kind of type. You don't even have to
legalize them in vector registers: three scalar registers work fine
(you can even do that at the IR level).
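
As a rough illustration of those two options for an add on <3 x i32>,
here is a small Python sketch. The helper names and the UNDEF stand-in
are made up for this example; this is not how LLVM's legalizer is
actually structured, only a model of "widen to four lanes with an undef
lane" versus "use three scalars".

```python
MASK32 = 0xFFFFFFFF  # i32 arithmetic wraps modulo 2**32
UNDEF = None         # hypothetical stand-in for an undef lane


def add_v3_widened(a, b):
    """Widen <3 x i32> to <4 x i32> with an undef lane, add, narrow back."""
    wide_a = list(a) + [UNDEF]
    wide_b = list(b) + [UNDEF]
    wide_sum = [
        UNDEF if x is UNDEF or y is UNDEF else (x + y) & MASK32
        for x, y in zip(wide_a, wide_b)
    ]
    return wide_sum[:3]  # the padding lane is never observed


def add_v3_scalarized(a, b):
    """Treat the three lanes as three independent scalar values."""
    a0, a1, a2 = a
    b0, b1, b2 = b
    return [(a0 + b0) & MASK32, (a1 + b1) & MASK32, (a2 + b2) & MASK32]
```

Both strategies give the same three results, which is the sense in
which there is no conceptual obstacle for this kind of type.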

For <vscale x 1 x i32> with a fractional value of vscale, there are
several conceivable ways to "legalize" this type, but none of them
work. Legalization (and codegen in general) does not know whether the
machine code will eventually run on a chip with vector registers so
small that vscale works out to 1/2, yet it has to commit to some
legalization strategy. Whatever the strategy, since the actual value of
vscale is not known at compile time, it has to map the illegal scalable
vector types onto vector registers in some way, so that there is enough
space even when vscale is very large in some executions of the program.

Depending on how you do that exactly, the generated code might have
different behavior when running on a vscale == 1/2 machine, e.g. you
might end up with a vector register holding *one* i32 element or a
vector register holding *zero* i32 elements (i.e., the sole lane of
the 32-bit vector register is masked out). There might be other
approaches that result in yet another behavior, such as a hardware
fault, but crashes and other immediate problems aside, you're going to
end up with a certain discrete number of i32 values. That's a problem.
If <vscale x 1 x i32> ends up having one element, and <vscale x 2 x
i32> also has one (= 2 * 0.5) element, then that's wrong: the latter
type must have twice as many elements as the former (one example where
this matters: split_low / split_high / concat shuffle patterns). The
second option, a vector with *zero* elements, is just as wrong if not
worse.
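
To make the arithmetic concrete, here is a small Python sketch. The
function names are hypothetical and the two "rounding" strategies are
only models of the two outcomes described above (one element, or zero
elements); neither corresponds to real LLVM code.

```python
from fractions import Fraction
import math


def exact_elements(vscale, min_elts):
    """Element count of <vscale x min_elts x T> as defined: vscale * min_elts."""
    return Fraction(vscale) * min_elts


def round_up(vscale, min_elts):
    """Model of 'the register ends up holding one element'."""
    return math.ceil(exact_elements(vscale, min_elts))


def round_down(vscale, min_elts):
    """Model of 'the sole lane is masked out, zero elements'."""
    return math.floor(exact_elements(vscale, min_elts))


def doubling_holds(count_fn, vscale):
    """<vscale x 2 x i32> must have twice the elements of <vscale x 1 x i32>."""
    return count_fn(vscale, 2) == 2 * count_fn(vscale, 1)


half = Fraction(1, 2)
print(exact_elements(half, 1))             # 1/2, not a whole number of elements
print(doubling_holds(round_up, half))      # False: 1 element vs. 1 element
print(doubling_holds(round_down, half))    # False: 1 element vs. 0 elements
print(doubling_holds(round_up, 1))         # True for any integral vscale
```

With an integral vscale both strategies trivially agree with the exact
count; with vscale == 1/2 neither can preserve the factor of two.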

It's not that a correct legalization exists but it's too annoying to
implement, or that one might exist but I'm too lazy to work it out.
We're also not running into a limitation or oddity of the RISC-V vector
ISA in particular. It's simply that, if you set vscale == 0.5, then by
the way scalable vector types work (vscale * const elements), some
vector types that can be written in the IR would need to have a
fractional number of elements to be consistent with the other scalable
vector types. As that is not possible (not even conceptually),
whatever code you emit to try to legalize that type will end up being
wrong in some respect.

So if we decided to support fractional vscale, we couldn't just call
these types "illegal". In LLVM parlance, illegal types can be used in
LLVM IR and targets aspire to turn them into something that works
correctly, even if it's very inefficient. Sometimes a legalization is
unimplemented or buggy, but these problems can be patched and this has
often happened in the past. With fractional vscale, the situation is
quite different: nobody will ever be able to use certain scalable
vector types on the target in question, because they can't be
legalized even in principle.

In contrast, scalable vector types that are illegal because they're
too large (e.g. <vscale x 32 x i64>) can be legalized just fine. For
example, you could split them across a sufficiently large (fixed)
number of vector registers and maybe spill them to the stack for
inserts/extracts/shuffles/etc. that cross lanes or access elements at
data-dependent positions. Implementing this will probably not be a
priority for any targets, but it can be implemented whenever it does
become important to someone.
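
A back-of-the-envelope sketch of that splitting, in Python. The figure
of 512 bits per unit of vscale for the largest legal type is purely an
assumption for illustration (it is not taken from any target), and the
helper names are made up. The point is only that the number of parts is
a compile-time constant that does not depend on the runtime vscale.

```python
def split_count(min_elts, elt_bits, max_legal_bits_per_vscale):
    """Number of legal parts needed for <vscale x min_elts x i(elt_bits)>."""
    total_bits_per_vscale = min_elts * elt_bits
    # Ceiling division: a type that already fits needs exactly one part.
    return -(-total_bits_per_vscale // max_legal_bits_per_vscale)


def split_type(min_elts, elt_bits, max_legal_bits_per_vscale):
    """Spell out the parts as scalable vector type strings."""
    parts = split_count(min_elts, elt_bits, max_legal_bits_per_vscale)
    # This sketch only handles element counts that divide evenly.
    assert min_elts % parts == 0
    per_part = min_elts // parts
    return [f"<vscale x {per_part} x i{elt_bits}>"] * parts


# Assumed limit: the largest legal type covers 512 bits per unit of vscale.
ASSUMED_MAX_BITS = 512
print(split_count(32, 64, ASSUMED_MAX_BITS))
print(split_type(32, 64, ASSUMED_MAX_BITS))
```

Under that assumed limit, <vscale x 32 x i64> splits into four
<vscale x 8 x i64> parts, whatever vscale turns out to be at run time.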

I hope this lengthy explanation helps you see where I'm coming from.

Thanks,
Hanna

> Also, I'm acting like devil's advocate, so don't take my comments as a
> rejection of your proposal, I'm just trying to understand where you
> are coming from.
>
> cheers,
> --renato