<div dir="ltr">Hi Hanna,<div><br></div><div>Thanks, I got your point.<br>You mean that even if a type does not exist in the type system, we still need to legalize it.<br>I support the following four kinds of i32 scalable vector types. I also do not know how to reason about vscale x 1 x i32 under this type system.<br><br><font face="monospace">          LMUL = 1           LMUL = 2            LMUL = 4            LMUL = 8<br>int32_t | vscale x 2 x i32 | vscale x  4 x i32 | vscale x  8 x i32 | vscale x 16 x i32</font><br><br>Could we just support the types in the table on the RISC-V target? I mean we would not legalize vscale x 1 x i32, and would just issue an error message for it.<br><br>In my latest reply, I did not propose fractional vscale. I proposed that “vscale x n” be an integer. Under that assumption, I cannot reason about vscale x 1 x i32. However, I can reason about vscale x 2 x i32 even when vscale = 1/2. We only care that the product “vscale x n” is an integer.<br><br>The original problem is that the type system proposed by Hanna under ELEN = 64 is<br><br><font face="monospace">          LMUL = 1           LMUL = 2            LMUL = 4            LMUL = 8<br>int32_t | vscale x 2 x i32 | vscale x  4 x i32 | vscale x  8 x i32 | vscale x 16 x i32</font><br><br>Under ELEN = 32 it is<br><br><font face="monospace">          LMUL = 1           LMUL = 2            LMUL = 4            LMUL = 8<br>int32_t | vscale x 1 x i32 | vscale x  2 x i32 | vscale x  4 x i32 | vscale x 8 x i32</font><br><br>The problem is that there are multiple type systems under the RISC-V RVV implementation, and they are not compatible across different ELEN configurations. AFAIK, there are no such compatibility problems in the GCC implementation. 
(In GCC, they reason about the whole “poly_int”, instead of “X”.)<br><br>If llvm.vscale(i32 ElementCount) is not the way we want to go, is there any proposal to solve the compatibility problems in your type system?</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Apr 14, 2020 at 1:04 AM Hanna Kruppe <<a href="mailto:hanna.kruppe@gmail.com">hanna.kruppe@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Tue, 7 Apr 2020 at 16:09, Renato Golin <<a href="mailto:rengolin@gmail.com" target="_blank">rengolin@gmail.com</a>> wrote:<br>

><br>

> On Tue, 7 Apr 2020 at 12:51, Hanna Kruppe <<a href="mailto:hanna.kruppe@gmail.com" target="_blank">hanna.kruppe@gmail.com</a>> wrote:<br>

> > > 1. is LMUL always a multiple of ELEN?<br>

> > This happens to be true (at least in the current spec, disregarding<br>

> > some in-progress proposals) just because both are powers of two and<br>

> > the largest possible LMUL equals the smallest possible ELEN (8), but I<br>

> > don't think there is any meaning to be found in this observation. The<br>

> > two values govern unrelated aspects of the vector unit.<br>

><br>

> Sorry, I meant multiple of basic types. But you have answered my question. :)<br>

><br>

> > > 2. Is this fixed on the hardware, depending on the actual lengths, or<br>

> > > is this dynamically set by software (on a register or status flag)?<br>

> > > 2a. If dynamic, can it change from program to program? Function to function?<br>

> > It's not clear whether by "this" you mean ELEN, LMUL, or something<br>

> > else. ELEN is fixed in hardware. LMUL is a property of each individual<br>

> > instruction.<br>

><br>

> Sorry again, "this" as in both ELEN and LMUL and their relationship. Ack.<br>

><br>

> > I don't know what "vscale wouldn't apply" is supposed to mean.<br>

><br>

> Legalisation-wise, you got it right, like <n x 0.5 x i64> is invalid and<br>

> gets converted to <n x 1 x i32>, which it is.<br>

><br>

> "Wouldn't apply" as in "what would be the point of having half-scale<br>

> on a type that needs to be broken in half", and thus making it whole.<br>

> You explain better below, so ignore it for now.<br>

><br>

> > But how? If we take Kai's table as gospel and look at a VLEN = ELEN =<br>

> > 32 machine, the vector type <vscale x 2 x i32> is supposed to map to a<br>

> > single vector register, which is 32b small, and thus <vscale x 2 x<br>

> > i32> would have just one element in this context (matching the "vscale<br>

> > = 1/2" intuition). To be consistent with this, <vscale x 1 x i32><br>

> > would have to contain just *half* an element. This is not something<br>

> > any legalization strategy can achieve, because it is a fundamentally<br>

> > impossible notion. So we end up in a situation where some types are<br>

> > not just illegal and have to be legalized, but are contradictory and<br>

> > can't be legalized in any meaningful way.<br>

><br>

> Right, we have faced that problem before on non-scalable vector extensions.<br>

><br>

> For example, vectorising 3 operations in a 4-wide vector and adding an<br>

> undef in the last lane.<br>

><br>

> It didn't use to be possible to do that, many years ago, as a general<br>

> case. But if you look at register aliasing (VFP and NEON in ARMv7), we<br>

> had the idea of different number of elements on the same register,<br>

> depending on how you look.<br>

><br>

> I'm not proposing to create all combinations of half-vscale shadowing,<br>

> but perhaps adding half-length types as valid and lowering them in a<br>

> special way could work much simpler than changing the interpretation<br>

> of vscale.<br>

<br>

[re-sending because I dropped the list -- sorry for the extra copy, Renato!]<br>

<br>

I don't see how the situation you mention is comparable. Legalization<br>

for e.g. <3 x i32> was not implemented at first, but as demonstrated<br>

by the fact that it *was* implemented later, there's no conceptual<br>

problem with legalizing that kind of type. You don't even have to<br>

legalize them in vector registers, three scalar registers work fine<br>

(you can even do that on the IR level).<br>

<br>

For <vscale x 1 x i32> with a fractional value of vscale, there are<br>

several conceivable ways to "legalize" this type, but none of them<br>

work. Legalization (codegen in general) does not know if the machine<br>

code will eventually run on a chip with vector registers so small that<br>

vscale works out to 1/2, but it has to choose some legalization<br>

strategy. I can imagine several approaches to this, but since the<br>

actual value of vscale is not known at this time, it will have to map<br>

the illegal scalable vector types to the vector registers in some way,<br>

to ensure there's enough space even when vscale is very large in some<br>

executions of the program.<br>

<br>

Depending on how you do that exactly, the generated code might have<br>

different behavior when running on a vscale == 1/2 machine, e.g. you<br>

might end up with a vector register holding *one* i32 element or a<br>

vector register holding *zero* i32 elements (i.e., the sole lane of<br>

the 32-bit vector register is masked out). There might be other<br>

approaches that result in yet another behavior, such as a hardware<br>

fault, but crashes and other immediate problems aside, you're going to<br>

end up with a certain discrete number of i32 values. That's a problem.<br>

If <vscale x 1 x i32> ends up having one element, and <vscale x 2 x<br>

i32> also has one (= 2 * 0.5) element, then that's wrong: the latter<br>

type must have twice as many elements as the former (one example where<br>

this matters: split_low / split_high / concat shuffle patterns). The<br>

second option, a vector with *zero* elements, is just as wrong if not<br>

worse.<br>

<br>
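To make the counting argument concrete, here is a minimal sketch in Python.<br>
It is not LLVM code, and the helper names element_count and<br>
is_representable are made up for illustration. It models the element count<br>
of a scalable vector "vscale x n x T" as vscale * n and checks whether that<br>
product is a whole number:<br>

```python
from fractions import Fraction


def element_count(vscale, n):
    """Element count of a 'vscale x n x T' vector, modeled as vscale * n."""
    return Fraction(vscale) * n


def is_representable(vscale, n):
    """A register can only hold a whole number of elements."""
    return element_count(vscale, n).denominator == 1


half = Fraction(1, 2)  # the hypothetical "vscale == 1/2" machine
for n in (1, 2, 4):
    # Print n, the modeled element count, and whether that count is whole.
    print(n, element_count(half, n), is_representable(half, n))
```

With vscale == 1/2, n = 2 and n = 4 give whole element counts, while n = 1<br>
gives half an element. Rounding that up to one or down to zero breaks the<br>
rule that "vscale x 2 x i32" has exactly twice as many elements as<br>
"vscale x 1 x i32".<br>

<br>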

It's not that a correct legalization exists but it's too annoying to<br>

implement, or that one might exist but I'm too lazy to work it out.<br>

We're also not running into a limitation or oddity of the RISC-V vector<br>

ISA in particular. It's simply that, if you set vscale == 0.5, then by<br>

the way scalable vector types work (vscale * const elements), some<br>

vector types that can be written in the IR would need to have a<br>

fractional number of elements to be consistent with the other scalable<br>

vector types. As that is not possible (not even conceptually),<br>

whatever code you emit to try to legalize that type will end up being<br>

wrong in some respect.<br>

<br>

So if we decided to support fractional vscale, we can't say these<br>

types are "illegal". In LLVM parlance, illegal types can be used in<br>

LLVM IR and targets aspire to turn them into something that works<br>

correctly, even if it's very inefficient. Sometimes a legalization is<br>

unimplemented or buggy, but these problems can be patched and this has<br>

often happened in the past. With fractional vscale, the situation is<br>

quite different: nobody will ever be able to use certain scalable<br>

vector types on the target in question, because they can't be<br>

legalized even in principle.<br>

<br>

In contrast, scalable vector types that are illegal because they're<br>

too large (e.g. <vscale x 32 x i64>) can be legalized just fine. For<br>

example, you could split them across a sufficiently large (fixed)<br>

number of vector registers and maybe spill them to the stack for<br>

inserts/extracts/shuffles/etc. that cross lanes or access elements at<br>

data-dependent positions. Implementing this will probably not be a<br>

priority for any targets, but it can be implemented whenever it does<br>

become important to someone.<br>

<br>
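As a rough sketch of why splitting works for oversized types, the Python<br>
below computes how many legal pieces cover a too-large type. It assumes,<br>
purely for illustration, that the widest legal i64 type is<br>
"vscale x 8 x i64"; the names split_count and total_elements are<br>
hypothetical and do not correspond to any LLVM API:<br>

```python
def split_count(n, max_legal_n):
    """Number of legal 'vscale x max_legal_n' pieces covering 'vscale x n'."""
    return -(-n // max_legal_n)  # ceiling division on integers


def total_elements(vscale, n):
    """Element count of 'vscale x n x T' for an integer vscale."""
    return vscale * n


# Assumption: 'vscale x 8 x i64' is the widest legal i64 type.
parts = split_count(32, 8)

# The piece count is fixed at compile time and covers every element
# regardless of the runtime value of vscale.
for vscale in (1, 2, 16):
    assert parts * total_elements(vscale, 8) == total_elements(vscale, 32)

print(parts)
```

The number of pieces depends only on the constant factors in the types,<br>
not on vscale, which is why this kind of legalization is well defined<br>
for every integer vscale.<br>

<br>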

I hope this lengthy explanation helps you see where I'm coming from.<br>

<br>

Thanks,<br>

Hanna<br>

<br>

> Also, I'm acting like devil's advocate, so don't take my comments as a<br>

> rejection of your proposal, I'm just trying to understand where you<br>

> are coming from.<br>

><br>

> cheers,<br>

> --renato<br>

</blockquote></div>