> On Mon, 4 Feb 2019 at 23:04, Simon Moll <moll at cs.uni-saarland.de> wrote:
>> On NEC SX-Aurora the vector length is always interpreted in 64-bit data
>> chunks. That is one example of a real architecture where the vscaled
>> interpretation of VL makes sense.
> Now this is a problem. Let's leave the details of why RISC-V V needs the
> other interpretation to Phab, but we definitely have a conflict in what
> these two architectures need. How do we reconcile them? Picking one option
> and requiring a multiplication/division of the vlen argument to get the
> other meaning is a nice canonical IR form, but it seems a bit problematic
> for codegen because that mul/div is a prime candidate for being CSE'd
> across blocks (pure calculation, repeated everywhere) and consequently
> being difficult to access for pattern matching in the backend.
> On the other hand, it's a less serious problem than was previously
> discussed re: vlen vs predication. The actual change in codegen is just
> omitting one instruction, which one can easily do in an SSA-based MIR
> pass if necessary (instead of during ISel). Moreover, the cost of a missed
> folding opportunity is relatively minor, since it'll most likely be just a
> shift by an immediate, and it'll usually be amortized over basically the
> entire loop body in a lot of code.
It's minor for power-of-2 subvector sizes. Our ISA supports subvectors of
size 3, and division by 3 is much more complex than a shift. Also, the code
we will be generating will have a lot of subvectors of size 3, since they
are used wherever a 3D position (basically at least a few times in each
shader) or normal vector is used.
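
To make the cost difference concrete, here is a minimal C sketch of the
conversion from an element count to a subvector count. It assumes a 32-bit
unsigned count; the helper names are hypothetical and not taken from any
ISA or from LLVM. The size-3 case uses the usual multiply-by-reciprocal
lowering of an unsigned division by a constant, which needs a widening
multiply plus a shift instead of a single shift by an immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: subvectors of size 4, a single shift by an immediate. */
static uint32_t subvecs_of_4(uint32_t elems) {
    return elems >> 2;
}

/* Hypothetical helper: subvectors of size 3.
 * 0xAAAAAAAB is ceil(2^33 / 3), so the 64-bit product shifted right by 33
 * equals elems / 3 for every 32-bit unsigned input. */
static uint32_t subvecs_of_3(uint32_t elems) {
    return (uint32_t)(((uint64_t)elems * 0xAAAAAAABu) >> 33);
}

/* Small self-check comparing both helpers against plain division. */
static void self_check(void) {
    for (uint32_t n = 0; n < 1000; ++n) {
        assert(subvecs_of_4(n) == n / 4);
        assert(subvecs_of_3(n) == n / 3);
    }
    assert(subvecs_of_3(UINT32_MAX) == UINT32_MAX / 3);
}
```

On a target without a cheap widening multiply, the size-3 sequence gets
longer still, which is why a missed fold hurts more here than in the
power-of-2 case.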

> Still, does anyone have a better idea?
> Cheers,
> Robin
