[llvm-dev] Adding support for vscale

Sat Oct 5 04:36:29 PDT 2019

On Wednesday, October 2, 2019, Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

>
>
> On Wednesday, October 2, 2019, Robin Kruppe <robin.kruppe at gmail.com>
> wrote:
>
>>
>> Granted, I don't think this or other examples will normally occur in LLVM
>> IR generated by a loop vectorizer, so vscale will probably not occur very
>> frequently in RVV.
>>
>
> Interesting. It is sort-of what I had a hunch would be the case.
>
>
OK, so I'm taking the RISC-V developers off CC, because it looks like neither SV
nor RVV would use vscale: we eventually identified that it is essentially a
way to express the "architectural SIMD width".

The rest of this therefore has nothing to do with vector engines; it is
purely some constructive input for future consideration.

Consider a scenario where the data consists of short vectors, well below
vscale, and where there is also some inter-element dependence (a cross
product or similar) that makes laying several short vectors into a single
vscale-long SIMD register awkward.

Under such circumstances, having a single fixed vscale is extremely
wasteful, particularly if there is an out-of-order engine that could mix
scalar or MMX/SSE operations with AVX512, for example.

The idea, then, is to throw the longer operations at AVX512 and the
shorter ones at 64-bit MMX/SSE.
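A rough way to quantify the waste is to count how many SIMD lanes actually carry data when each short vector occupies its own register(s). This is a hypothetical back-of-envelope sketch, not anything in LLVM: `lane_utilization` is a name invented here, and it assumes 32-bit elements and no packing of multiple short vectors into one register.

```python
def lane_utilization(vector_len: int, register_bits: int, elem_bits: int = 32) -> float:
    """Fraction of SIMD lanes doing useful work when one vector of
    `vector_len` elements is placed in its own register(s), with no
    packing of other vectors alongside it (hypothetical model)."""
    if vector_len <= 0 or register_bits < elem_bits:
        raise ValueError("need a positive length and a register at least one element wide")
    lanes = register_bits // elem_bits
    # Ceiling division: number of registers needed to hold the vector.
    regs = -(-vector_len // lanes)
    return vector_len / (regs * lanes)


# A 3-element vector (e.g. one operand of a cross product):
print(lane_utilization(3, 512))  # 512-bit AVX512 register: 3 of 16 lanes used
print(lane_utilization(3, 128))  # 128-bit register: 3 of 4 lanes used
```

Under these assumptions the 3-element case uses 0.1875 of a 512-bit register but 0.75 of a 128-bit one, which is the asymmetry that motivates routing short work to the narrower unit.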

The point is: *both could benefit from vscale*, except that, unfortunately,
there is only *one* vscale, and it can therefore only be applied to *one* of
the SIMD ALUs.

This suggests that either vscale should be a variable (applicable on a
per-group basis, with groups separated by LD/STs)

OR

that there should be more than one vscale.

That is, instead of being a fixed, global quantity, vscale would become
%vscaleN, similar to %regN, conveying the context of its intended scope and
use.

Thus, groups of operations intended to be farmed out to a SPECIFIC SIMD
suite (AVX512) could be *specifically* separated from those intended for
another suite (MMX/SSE).

Of course, on architectures which have no such distinction, a simple pass
would merge them all into one global vscale.
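The per-group idea and the merging pass can be modelled in a few lines. This is purely a hypothetical sketch: LLVM has no `%vscaleN` syntax, and `OpGroup`, `vscale_domain` and `merge_vscale_domains` are names made up for illustration. It assumes the pass only needs to know how many distinct SIMD units the target has.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OpGroup:
    """A group of operations bounded by loads/stores (hypothetical model)."""
    name: str
    vscale_domain: str  # hypothetical "%vscaleN" label, e.g. "vscale0"


def merge_vscale_domains(groups: list[OpGroup], num_simd_units: int) -> list[OpGroup]:
    """Hypothetical pass: keep distinct vscale domains when the target has
    several SIMD units to choose from; otherwise fold them all into a
    single global "vscale"."""
    if num_simd_units > 1:
        # Each domain can later be bound to its own unit (e.g. AVX512 vs MMX/SSE).
        return list(groups)
    # Only one SIMD unit: every domain collapses to the one global vscale.
    return [replace(g, vscale_domain="vscale") for g in groups]


groups = [
    OpGroup("long_loop", "vscale0"),      # intended for the wide unit
    OpGroup("cross_product", "vscale1"),  # intended for the narrow unit
]
print([g.vscale_domain for g in merge_vscale_domains(groups, 2)])  # ['vscale0', 'vscale1']
print([g.vscale_domain for g in merge_vscale_domains(groups, 1)])  # ['vscale', 'vscale']
```

The design choice here is that the merge is lossless in the single-unit case: the group boundaries survive, only the domain labels are unified, so nothing is lost on architectures without the distinction.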

A thought for consideration.

L.

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68