[llvm-dev] Adding support for vscale

Tue Oct 1 03:07:29 PDT 2019

Hi Luke,

> On 1 Oct 2019, at 09:21, Luke Kenneth Casson Leighton via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
>> First off, even if a dynamically changing vscale was truly necessary
>> for RVV or SV, this thread would be far too late to raise the question.
>> That vscale is constant -- that the number of elements in a scalable
>> vector does not change during program execution -- is baked into the
>> accepted scalable vector type proposal from top to bottom and in fact
>> was one of the conditions for its acceptance...
> 
> that should be explicitly made clear in the patches.  it sounds very
> much like it's only suitable for statically-allocated
> arrays-of-vectorisable-types:
> 
> typedef float vec4[4]; // SEW=32,LMUL=4 probably
> static vec4 globalvec[1024]; // vscale == 1024 here

'vscale' just refers to the scaling factor that gives the maximum size of
the vector at runtime, not the number of currently active elements.

SVE will be using predication alone to deal with data that doesn't fill an
entire vector, whereas RVV and SX-Aurora want to use a separate mechanism
that fits with their hardware having a changeable active length.

The scalable type tells you the maximum number of elements that could be
operated on, and individual operations can constrain that to a smaller
number of elements. The latter is what Simon Moll's proposal addresses.
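
To make the distinction concrete, here is a toy Python model. It is not
LLVM code: ScalableType, add_active and the VSCALE value are invented for
illustration, and the "inactive lanes keep the first operand" policy is
just an assumption. It treats vscale as fixed for the whole run, computes
the maximum element count of a scalable type, and then applies an
operation to a smaller active length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalableType:
    """Toy stand-in for <vscale x min_elems x i{elem_bits}> (hypothetical, not an LLVM API)."""
    min_elems: int
    elem_bits: int

    def max_elems(self, vscale: int) -> int:
        # Maximum number of elements the type holds at runtime.
        return vscale * self.min_elems

    def total_bits(self, vscale: int) -> int:
        return self.max_elems(vscale) * self.elem_bits


def add_active(a, b, active_len):
    """Add only the first active_len lanes; inactive lanes keep a's value (assumed policy)."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of lanes")
    if not 0 <= active_len <= len(a):
        raise ValueError("active length must lie within the vector")
    return [x + y if i < active_len else x
            for i, (x, y) in enumerate(zip(a, b))]


VSCALE = 2                      # assumed value; constant for the whole program run
ty = ScalableType(min_elems=4, elem_bits=32)
n = ty.max_elems(VSCALE)        # the type fixes the maximum lane count
a = list(range(n))
b = [10] * n
result = add_active(a, b, active_len=5)   # the operation narrows it to 5 lanes
```

The type never changes size here; only the per-operation active length
does.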

>> ... (runtime-variable type
>> sizes create many more headaches which nobody has worked out
>> how to solve to a satisfactory degree in the context of LLVM).
> 
> hmmmm.  so it looks like data-dependent fail-on-first is something
> that's going to come up later, rather than right now.

Arm's downstream compiler has been able to use the scalable type and a
constant vscale with first-faulting loads for around 4 years, so there's
no conflict here.

We will still need to work out exactly what form the first-faulting
intrinsics take, of course, as I think SVE's predication-only approach
doesn't quite fit with the others -- maybe we'll end up with two intrinsics?
Or maybe we'll be able to synthesize a predicate from an active vlen and
pattern match? Something to discuss later, I guess. (I'm not even sure
AVX512 has a first-faulting form; possibly just no-faulting plus a check of
the first predicate element?)
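
As a rough illustration of the "predicate from an active vlen" idea, here
is a toy Python sketch. Nothing in it is a proposed intrinsic:
predicate_from_vl and first_faulting_load are made-up names, memory is
modelled as a plain list where any index outside it counts as a fault, and
the fault behaviour is a simplification of first-faulting semantics (a
fault on the first active lane is reported, a fault on a later lane just
stops the load).

```python
def predicate_from_vl(vl, max_elems):
    """Build a per-lane predicate from an active length: lane i is active iff i < vl."""
    if not 0 <= vl <= max_elems:
        raise ValueError("vl must lie between 0 and max_elems")
    return [i < vl for i in range(max_elems)]


def first_faulting_load(memory, base, pred):
    """Toy first-faulting load (simplified model, hypothetical helper).

    Returns (values, loaded), where loaded marks the lanes that were
    actually read. Lanes that were not loaded hold 0.
    """
    values = [0] * len(pred)
    loaded = [False] * len(pred)
    seen_active = False
    for i, active in enumerate(pred):
        if not active:
            continue
        addr = base + i
        if addr < 0 or addr >= len(memory):
            if not seen_active:
                # A fault on the first active lane is a real fault.
                raise MemoryError("fault on first active lane")
            # A later fault is suppressed; this lane and all following stay unloaded.
            break
        values[i] = memory[addr]
        loaded[i] = True
        seen_active = True
    return values, loaded


memory = [1, 2, 3, 4, 5, 6]
pred = predicate_from_vl(4, max_elems=8)        # 4 active lanes out of 8
values, loaded = first_faulting_load(memory, base=4, pred=pred)
```

In this model the vlen-based and predicate-based views differ only in how
the set of active lanes is written down.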

>> As mentioned above, this is tangential to the focus of this thread, so if
>> you want to discuss further I'd prefer you do that in a new thread.
> 
> it's not yet clear whether vscale is intended for use in
> static-allocation involving fixed constants or whether it's intended
> for use with runtime-dependent variables inside functions.

Runtime-dependent, though you could use C-level types and intrinsics to
try a static approach.

> ok so this *might* be answering my question about vscale being
> relate-able to a function parameter (the latter of the c examples), it
> would be good to clarify.
> 
>> In RVV terms that means it's related not to VL but more to VBITS,
>> which is indeed a constant (and has been for many months).
> 
> ok so VL is definitely "assembly-level" rather than something that
> actually is exposed to the intrinsics.  that may turn out to be a
> mistake when it comes to data-dependent fail-on-first capability
> (which is present in a *DIFFERENT* form in ARM SVE, btw), but would,
> yes, need discussion separately.
> 
>> For example <vscale x 4 x i16> has four times as many elements and
>> twice as many bits as <vscale x 1 x i32>, so it captures the distinction
>> between a SEW=16,LMUL=2 vtype setting and a SEW=32,LMUL=1
>> vtype setting.
> 
> hang on - so this may seem like a silly question: is it intended that
> the *word* vscale would actually appear in LLVM-IR i.e. it is a new
> compiler "keyword"?  or did you use it here in the context of just "an
> example", where actually the idea is that actual value would be <5 x 4
> x i16> or <5 x 1 x i32>?

If you're referring to the '<vscale x 4 x i32>' syntax, that's already part
of LLVM IR (though effectively still in 'beta'). You can see a few examples
in the .ll tests, e.g. llvm/test/Bitcode/compatibility.ll

It's also documented in the LangRef.

Sander's patch takes the existing 'vscale' keyword and allows it to be
used outside the type, to serve as an integer constant that represents the
same runtime value as it does in the type.
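
For instance, one use of an integer-valued vscale is computing a loop step
that matches the element count of a vector type. The following is a toy
Python sketch, not IR; strip_mine is a hypothetical name and the values
passed to it are arbitrary.

```python
def strip_mine(n, vscale, min_elems):
    """Yield (start, active_len) chunks covering n elements.

    The step is vscale * min_elems, i.e. the element count of a
    <vscale x min_elems x ...> vector in this toy model.
    """
    if vscale <= 0 or min_elems <= 0:
        raise ValueError("vscale and min_elems must be positive")
    step = vscale * min_elems
    for start in range(0, n, step):
        # The final chunk may cover fewer elements than a full vector.
        yield start, min(step, n - start)


chunks = list(strip_mine(n=10, vscale=2, min_elems=4))
```

Because the same value scales both the type and the step, the loop needs
no separate query for the vector length.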

Some previous discussions proposed starting with an intrinsic for this,
and that may still happen depending on community reaction, but the Arm
HPC compiler team felt it was important to at least start a wider discussion
on this topic before proceeding. In our experience, using an intrinsic makes
it harder to work with shufflevector or to get good code generation. If
someone can spot a problem with our reasoning on that, please let us know.

-Graham