[llvm-dev] Adding support for vscale
Luke Kenneth Casson Leighton via llvm-dev
llvm-dev at lists.llvm.org
Tue Oct 1 01:21:24 PDT 2019
On Tue, Oct 1, 2019 at 8:08 AM Robin Kruppe <robin.kruppe at gmail.com> wrote:
>
> Hello Jacob and Luke,
>
> First off, even if a dynamically changing vscale was truly necessary
> for RVV or SV, this thread would be far too late to raise the question.
> That vscale is constant -- that the number of elements in a scalable
> vector does not change during program execution -- is baked into the
> accepted scalable vector type proposal from top to bottom and in fact
> was one of the conditions for its acceptance...
that should be explicitly made clear in the patches. it sounds very
much like it's only suitable for statically-allocated
arrays-of-vectorisable-types:
typedef float vec4[4]; // SEW=32,LMUL=4 probably
static vec4 globalvec[1024]; // vscale == 1024 here
or, would it be intended for use inside functions - again statically-allocated?
int somefn(void) {
    static vec4 localvec[1024]; // vscale == 1024 here
}
*or*, would it be intended to be used like this?
int somefn(int num_of_vec4s) {
    vec4 localvec[num_of_vec4s]; // vscale == dynamic, here
}
clarifying this in the documentation strings on vscale, perhaps even
providing c-style examples, would be extremely useful and would avoid
misunderstandings.
>... (runtime-variable type
> sizes create many more headaches which nobody has worked out
>how to solve to a satisfactory degree in the context of LLVM).
hmmmm. so it looks like data-dependent fail-on-first is something
that's going to come up later, rather than right now.
> *This* thread is just about whether vscale should be exposed to programs
> in the form of a Constant or as an intrinsic which always returns the same
> value during one program execution.
>
> Luckily, this is not a problem for RVV. I do not know anything about this
> "SV" extension you are working on
SV has been designed specifically to help with the creation of
*Hybrid* CPU / VPU / GPUs. it's very similar to RVV except that there
are no new instructions added.
a typical GPU would be happy to have 128-bit-wide SIMD or VLIW-style
instructions, on the basis that (A) the shader programs are usually no
greater than 1K in size and (B) those 128-bit-wide instructions have
an extremely high bang-per-buck ratio, of 32x FP32 operations issued
at once.
in a *hybrid* CPU - VPU - GPU context, even a 1k shader program takes up a
significant portion of the 1st-level cache, which is *not* separate
from a *GPU*'s 1st-level cache, because the CPU *is* the GPU.
consequently, SV has been specifically designed to "compactify"
instruction effectiveness by "prefixing" even RVC 16-bit opcodes with
vectorisation "tags".
this has the side-effect of reducing executable size by over 10% in
many cases when compared to RVV.
> so I cannot comment on that, but I'll sketch the reasons for why it's not
> an issue with RVV and maybe that helps you with SV too.
looks like it does: Jacob explains (in another reply) that MVL is
exactly the same concept, except that in RVV it is hard-coded (baked)
into the hardware, whereas in SV it is explicitly set via a CSR. i also
explained in the previous reply that in RVV the VL CSR is *requested*
(and the hardware chooses the value actually granted), whereas in SV
the VL CSR *must* be set to exactly what is requested [within the
bounds of MVL, sorry, left that out earlier].
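to put that difference in rough c terms (the function names below are
entirely made up for illustration - the real mechanism is a setvl-style
instruction in RVV and a CSR write in SV, not a c call):
/* RVV: the program *requests* a vector length; the hardware *chooses*
   the value actually granted (simplified here to "no more than the
   hardware's maximum"), and the program must then use that value. */
unsigned rvv_request_vl(unsigned requested, unsigned hw_max) {
    return requested < hw_max ? requested : hw_max; /* hardware's choice */
}
/* SV: the program writes the VL CSR directly and gets *exactly* the
   value it asked for, provided it stays within the bounds of MVL. */
unsigned sv_set_vl(unsigned requested, unsigned mvl) {
    (void)mvl; /* assumed in this sketch: caller keeps requested <= mvl */
    return requested;
}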
> As mentioned above, this is tangential to the focus of this thread, so if
> you want to discuss further I'd prefer you do that in a new thread.
it's not yet clear whether vscale is intended for use in static
allocation involving fixed constants, or with runtime-dependent
variables inside functions.
with that not being clear, my questions are not tangential to the
focus of the thread.
however yes i would agree that data-dependent fail-on-first is
definitely not the focus of this thread, and would need to be
discussed later.
we are a very small team at the moment and may end up missing valuable
discussions: how can it be ensured that we are included in future ones?
> [...]
> You may be aware of Simon Moll's vector predication (previously:
> explicit vector length) proposal which does just that.
ah yehyehyeh. i remember.
> In contrast, the vscale concept is more about how many elements a
> vector register contains, regardless of whether some operations process
> only a subset of them.
ok so this *might* be answering my question about vscale being
relate-able to a function parameter (the latter of the c examples);
it would be good to clarify.
> In RVV terms that means it's related not to VL but more to VBITS,
> which is indeed a constant (and has been for many months).
ok so VL is definitely "assembly-level" rather than something that is
actually exposed to the intrinsics. that may turn out to be a
mistake when it comes to data-dependent fail-on-first capability
(which is present in a *DIFFERENT* form in ARM SVE, btw), but yes,
that would need to be discussed separately.
> For example <vscale x 4 x i16> has four times as many elements and
> twice as many bits as <vscale x 1 x i32>, so it captures the distinction
> between a SEW=16,LMUL=2 vtype setting and a SEW=32,LMUL=1
> vtype setting.
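(working the quoted arithmetic through just to check i follow - the
function and parameter names here are purely illustrative:)
#include <assert.h>
void check_counts(unsigned vscale) {
    unsigned elems_16 = vscale * 4;     /* <vscale x 4 x i16>: 4*vscale elements */
    unsigned bits_16  = elems_16 * 16;  /* = 64*vscale bits */
    unsigned elems_32 = vscale * 1;     /* <vscale x 1 x i32>: 1*vscale elements */
    unsigned bits_32  = elems_32 * 32;  /* = 32*vscale bits */
    assert(elems_16 == 4 * elems_32);   /* four times as many elements */
    assert(bits_16  == 2 * bits_32);    /* twice as many bits */
}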
hang on - this may seem like a silly question: is it intended that
the *word* vscale would actually appear in LLVM-IR, i.e. that it is a new
compiler "keyword"? or did you use it here just as "an
example", where the idea is that the actual value would be <5 x 4
x i16> or <5 x 1 x i32>?
let me re-read the summary:
"This patch adds vscale as a symbolic constant to the IR, similar to
undef and zeroinitializer, so that it can be used in constant
expressions."
it's a keyword, isn't it?
so, that "vscale" keyword would be substituted at runtime by either a
constant (1024) *or* a runtime-calculated variable or function
parameter (num_of_vec4s), is that correct?
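to make that question concrete in c-like terms (every name below is
hypothetical and this is almost certainly an over-simplification - it
is exactly what i'd like the documentation to confirm or deny): is
vscale closer to (a), (b) or (c)?
/* (a) a build-time constant, baked in when the code is compiled: */
#define VSCALE_A 1024
/* (b) a value read once from the hardware when the program starts,
       then never changed for the rest of the execution: */
static unsigned vscale_b; /* set once by some hypothetical init code */
/* (c) a value that may differ per function call, e.g. derived from
       num_of_vec4s - which, if i read the quoted text at the top
       correctly, is what the accepted proposal rules out: */
unsigned vscale_c(unsigned num_of_vec4s) { return num_of_vec4s; }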
apologies for asking: these are precisely the kind of
from-zero-prior-knowledge questions that help a review process
clarify things for other users/devs.
l.