[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

Renato Golin via llvm-dev llvm-dev at lists.llvm.org
Wed Aug 1 04:15:56 PDT 2018


On Tue, 31 Jul 2018 at 23:46, Hal Finkel via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> In some sense, if you make vscale dynamic,
> you've introduced dependent types into LLVM's type system, but you've
> done it in an implicit manner. It's not clear to me that works. If we
> need dependent types, then an explicit dependence seems better. (e.g.,
> <scalable <n> x %vscale_var x <type>>)

That's a shift from the current proposal, and I think we can consider
it after the current changes land. For now, both SVE and RISC-V are
proposing function boundaries as the only points where vscale changes.
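
To make that concrete, here's a rough sketch using the scalable-vector
spelling that later landed upstream (<vscale x 4 x i32> and the
llvm.vscale intrinsic); the function name is made up and the exact
syntax in the RFC may differ, so treat it as illustrative only:

  ; The element count of this type is "vscale x 4"; vscale is not a
  ; compile-time constant, it is whatever the hardware/kernel gives
  ; this particular execution of the function.
  define i64 @elements_per_vector() {
    ; Number of i32 lanes in one <vscale x 4 x i32> value.
    %vscale = call i64 @llvm.vscale.i64()
    %lanes  = mul i64 %vscale, 4
    ret i64 %lanes
  }

  declare i64 @llvm.vscale.i64()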


> 2. How would the function-call boundary work? Does the function itself
> have intrinsics that change the vscale?

Functions may not know what their vscale is until they're actually
executed. They could even have different vscales for different call
sites.

AFAIK, it's not up to the compiled program (i.e. via a function
attribute or an inline asm call) to change the vscale; rather, the
kernel/hardware can impose dynamic restrictions on the process. But,
for now, only at (binary object) function boundaries.

I don't know how that works at the kernel level (how to detect those
boundaries? instrument every branch?), but this is what I understood
from the current discussion.


> If so, then it's not clear that
> the function-call boundary makes sense unless you prevent inlining. If
> you prevent inlining, when does that decision get made? Will the
> vectorizer need to outline loops? If so, outlining can have a real cost
> that's difficult to model. How do return types work?

The dynamic nature is not part of the program, so inlining can happen
as usual. Given that the vectors are size-agnostic and work regardless
of what the kernel provides (within safety limits), code generation
shouldn't change much.
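
As an illustration of why the generated code doesn't care about the
actual length, here's a hand-written sketch of a vector-length-agnostic
add loop (again in today's upstream spelling, function name made up,
and assuming %n is a non-zero multiple of the vector length so the
tail/predication question doesn't come up):

  define void @vla_add(ptr %a, ptr %b, ptr %c, i64 %n) {
  entry:
    ; Elements processed per iteration: vscale * 4. The same IR runs
    ; unchanged whether the hardware provides 128-bit or 2048-bit
    ; vectors; only the runtime value of vscale differs.
    %vscale = call i64 @llvm.vscale.i64()
    %step   = shl i64 %vscale, 2
    br label %loop

  loop:
    %i  = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %pa = getelementptr i32, ptr %a, i64 %i
    %pb = getelementptr i32, ptr %b, i64 %i
    %pc = getelementptr i32, ptr %c, i64 %i
    %va = load <vscale x 4 x i32>, ptr %pa
    %vb = load <vscale x 4 x i32>, ptr %pb
    %vc = add <vscale x 4 x i32> %va, %vb
    store <vscale x 4 x i32> %vc, ptr %pc
    %i.next = add i64 %i, %step
    %done   = icmp uge i64 %i.next, %n
    br i1 %done, label %exit, label %loop

  exit:
    ret void
  }

  declare i64 @llvm.vscale.i64()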

We may have to create artefacts to restrict the maximum vscale (for
safety), but others are better equipped to answer that question.


>  1. I can definitely see the use cases for changing vscale dynamically,
> and so I do suspect that we'll want that support.

At a process/function level, yes. Within the same self-contained
sub-graph, I don't know.


>  2. LLVM does not have loops as first-class constructs. We only have SSA
> (and, thus, dominance), and when specifying restrictions on placement of
> things in function bodies, we need to do so in terms of these constructs
> that we have (which don't include loops).

That's why I was trying to define the "self-contained sub-graph" above
(there must be a better term for that). It has to do with data
dependencies (scalar|memory -> vector -> scalar|memory), i.e. making
sure side-effects don't leak out.

A loop iteration is usually such a block, but not all iterations are,
and not all such blocks are loops.
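
A rough IR-level picture of what I mean (made-up example, same caveats
about syntax and about %n being a non-zero multiple of the vector
length): inside each iteration, scalable values are created from
memory and reduced back to scalars before anything leaves it, so no
vector value is live across the point where vscale could conceivably
change.

  define i32 @block_reduce(ptr %base, i64 %n) {
  entry:
    %vscale = call i64 @llvm.vscale.i64()
    %step   = shl i64 %vscale, 2
    br label %loop

  loop:
    ; The only values live across the back-edge are scalars
    ; (%i, %acc); the scalable vector %v is created and consumed
    ; entirely inside the iteration.
    %i   = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %acc = phi i32 [ 0, %entry ], [ %acc.next, %loop ]
    %p   = getelementptr i32, ptr %base, i64 %i
    %v   = load <vscale x 4 x i32>, ptr %p
    %s   = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> %v)
    %acc.next = add i32 %acc, %s
    %i.next   = add i64 %i, %step
    %done     = icmp uge i64 %i.next, %n
    br i1 %done, label %exit, label %loop

  exit:
    ret i32 %acc.next
  }

  declare i64 @llvm.vscale.i64()
  declare i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32>)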

Changing vscale inside a function but outside of those blocks would be
"fine", as long as we made sure code movement respects those
boundaries and that context is restored correctly on exceptions. But
that's not part of the current proposal.

Changing vscale inside one of those blocks would be madness. :)

cheers,
--renato
