[llvm-dev] [RFC][SVE] Supporting Scalable Vector Architectures in LLVM IR (take 2)

Thu Jul 6 13:02:27 PDT 2017

On Jun 1, 2017, at 7:22 AM, Graham Hunter via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hi,
> 
> Here's the updated RFC for representing scalable vector types and associated constants in IR. I added a section to address questions that came up on the recent patch review.

Thanks for sending this out Graham.  Here are some comments:

> =====
> Types
> =====
> 
> To represent a vector of unknown length a boolean `Scalable` property has been
> added to the `VectorType` class. Most code that deals with vectors doesn't need
> to know the exact length, but does need to know relative lengths -- e.g. get
> a vector with the same number of elements but a different element type, or with
> half or double the number of elements.
> 
> In order to allow code to transparently support scalable vectors, we introduce
> an `ElementCount` class with two members:
> 
> - `unsigned Min`: the minimum number of elements.
> - `bool Scalable`: is the element count an unknown multiple of `Min`?
> For non-scalable vectors (``Scalable=false``) the scale is considered to be
> equal to one and thus `Min` represents the exact number of elements in the
> vector.
> 
> The intent for code working with vectors is to use convenience methods and avoid
> directly dealing with the number of elements. If needed, calling
> `getElementCount` on a vector type instead of `getVectorNumElements` can be used
> to obtain the (potentially scalable) number of elements. Overloaded division and
> multiplication operators allow an ElementCount instance to be used in much the
> same manner as an integer for most cases.
> 
> This mixture of static and runtime quantities allows us to reason about the
> relationship between different scalable vector types without knowing their
> exact length.

This is a clever approach to unifying the two concepts, and I think that the approach is basically reasonable.  The primary problems that this will introduce are:

1) Almost anything touching (e.g. transforming) vector operations will have to be aware of this concept.  Given a first class implementation of SVE, I don’t see how that’s avoidable though, and your extension of VectorType is sensible.

2) This means that VectorType is sometimes fixed size, and sometimes unknowable.  I don’t think we have an existing analog for that in the type system.

Is this type a first class type?  Can you PHI them, can you load/store them, can you pass them as function arguments without limitations?  If not, that is a serious problem.  How does struct layout with a scalable vector in it work?  What does an alloca of one of them look like?  What does a spill look like in codegen?
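
To make the quoted `ElementCount` description concrete, here is a minimal C++ sketch.  It is an illustration under assumptions, not the code from the patch: the two members follow the RFC text, while `VecTy`, `withElementBits`, `halfElements`, and `doubleElements` are hypothetical names standing in for VectorType and its convenience methods.

```cpp
#include <cassert>

// Sketch of the ElementCount idea from the RFC (not the actual patch).
struct ElementCount {
  unsigned Min;   // minimum number of elements
  bool Scalable;  // true if the real count is an unknown multiple of Min

  // Multiplying or dividing only touches Min; the Scalable flag is preserved,
  // so relative lengths can be computed without knowing the exact length.
  ElementCount operator*(unsigned RHS) const { return {Min * RHS, Scalable}; }
  ElementCount operator/(unsigned RHS) const {
    assert(RHS != 0 && Min % RHS == 0 && "division must be exact");
    return {Min / RHS, Scalable};
  }
  bool operator==(const ElementCount &RHS) const {
    return Min == RHS.Min && Scalable == RHS.Scalable;
  }
};

// Hypothetical stand-in for a vector type: an element count plus element width.
struct VecTy {
  ElementCount EC;
  unsigned ElemBits;
};

// Same number of elements, different element type.
VecTy withElementBits(VecTy T, unsigned Bits) { return {T.EC, Bits}; }
// Half or double the number of elements, same element type.
VecTy halfElements(VecTy T) { return {T.EC / 2, T.ElemBits}; }
VecTy doubleElements(VecTy T) { return {T.EC * 2, T.ElemBits}; }
```

None of these helpers needs to branch on `Scalable`, which is the property the RFC is relying on.  The questions above about structs, allocas, and spills are not answered by this sketch; they are exactly the cases where the exact size matters.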

> IR Textual Form
> ---------------
> 
> The textual form for a scalable vector is:
> 
> ``<[n x ]<m> x <type>>``
> 
> where `type` is the scalar type of each element, `m` is the minimum number of
> elements, and the string literal `n x` indicates that the total number of
> elements is an unknown multiple of `m`; `n` is just an arbitrary choice for
> indicating that the vector is scalable, and could be substituted by another.
> For fixed-length vectors, the `n x` is omitted, so there is no change in the
> format for existing vectors.
> 
> Scalable vectors with the same `Min` value have the same number of elements, and
> the same number of bytes if `Min * sizeof(type)` is the same:
> 
> ``<n x 4 x i32>`` and ``<n x 4 x i8>`` have the same number of elements.
> 
> ``<n x 4 x i32>`` and ``<n x 8 x i16>`` have the same number of bytes.

It’s a trivial syntactic issue, but I’d suggest something more along the lines of:

<scalable 4 x i32> 

or something like that, just to make it easier to read.
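
As a rough sketch of the difference, the following C++ prints one type in both spellings and checks the two equivalences quoted above.  `VecDesc` and all four function names are hypothetical helpers made up for this example; the `scalable` keyword form is only the suggestion from this mail, not settled syntax.

```cpp
#include <string>

// Hypothetical description of a vector type, for illustration only.
struct VecDesc {
  unsigned Min;      // minimum number of elements
  bool Scalable;     // unknown multiple of Min?
  std::string Elem;  // element type name, e.g. "i32"
  unsigned ElemBits; // element width in bits
};

// Spelling proposed in the RFC: <n x 4 x i32>
std::string rfcSyntax(const VecDesc &T) {
  return "<" + std::string(T.Scalable ? "n x " : "") +
         std::to_string(T.Min) + " x " + T.Elem + ">";
}

// Spelling suggested in this reply: <scalable 4 x i32>
std::string keywordSyntax(const VecDesc &T) {
  return "<" + std::string(T.Scalable ? "scalable " : "") +
         std::to_string(T.Min) + " x " + T.Elem + ">";
}

// Two vectors of the same kind have the same number of elements if Min matches.
bool sameElementCount(const VecDesc &A, const VecDesc &B) {
  return A.Scalable == B.Scalable && A.Min == B.Min;
}

// ...and the same number of bytes if Min * element size matches.
bool sameByteSize(const VecDesc &A, const VecDesc &B) {
  return A.Scalable == B.Scalable &&
         A.Min * A.ElemBits == B.Min * B.ElemBits;
}
```

Fixed-length vectors print identically in both forms, so either spelling leaves existing IR unchanged.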

> Alternatives Considered
> -----------------------
> 
> We had two alternatives in mind -- a dedicated target specific type, and a
> subclass inheriting from VectorType.

I think that a target-specific type (e.g. like we have X86_mmx) is the only reasonable alternative.  A subclass of VectorType is just another implementation approach of your design above.  This is assuming that scalable vectors are really first class types.

The advantage of a separate type is that it avoids you having to touch everything that touches VectorType, and if it turns out that the code that needs to handle normal SIMD and scalable SIMD vectors is different, then it is a win to split them into two types.  If, on the other hand, most code would treat the two types similarly, then it is better to just have one type.

The major concern I have here is that I’m not sure how scalable vectors can be considered to be first class types, given that we don’t know their size.  If they can’t be put in an LLVM struct (for example), then this would pose a significant problem with your current approach.  It would be a huge problem if VectorType could be in structs in some cases, but not others.

> Although our current solution will need to change some of the code that creates
> new VectorTypes, much of that code doesn't need to care about whether the types
> are scalable or not -- they can use preexisting methods like
> `getHalfElementsVectorType`. If the code is a little more complex,
> `ElementCount` structs can be used instead of an `unsigned` value to represent
> the number of elements.

Agreed, that seems fine to me if true.

> =================
> Runtime Constants
> =================
> 
> With a scalable vector type defined, we now need a way to generate addresses for
> consecutive vector values in memory and to be able to create basic constant
> vector values.
> 
> For address generation, the `vscale` constant is added to represent the runtime
> value of `n` in `<n x m x type>`.

This should probably be an intrinsic, not an llvm::Constant.  The design of llvm::Constant is already wrong: it shouldn’t have operations like divide, and it would be better to not contribute to the problem. 

> Multiplying `vscale` by `m` and the number of
> bytes in `type` gives the total length of a scalable vector, and the backend
> can pattern match to the various load and store instructions in SVE that
> automatically scale with vector length.

It is fine for the intrinsic to turn into a target specific ISD node in selection dag to allow your pattern matching.
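
The arithmetic in the quoted paragraph can be sketched in C++ as follows.  Here `vscale` is modeled as an ordinary runtime argument, which is an assumption of this example (it says nothing about whether it ends up as a constant or an intrinsic), and both function names are hypothetical.

```cpp
#include <cstdint>

// Total size in bytes of <n x MinElts x iN>, where VScale is the runtime
// value of n: vscale * m * bytes-per-element.
constexpr uint64_t scalableVectorBytes(uint64_t VScale, uint64_t MinElts,
                                       uint64_t ElemBits) {
  return VScale * MinElts * (ElemBits / 8);
}

// Address of the Index-th consecutive vector value starting at Base.
// The stride is the full (runtime) vector size.
constexpr uint64_t nthVectorAddress(uint64_t Base, uint64_t Index,
                                    uint64_t VScale, uint64_t MinElts,
                                    uint64_t ElemBits) {
  return Base + Index * scalableVectorBytes(VScale, MinElts, ElemBits);
}
```

For example, if the hardware vector length happened to be 256 bits, `<n x 4 x i32>` would have `vscale` equal to 2, and `scalableVectorBytes(2, 4, 32)` gives 32 bytes.  The multiply by `vscale` is the part the backend would want to fold into a vector-length-scaled addressing mode.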

> =====================
> Questions and Answers
> =====================
> 
> Can the vector length change at runtime?
> ----------------------------------------
> 
> It is possible to change vector length at runtime, but there is no model defined
> for how to resolve all the issues that arise when doing so. From the compiler's
> point of view, it is expected that vector length can be treated as constant but
> unknown.

The way I would explain it is that it is a (load time) constant.  There is no practical way for software to handle this case, so even if hardware can do it, it is a non-goal to support it.

> How do we spill/fill scalable registers on the stack?
> -----------------------------------------------------
> 
> SVE registers have a (partially) unknown size at build time and their associated
> fill/spill instructions require an offset that is implicitly scaled by the
> vector length instead of bytes or element size. To accommodate this we
> created the concept of Stack Regions that are areas on the stack associated
> with specific data types or register classes.

Ok, that sounds complicated, but can surely be made to work.  The bigger problem is that there are various LLVM IR transformations that want to put registers into memory.  All of these will be broken with this sort of type.
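
To illustrate what "an offset that is implicitly scaled by the vector length" means for a spill slot, here is a small C++ sketch.  The layout (one fixed-size region followed by one scalable region) and the names `FrameSketch` and `scalableSlotOffset` are assumptions invented for this example; the RFC text quoted above does not specify how Stack Regions are laid out.

```cpp
#include <cstdint>

// Hypothetical frame: a region of ordinary byte-addressed slots, followed by
// a region whose slots are each one full scalable register wide.
struct FrameSketch {
  uint64_t FixedRegionBytes; // size of the fixed-size region, known at build time
};

// Byte offset of a slot in the scalable region.  SlotIndex is what would be
// encoded at build time; VLBytes (the register size in bytes) is only known
// at run time, so the final byte offset is not a compile-time constant.
uint64_t scalableSlotOffset(const FrameSketch &F, uint64_t SlotIndex,
                            uint64_t VLBytes) {
  return F.FixedRegionBytes + SlotIndex * VLBytes;
}
```

The same slot index lands at a different byte offset depending on the vector length, which is why any IR transformation that assumes a known store size for a value will not work with this sort of type.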

-Chris