[llvm-dev] [RFC][SVE] Supporting Scalable Vector Architectures in LLVM IR (take 2)

Fri Jul 14 02:19:27 PDT 2017

> On 6 Jul 2017, at 23:53, Amara Emerson <amara.emerson at gmail.com> wrote:
> 
> On 6 July 2017 at 23:13, Chris Lattner <clattner at nondot.org> wrote:
>>> Yes, as an extension to VectorType they can be manipulated and passed
>>> around like normal vectors, loaded/stored directly, used in phis, put in
>>> llvm structs etc. Address computation generates expressions in terms of vscale
>>> and it seems to work well.
>> 
>> Right, that works out through composition, but what does it mean?  I can't have a global variable of a scalable vector type, nor does it make sense for a scalable vector to be embeddable in an LLVM IR struct: nothing that measures the size of a struct is prepared to deal with a non-constant answer.
> Although the absolute sizes of the types aren't known at compile time,
> there are upper bounds which the compiler can assume and use to allow
> allocation of storage for global variables and the like. The issue
> with composite type sizes again reduces to the issue of type sizes
> being either symbolic expressions or simply unknown in some cases.

To elaborate a bit on this: in our current compiler we assume a fixed upper bound (256 bytes, i.e. 2048 bits) on the size of scalable vector types in IR. That gets us working code, but it isn't truly scalable. It also doesn't work when calculating offsets within structs while building SelectionDAG nodes, so we changed the optional "Offsets" argument of ComputeValueVTs to be a SmallVectorImpl of pairs representing scaled and unscaled byte offsets. Scaled offsets must be multiplied by vscale to get the correct addresses. It's possible this mechanism could be used in IR as well; I'll investigate. Does anyone have a strong objection to that idea up front?
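
To make the scaled/unscaled offset pairs concrete, here is a rough standalone C++ sketch. It is not the actual ComputeValueVTs change: FieldSize, OffsetPair, computeFieldOffsets and resolveOffset are hypothetical names made up for illustration, and alignment padding is deliberately ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical field description (not an LLVM type): a size in bytes, plus a
// flag saying whether that size must be multiplied by vscale at run time.
struct FieldSize {
  uint64_t Bytes;
  bool Scalable;
};

// Hypothetical offset pair: first = scaled bytes, second = unscaled bytes.
using OffsetPair = std::pair<uint64_t, uint64_t>;

// Walk the fields in order and record each field's starting offset as a
// (scaled, unscaled) pair. Alignment padding is ignored in this sketch.
std::vector<OffsetPair> computeFieldOffsets(const std::vector<FieldSize> &Fields) {
  std::vector<OffsetPair> Offsets;
  uint64_t Scaled = 0;
  uint64_t Unscaled = 0;
  for (const FieldSize &F : Fields) {
    Offsets.emplace_back(Scaled, Unscaled);
    if (F.Scalable)
      Scaled += F.Bytes;
    else
      Unscaled += F.Bytes;
  }
  return Offsets;
}

// The concrete byte offset only exists once vscale is known:
// scaled part times vscale, plus the unscaled part.
uint64_t resolveOffset(const OffsetPair &Off, uint64_t VScale) {
  assert(VScale >= 1 && "vscale is a positive integer");
  return Off.first * VScale + Off.second;
}
```

For a struct holding a 16-byte-minimum scalable vector, an i32, and a second scalable vector, the third field sits at the pair (16, 4), which resolves to byte 20 when vscale is 1 and byte 36 when vscale is 2.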

As for having scalable vectors inside IR structs, we do support that, for two reasons. One, it makes it easier to write unit tests that return multiple scalable vectors from a function, which requires structs (afaik). Two, our C-level intrinsics support 'sizeless structs' containing scalable vector types, which is why we needed to support lowering and codegen for them.

We don't support global/static scalable vector variables in C, though -- I think that would need some ELF changes, and we're not looking into that afaik.

> 
>>>> This should probably be an intrinsic, not an llvm::Constant.  The design of llvm::Constant is already wrong: it shouldn’t have operations like divide, and it would be better to not contribute to the problem.
>>> Could you explain your position more on this? The Constant
>>> architecture has been a very natural fit for this concept from our
>>> perspective.
>> 
>> It is appealing, but it is wrong.  Constant should really only model primitive constants (ConstantInt/FP, etc) and we should have one more form for “relocatable” constants.  Instead, we have intertwined constant folding and ConstantExpr logic that doesn’t make sense.
>> 
>> A better pattern to follow is an intrinsic like (e.g.) llvm.coro.size.i32(), which always returns a constant value.
> Ok, we'll investigate this issue further.

So I've looked into this, and have a question. Would we be able to add an llvm.sve.vscale.i32 intrinsic (or whatever we end up naming it) to the ConstantExpr hierarchy? I didn't see that done with the coroutine intrinsics, but maybe I missed something. I do see the CoroSizeInst class for matching, though, which helps a bit.

To support shufflevector with scalable types, we had to relax the constraint on the mask from being constant literals to being any ConstantExpr; if we can't make the vscale value constant in some way, we'd need to accept plain Values for the mask, I think.
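
As an illustration of why a literal mask can't work here, consider a "reverse" shuffle: every mask element depends on the total element count, which is the minimum element count times vscale. The standalone C++ sketch below only models that dependency; makeReverseMask and applyShuffle are hypothetical helpers, not LLVM APIs, and this is not how the mask is actually encoded in our IR.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build the mask for reversing a vector of (MinElts * VScale) elements.
// Element I of the mask selects source element (NumElts - 1 - I), so the
// mask contents change with vscale and cannot be written as a fixed literal.
std::vector<uint64_t> makeReverseMask(uint64_t MinElts, uint64_t VScale) {
  assert(VScale >= 1 && "vscale is a positive integer");
  const uint64_t NumElts = MinElts * VScale;
  std::vector<uint64_t> Mask(NumElts);
  for (uint64_t I = 0; I < NumElts; ++I)
    Mask[I] = NumElts - 1 - I;
  return Mask;
}

// Apply a single-source shuffle: Result[I] = Src[Mask[I]].
std::vector<int> applyShuffle(const std::vector<int> &Src,
                              const std::vector<uint64_t> &Mask) {
  std::vector<int> Result;
  Result.reserve(Mask.size());
  for (uint64_t Index : Mask) {
    assert(Index < Src.size() && "mask index out of range");
    Result.push_back(Src[Index]);
  }
  return Result;
}
```

With a minimum of 4 elements, the mask is 3, 2, 1, 0 when vscale is 1 but starts at 7 when vscale is 2, so the mask has to be some expression of vscale rather than a list of literals.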

There are also potential issues in instruction selection if they aren't constant: if a non-ConstantExpr value is hoisted out of a given block, we might not have the right nodes to select against and end up with terrible codegen.

-Graham

>> 
>>>> Ok, that sounds complicated, but can surely be made to work.  The bigger problem is that there are various LLVM IR transformations that want to put registers into memory.  All of these will be broken with this sort of type.
>>> Could you give an example?
>> 
>> The concept of “reg2mem” is to put SSA values into allocas for passes that can’t (or don’t want to) update SSA.  Similarly, function body extraction can turn SSA values into parameters, and depending on the implementation can pack them into structs.  The coroutine logic similarly needs to store registers if they cross suspend points; there are multiple other examples.
> I think this should still work. Allocas of scalable vectors are supported,
> and it's only later at codegen that the unknown sizes result in more
> work being needed to compute stack offsets correctly. The caveat being
> that a direct call to something like getTypeStoreSize() will need to
> be aware of expressions/sizeless-types. If however these passes are
> exclusively using allocas to put registers into memory, or using
> structs with extractvalue etc, then they shouldn't need to care and
> codegen deals with the low level details.