[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

Thu Oct 11 07:14:28 PDT 2018

Hi,

Sorry for the delay, but I now have an initial implementation of size queries
for scalable types on phabricator:

https://reviews.llvm.org/D53137 and https://reviews.llvm.org/D53138

This isn't complete (I haven't used the DataLayout version outside of the tests),
but I'd like to get some feedback before making further changes.

Some notes/questions:

  1. I've changed Type::getPrimitiveSizeInBits to return 0 for vector types
     and changed all uses of it with vector types to use
     Type::getScalableSizeInBits instead, following the design in the RFC.

     While this shows where getPrimitiveSizeInBits is used with vector types,
     I think it would be better for now to change it back to avoid breaking
     existing targets and put an assert in it to ensure that it's only used
     on non-scalable vectors. We can revisit the decision later once
     scalable vector support is more mature. Thoughts?

  2. There are two implementations of ScalableSize; one in Type.h, and one
     in DataLayout.h. I'd prefer to only have one, but the former reports
     sizes as 32bits while the latter uses 64bits.

     I think changing all size queries to use 64bits is the best way to
     resolve it -- are there any significant problems with that approach
     aside from lots of code churn?

     It would also be possible to use templates and typedefs, but I figure
     unifying size reporting would be better.

  3. I have only implemented 'strict' comparisons for now, which leads to
     some possibly-surprising results; {X, 0} compared with {0, X} will
     return false for both '==' and '<' comparisons, but true for '<='.

     I think that supporting 'maybe' results from overloaded operators
     would be a bad idea, so if/when we find cases where they are needed
     then I think new functions should be written to cover those cases
     and only used where it matters. For simple things like stopping
     casts between scalable and non-scalable vectors the strict
     comparisons should suffice.

  4. Alignment for structs containing scalable types is tricky. For now,
     I've added an assert to force all structs containing scalable vectors
     to be packed.

     It won't be possible to calculate correct offsets at compile time if
     the minimum size of a struct member isn't a multiple of the required
     alignment for the subsequent element(s). 

     Instead, a runtime calculation will be required. This could arise in
     SVE if a predicate register (minimum 2 bytes) were used followed by
     an aligned data vector -- it could be aligned, but it could also
     require adding up to 14 bytes of padding to reach minimum alignment
     for data vectors.

     The proposed ACLE does allow creating sizeless structs with both
     predicate and data registers so we can't forbid such structs, but it
     makes no guarantees about alignment -- it's implementation defined.

     Do any of the other architectures with scalable vectors have any
     particular requirements for this?

  5. The comparison operators contain all cases within them. Would it be
     preferable to just keep the initial case (scalable terms equal and
     likely zero) in the header for inlining, and move all other cases
     into another function elsewhere to reduce code bloat a bit?

  6. Any suggestions for better names?

  7. Would it be beneficial to put the RFC in a phabricator review to make
     it easier to see changes?

  8. I will be at the devmeeting next week, so if anyone wants to chat
     about scalable vector support that would be very welcome.

-Graham

> On 1 Aug 2018, at 20:43, Hal Finkel <hfinkel at anl.gov> wrote:
> 
> 
> On 08/01/2018 02:00 PM, Graham Hunter wrote:
>> Hi Hal,
>> 
>>> On 30 Jul 2018, at 20:10, Hal Finkel <hfinkel at anl.gov> wrote:
>>> 
>>> 
>>> On 07/30/2018 05:34 AM, Chandler Carruth wrote:
>>>> I strongly suspect that there remains widespread concern with the direction of this, I know I have them.
>>>> 
>>>> I don't think that many of the people who have that concern have had time to come back to this RFC and make progress on it, likely because of other commitments or simply the amount of churn around SVE related patches and such. That is at least why I haven't had time to return to this RFC and try to write more detailed feedback.
>>>> 
>>>> Certainly, I would want to see pretty clear and considered support for this change to the IR type system from Hal, Chris, Eric and/or other long time maintainers of core LLVM IR components before it moves forward, and I don't see that in this thread.
>>> At a high level, I'm happy with this approach. I think it will be important for LLVM to support runtime-determined vector lengths - I see the customizability and power-efficiency constraints that motivate these designs continuing to increase in importance. I'm still undecided on whether this makes vector code nicer even for fixed-vector-length architectures, but some of the design decisions that it forces, such as having explicit intrinsics for reductions and other horizontal operations, seem like the right direction regardless.
>> Thanks, that's good to hear.
>> 
>>> 1.
>>>> This is a proposal for how to deal with querying the size of scalable types for
>>>>> analysis of IR. While it has not been implemented in full,
>>> Is this still true? The details here need to all work out, obviously, and we should make sure that any issues are identified.
>> Yes. I had hoped to get some more comments on the basic approach before progressing with the implementation, but if it makes more sense to have the implementation available to discuss then I'll start creating patches.
> 
> At least on this point, I think that we'll want to have the
> implementation to help make sure there aren't important details we're
> overlooking.
> 
>> 
>>> 2. I know that there has been some discussion around support for changing the vector length during program execution (e.g., to account for some (proposed?) RISC-V feature), perhaps even during the execution of a single function. I'm very concerned about this idea because it is not at all clear to me how to limit information transfer contaminated with the vector size from propagating between different regions. As a result, I'm concerned about trying to add this on later, and so if this is part of the plan, I think that we need to think through the details up front because it could have a major impact on the design.
>> I think Robin's email yesterday covered it fairly nicely; this RFC proposes that the hardware length of vectors will be consistent throughout an entire function, so we don't need to limit information inside a function, just between them. For SVE, h/w vector length will likely be consistent across the whole program as well (assuming the programmer doesn't make a prctl call to the kernel to change it) so we could drop that limit too, but I thought it best to come up with a unified approach that would work for both architectures. The 'inherits_vscale' attribute would allow us to continue optimizing across functions for SVE where desired.
> 
> I think that this will likely work, although I think we want to invert
> the sense of the attribute. vscale should be inherited by default, and
> some attribute can say that this isn't so. That same attribute, I
> imagine, will also forbid scalable vector function arguments and return
> values on those functions. If we don't have inherited vscale as the
> default, we place an implicit contract on any IR transformation hat
> performs outlining that it needs to scan for certain kinds of vector
> operations and add the special attribute, or just always add this
> special attribute, and that just becomes another special case, which
> will only actually manifest on certain platforms, that it's best to avoid.
> 
>> 
>> Modelling the dynamic vector length for RVV is something for Robin (or others) to tackle later, but can be though of (at a high level) as an implicit predicate on all operations.
> 
> My point is that, while there may be some sense in which the details can
> be worked out later, we need to have a good-enough understanding of how
> this will work now in order to make sure that we're not making design
> decisions now that make handling the dynamic vscale in a reasonable way
> later more difficult.
> 
> Thanks again,
> Hal
> 
>> 
>> -Graham
>> 
>>> Thanks again,
>>> Hal
>>> 
>>> -- 
>>> Hal Finkel
>>> Lead, Compiler Technology and Programming Languages
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>> 
> 
> -- 
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory