[llvm-dev] [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

Thu Oct 25 06:09:22 PDT 2018

Hi,

Following various discussions at the recent devmeeting, I've posted an RFC for
scalable vector IR type alone on phabricator: https://reviews.llvm.org/D53695

There's a couple of changes, and I posted that as a separate revision on top
of the previous text so changes are visible.

The main differences are:

  - Size comparisons between unscaled and scaled vector types are considered
    invalid for now, and will assert.

  - Scalable vector types cannot be members of StructTypes or ArrayTypes. If
    these are needed at the C level (e.g. the SVE ACLE C intrinsics), then
    clang must perform lowering to a pointer + vscale-based arithmetic instead
    of creating aggregates in IR.

I will update the IR type patch and size query patch soon.

-Graham

> On 11 Oct 2018, at 15:14, Graham Hunter via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hi,
> 
> Sorry for the delay, but I now have an initial implementation of size queries
> for scalable types on phabricator:
> 
> https://reviews.llvm.org/D53137 and https://reviews.llvm.org/D53138
> 
> This isn't complete (I haven't used the DataLayout version outside of the tests),
> but I'd like to get some feedback before making further changes.
> 
> Some notes/questions:
> 
>  1. I've changed Type::getPrimitiveSizeInBits to return 0 for vector types
>     and changed all uses of it with vector types to use
>     Type::getScalableSizeInBits instead, following the design in the RFC.
> 
>     While this shows where getPrimitiveSizeInBits is used with vector types,
>     I think it would be better for now to change it back to avoid breaking
>     existing targets and put an assert in it to ensure that it's only used
>     on non-scalable vectors. We can revisit the decision later once
>     scalable vector support is more mature. Thoughts?
> 
>  2. There are two implementations of ScalableSize; one in Type.h, and one
>     in DataLayout.h. I'd prefer to only have one, but the former reports
>     sizes as 32bits while the latter uses 64bits.
> 
>     I think changing all size queries to use 64bits is the best way to
>     resolve it -- are there any significant problems with that approach
>     aside from lots of code churn?
> 
>     It would also be possible to use templates and typedefs, but I figure
>     unifying size reporting would be better.
> 
>  3. I have only implemented 'strict' comparisons for now, which leads to
>     some possibly-surprising results; {X, 0} compared with {0, X} will
>     return false for both '==' and '<' comparisons, but true for '<='.
> 
>     I think that supporting 'maybe' results from overloaded operators
>     would be a bad idea, so if/when we find cases where they are needed
>     then I think new functions should be written to cover those cases
>     and only used where it matters. For simple things like stopping
>     casts between scalable and non-scalable vectors the strict
>     comparisons should suffice.
> 
>  4. Alignment for structs containing scalable types is tricky. For now,
>     I've added an assert to force all structs containing scalable vectors
>     to be packed.
> 
>     It won't be possible to calculate correct offsets at compile time if
>     the minimum size of a struct member isn't a multiple of the required
>     alignment for the subsequent element(s). 
> 
>     Instead, a runtime calculation will be required. This could arise in
>     SVE if a predicate register (minimum 2 bytes) were used followed by
>     an aligned data vector -- it could be aligned, but it could also
>     require adding up to 14 bytes of padding to reach minimum alignment
>     for data vectors.
> 
>     The proposed ACLE does allow creating sizeless structs with both
>     predicate and data registers so we can't forbid such structs, but it
>     makes no guarantees about alignment -- it's implementation defined.
> 
>     Do any of the other architectures with scalable vectors have any
>     particular requirements for this?
> 
>  5. The comparison operators contain all cases within them. Would it be
>     preferable to just keep the initial case (scalable terms equal and
>     likely zero) in the header for inlining, and move all other cases
>     into another function elsewhere to reduce code bloat a bit?
> 
>  6. Any suggestions for better names?
> 
>  7. Would it be beneficial to put the RFC in a phabricator review to make
>     it easier to see changes?
> 
>  8. I will be at the devmeeting next week, so if anyone wants to chat
>     about scalable vector support that would be very welcome.
> 
> 
> -Graham
> 
>> On 1 Aug 2018, at 20:43, Hal Finkel <hfinkel at anl.gov> wrote:
>> 
>> 
>> On 08/01/2018 02:00 PM, Graham Hunter wrote:
>>> Hi Hal,
>>> 
>>>> On 30 Jul 2018, at 20:10, Hal Finkel <hfinkel at anl.gov> wrote:
>>>> 
>>>> 
>>>> On 07/30/2018 05:34 AM, Chandler Carruth wrote:
>>>>> I strongly suspect that there remains widespread concern with the direction of this, I know I have them.
>>>>> 
>>>>> I don't think that many of the people who have that concern have had time to come back to this RFC and make progress on it, likely because of other commitments or simply the amount of churn around SVE related patches and such. That is at least why I haven't had time to return to this RFC and try to write more detailed feedback.
>>>>> 
>>>>> Certainly, I would want to see pretty clear and considered support for this change to the IR type system from Hal, Chris, Eric and/or other long time maintainers of core LLVM IR components before it moves forward, and I don't see that in this thread.
>>>> At a high level, I'm happy with this approach. I think it will be important for LLVM to support runtime-determined vector lengths - I see the customizability and power-efficiency constraints that motivate these designs continuing to increase in importance. I'm still undecided on whether this makes vector code nicer even for fixed-vector-length architectures, but some of the design decisions that it forces, such as having explicit intrinsics for reductions and other horizontal operations, seem like the right direction regardless.
>>> Thanks, that's good to hear.
>>> 
>>>> 1.
>>>>> This is a proposal for how to deal with querying the size of scalable types for
>>>>>> analysis of IR. While it has not been implemented in full,
>>>> Is this still true? The details here need to all work out, obviously, and we should make sure that any issues are identified.
>>> Yes. I had hoped to get some more comments on the basic approach before progressing with the implementation, but if it makes more sense to have the implementation available to discuss then I'll start creating patches.
>> 
>> At least on this point, I think that we'll want to have the
>> implementation to help make sure there aren't important details we're
>> overlooking.
>> 
>>> 
>>>> 2. I know that there has been some discussion around support for changing the vector length during program execution (e.g., to account for some (proposed?) RISC-V feature), perhaps even during the execution of a single function. I'm very concerned about this idea because it is not at all clear to me how to limit information transfer contaminated with the vector size from propagating between different regions. As a result, I'm concerned about trying to add this on later, and so if this is part of the plan, I think that we need to think through the details up front because it could have a major impact on the design.
>>> I think Robin's email yesterday covered it fairly nicely; this RFC proposes that the hardware length of vectors will be consistent throughout an entire function, so we don't need to limit information inside a function, just between them. For SVE, h/w vector length will likely be consistent across the whole program as well (assuming the programmer doesn't make a prctl call to the kernel to change it) so we could drop that limit too, but I thought it best to come up with a unified approach that would work for both architectures. The 'inherits_vscale' attribute would allow us to continue optimizing across functions for SVE where desired.
>> 
>> I think that this will likely work, although I think we want to invert
>> the sense of the attribute. vscale should be inherited by default, and
>> some attribute can say that this isn't so. That same attribute, I
>> imagine, will also forbid scalable vector function arguments and return
>> values on those functions. If we don't have inherited vscale as the
>> default, we place an implicit contract on any IR transformation hat
>> performs outlining that it needs to scan for certain kinds of vector
>> operations and add the special attribute, or just always add this
>> special attribute, and that just becomes another special case, which
>> will only actually manifest on certain platforms, that it's best to avoid.
>> 
>>> 
>>> Modelling the dynamic vector length for RVV is something for Robin (or others) to tackle later, but can be though of (at a high level) as an implicit predicate on all operations.
>> 
>> My point is that, while there may be some sense in which the details can
>> be worked out later, we need to have a good-enough understanding of how
>> this will work now in order to make sure that we're not making design
>> decisions now that make handling the dynamic vscale in a reasonable way
>> later more difficult.
>> 
>> Thanks again,
>> Hal
>> 
>>> 
>>> -Graham
>>> 
>>>> Thanks again,
>>>> Hal
>>>> 
>>>> -- 
>>>> Hal Finkel
>>>> Lead, Compiler Technology and Programming Languages
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory
>>>> 
>> 
>> -- 
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev