[cfe-dev] Ping*2: RFC: Adding vscale vector types to C and C++
Richard Sandiford via cfe-dev
cfe-dev at lists.llvm.org
Fri Jun 28 07:25:29 PDT 2019
Ping*2.
Quick summary of the patches backing the RFC:
* https://reviews.llvm.org/D62960
Add the SVE types themselves. Thanks for the reviews on this one!
* https://reviews.llvm.org/D62961
[AST] Add type queries for scalable types.
* https://reviews.llvm.org/D62962
Main patch, including documentation & tests. Mostly affects Sema.
Richard Sandiford <richard.sandiford at arm.com> writes:
> LLVM now supports a "scalable" vector type:
>
> <vscale x N x ELT> (e.g. <vscale x 4 x i32>)
>
> that represents a vector of X*N ELTs for some runtime value X
> [https://reviews.llvm.org/D32530]. The number of elements is therefore
> not known at compile time and can depend on choices made by the execution
> environment. This RFC is about how we can provide C and C++ types that
> map to this LLVM type.
>
> The main complication is that, because the number of elements isn't
> known at compile time, "sizeof" can't work in the same way as it does
> for normal vector types. Our suggested fix for this is to separate the
> concept of "complete type" into two:
>
> * does the type have enough information to construct objects of that type?
>
> For want of a better term, types that have this property are
> "definite" while types that don't are "indefinite".
>
> * will it be possible to measure the size of the type using "sizeof",
> once the type is definite?
>
> If so, the type is "sized", otherwise it is "sizeless".
>
> "Complete" is then equivalent to "sized and definite". The new scalable
> vectors are definite but sizeless, and so are never complete.
>
> We can then redefine certain rules to use the distinction between
> definite and indefinite types rather than complete and incomplete types.
> (This is a simple change to make in Clang.) Things like "sizeof" and
> pointer arithmetic continue to require complete types, and so are invalid
> for the new types. See below for a more detailed description.
>
> We're also proposing to treat the new C and C++ types as opaque built-in
> types rather than first-class vector types, for two reasons:
>
> (1) It means that we don't need to define what the "vscale" is for
> all targets, or emulate general vscale operations for all targets.
> We can just provide the types that the target supports natively,
> and for which the target already has a defined ABI.
>
> (2) It allows for more abstraction. For example, SVE has scalable types
> that are logically tuples of 2, 3 or 4 vectors. Defining them as opaque
> built-in types means that we don't need to treat them as single vectors
> in C and C++, even if that happens to be how LLVM represents them.
> Building tuple types into the compiler also means that we don't need
> to support scalable vectors in structures or arrays.
>
> In case this looks familiar...
> ==============================
>
> This is a refresh of an RFC I sent out last year
> [http://lists.llvm.org/pipermail/cfe-dev/2018-May/057830.html].
> The details are basically the same, except that we're no longer
> proposing to support user-defined sizeless types. The reason for
> sending the RFC again is that (unlike last time) LLVM does now support
> the underlying scalable vectors. The patches are therefore less
> speculative than they were before.
>
> Those on WG21 might also remember that sizeless types were used as
> a possible basis for a proposal to make P0214 support scalable vectors
> [http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1101r0.html].
> It was clear from the committee meeting that modifying P0214 in this
> way wasn't acceptable and this message isn't an attempt to revive
> that discussion. All we're trying to do with this RFC is make
> Clang support opaque built-in types that map to LLVM vscale types.
> (In particular, there's no __sizeless_struct, or any other attempt
> to support aggregates of sizeless types.)
>
> Why the extension is needed
> ===========================
>
> We need these scalable types in the AArch64 port so that we can provide
> low-level access to the SVE and SVE2 vector extensions. More information
> on the extensions is available here:
>
> https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture
>
> but the only feature that really matters for this RFC is that they have
> no fixed or preferred vector length. Processors that implement SVE can
> instead choose from a range of possible vector lengths. This means that
> in many environments, the actual vector length is only known at runtime.
>
> SVE has been designed so that one piece of "length-agnostic" code can
> work for all vector lengths. The new scalable types provide the basis
> for writing such code in C and C++. Specifically:
>
> * As with other vector architectures, it's possible to pass and return
> vectors in registers when calling other functions. This is particularly
> useful for things like vector libm routines. We need a C and C++
> representation of the vector types in order to write such functions.
>
> * Again as for other vector architectures, we have a set of intrinsic
> functions that provide low-level access to the architecture, as a
> last line of defence before dropping to assembly. This again needs
> scalable types that can hold temporary working data and that can be
> passed to and returned from intrinsic functions.
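
As an illustration of the two bullets above, here is a minimal sketch of a length-agnostic routine. It assumes the SVE ACLE header <arm_sve.h> and its intrinsics (svcntw, svwhilelt_b32_s64, svld1_f32, svadd_f32_x, svst1_f32). The function names add_vectors and vec_add are hypothetical, and the scalar fallback is only there so that the sketch also builds on targets without SVE.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>

/* Sizeless types used as parameter and return types (hypothetical helper). */
static svfloat32_t add_vectors(svbool_t pg, svfloat32_t x, svfloat32_t y) {
    return svadd_f32_x(pg, x, y);
}

/* Length-agnostic loop: the step is the runtime element count svcntw(). */
void vec_add(float *dst, const float *a, const float *b, int64_t n) {
    assert(n >= 0);
    for (int64_t i = 0; i < n; i += (int64_t)svcntw()) {
        svbool_t pg = svwhilelt_b32_s64(i, n);   /* predicate masks off the tail */
        svfloat32_t va = svld1_f32(pg, a + i);   /* automatic sizeless variables */
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, dst + i, add_vectors(pg, va, vb));
    }
}
#else
/* Portable scalar fallback with the same interface (not part of the RFC). */
void vec_add(float *dst, const float *a, const float *b, int64_t n) {
    assert(n >= 0);
    for (int64_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}
#endif
```

Note that the pointer arithmetic is on float pointers, not on pointers to the vector types, so nothing here needs the size of a vector at compile time.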
>
> Using intrinsics might seem old-fashioned when there are various
> frameworks that express data-parallel algorithms in a more abstract way,
> or libraries like P0214 (std::simd) that provide mostly performance-
> portable vector interfaces. But in practice, each vector architecture
> has its own quirks and unique features that aren't easy for the compiler
> to use automatically and aren't performance-portable enough to be part
> of a generic interface. So even though target-neutral approaches are a
> very welcome development, they're not a complete solution. Intrinsics
> are still vital when you really want to hand-optimise a routine for a
> particular architecture. And that's still a common requirement.
>
> For example, Arm has been porting various codebases that already support
> AArch64 AdvSIMD intrinsics to SVE2. Even though AdvSIMD and SVE2 have
> some features in common, the routines for the two architectures are
> often significantly different from each other (and in ways that can't be
> abstracted by interfaces like std::simd). We need to have direct access
> to SVE2 features for this kind of work.
>
> Implementation
> ==============
>
> I've uploaded a Clang implementation to Phabricator. There are three parts:
>
> https://reviews.llvm.org/D62960
>
> Adds some SVE types that can be used to test the next two patches.
> This is a respin of Graham's patch [https://reviews.llvm.org/D59245]
> with some minor updates.
>
> The patch isn't really part of the RFC, but if you have any
> comments about defining the types this way, please let us know!
>
> https://reviews.llvm.org/D62961
>
> Adds new type queries isSizeless and isIndefinite.
>
> https://reviews.llvm.org/D62962
>
> The Clang support itself, including documentation and testcases.
>
> Criteria for clang extensions
> =============================
>
> From the list on [http://clang.llvm.org/get_involved.html],
> an extension needs:
>
> (1) Evidence of a significant user community
>
> The extension allows SVE intrinsics to be used in places that
> currently use intrinsics for other vector architectures. There is
> already one public project that uses the SVE intrinsics[1] and one
> that specifically considered SVE support as part of its design
> philosophy[2]. Arm has patches to add SVE and SVE2 support to
> several other projects, but they're gated on the Clang support.
>
> [1] https://github.com/nmeyer-ur/Grid
> [2] https://github.com/google/pik/tree/master/pik/simd
>
> (2) A specific need to reside within the Clang tree
>
> The extension involves (small) changes to the core type system.
> It's also part of supporting target-specific intrinsics, which
> would normally be part of Clang even without the scalable type
> aspect.
>
> (3) A complete specification
>
> See the documentation and language edits in the patch for
> the specification (also copied below for inline replies).
>
> (4) Representation within the appropriate governing organization
>
> It doesn't seem appropriate to try to standardise the extension
> at this stage, since the only way to use the extension is through
> target-specific interfaces. The extension doesn't provide any
> benefit that's independent of those interfaces.
>
> So at the moment this is really in the realm of target-specific
> language extensions rather than generic language extensions.
> This may of course change later.
>
> (5) A long-term support plan
>
> Arm is very much committed to supporting this.
>
> (6) A high-quality implementation
>
> I'd like feedback on whether the current patch qualifies. :-)
>
> (7) A proper test suite
>
> The tests in the patch cover each functional change to the source,
> except as noted in the patch description. The implementation of the
> SVE ACLE will provide further coverage.
>
> Following a suggestion from Renato in a different context, I've now
> put the main discussion and justification in the documentation part
> of the patch. I've copied it below as well for inline replies.
>
> Thanks,
> Richard
>
>
>
> ==============
> Sizeless types
> ==============
>
> As an extension, Clang supports the concept of “sizeless” object types in
> both C and C++. The types are so called because it is an error to measure
> their size directly using ``sizeof`` or indirectly via operations like
> pointer arithmetic.
>
> Forbidding ``sizeof`` and related operations means that the amount of
> data that the types contain does not need to be a compile-time constant.
> It can instead depend on runtime properties, and for example can adapt
> to different hardware configurations.
>
> Sizeless types are only intended for objects that hold temporary working
> data, such as “scalable” or variable-length vectors. They are not
> intended for long-term storage and cannot be used in aggregates.
>
> At present, the only sizeless types that Clang provides are:
>
> AArch64 SVE vector types
> These vector types are built into the compiler under names like
> ``__SVInt8_t``, as required by the `Procedure Call Standard for the
> Arm® 64-bit Architecture`_. They represent the longest vector of a
> particular element type that can be stored in an SVE vector register.
> Functions can pass and return these vectors in registers.
>
> The header file ``<arm_sve.h>`` makes the types available under more
> user-friendly names like ``svint8_t``. It also provides a set of
> intrinsic functions for operating on the types. See the `ARM C
> Language Extensions for SVE`_ for more information about these types
> and intrinsics.
>
> .. _Procedure Call Standard for the Arm® 64-bit Architecture:
> https://developer.arm.com/docs/ihi0055/latest/
> .. _ARM C Language Extensions for SVE:
> https://developer.arm.com/docs/100987/latest
>
> `ARM C Language Extensions for SVE`_ contains the original specification of
> sizeless types, but the description below is intended to be self-contained.
>
> Outline of the type system changes
> ==================================
>
> C and C++ classify object types as “complete” (the size of objects
> of that type can be calculated) or “incomplete” (the size of objects
> of that type cannot be calculated). There is very little you can do with
> a type until it becomes complete.
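
For readers who want the standard behaviour in front of them, here is a small sketch in plain C11 of a type that starts incomplete and later becomes complete. The names widget, keep and widget_id_roundtrip are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

struct widget;                          /* incomplete: size not yet known */

/* Pointers to an incomplete type are allowed... */
struct widget *keep(struct widget *w) { return w; }

/* ...but `sizeof(struct widget)` or a local `struct widget x;`
   would be rejected at this point in the translation unit. */

struct widget { int id; double weight; };   /* the type is now complete */

static_assert(sizeof(struct widget) >= sizeof(int) + sizeof(double),
              "a complete type can be measured");

int widget_id_roundtrip(int id) {
    struct widget w = { id, 1.0 };      /* objects can now be created */
    return keep(&w)->id;
}
```

A sizeless type would sit between these two states: objects can be created, as in widget_id_roundtrip, but the static_assert on sizeof would remain an error.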
>
> This categorization implicitly ties together two concepts: whether it is possible
> to manipulate objects of a particular type, and whether it is possible
> to measure their size (which in C++ must be constant). The key idea
> behind the sizeless type extension is to split these concepts apart.
>
> To do this, the extension classifies types as:
>
> * “indefinite” (lacking sufficient information to create an object of
> that type) or “definite” (having sufficient information)
>
> * “sized” (will have a measurable size when definite) or “sizeless”
> (will never have a measurable size)
>
> * “incomplete” (lacking sufficient information to determine the size of
> objects of that type) or “complete” (having sufficient information)
>
> where the wording for the final bullet is taken verbatim from the
> C standard. All standard types are “sized” (even ``void``, although
> it is always indefinite).
>
> The idea is that “definite” types are as fully-defined as they
> ever can be, even if their size is still not known at compile time.
> “Complete” is then equivalent to “sized and definite”.
>
> On its own, this puts sizeless types into a similar position
> to incomplete structure types, which is conservatively correct
> but severely limits what the types can do.
>
> The next step is to relax certain rules so that they use the distinction
> between “indefinite” and “definite” rather than “incomplete” and “complete”.
> The goal of this process is to allow:
>
> * automatic variables with sizeless type
> * function parameters and return values with sizeless type
> * use of sizeless types with ``_Generic``
> * pointers to sizeless types
> * applying ``typeid`` to a sizeless type
> * use of sizeless types with C++ type traits
>
> In contrast, the following must remain invalid, by keeping the usual rules
> for incomplete types unchanged:
>
> * using ``sizeof``, ``_Alignof`` and ``alignof`` with a sizeless type
> (or object of sizeless type)
> * creating or accessing arrays whose element type is sizeless
> * doing pointer arithmetic on pointers to sizeless types
> * unions or structures with sizeless members
> * applying ``_Atomic`` to a sizeless type
> * throwing or catching objects of sizeless type
> * capturing sizeless objects by value in lambda expressions
>
> There is also an extra restriction:
>
> * variables with sizeless type must not have static or thread-local
> storage duration
>
> In practice it is impossible to *define* such variables with incomplete type,
> but having an explicit rule means that things like:
>
> .. code-block:: c
>
> extern __SVInt8_t foo;
>
> are outright invalid rather than simply useless (because no other
> translation unit could ever define ``foo``). Similarly, without an
> explicit rule:
>
> .. code-block:: c
>
> __SVInt8_t foo;
>
> would be a valid tentative definition at the point it occurs and only
> become invalid at the end of the translation unit, because ``__SVInt8_t``
> is never completed.
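
The tentative-definition rule being referred to can be seen with an ordinary struct. This is a sketch in plain C (it is not valid C++), with hypothetical names counter, tally and bump.

```c
struct counter;                 /* incomplete at this point */

struct counter tally;           /* tentative definition with incomplete type:
                                   accepted here, checked at end of the TU */

struct counter { int hits; };   /* completed before the translation unit ends */

/* tally is zero-initialized, so the first call returns 1. */
int bump(void) { return ++tally.hits; }
```

If the struct were never completed, the declaration of tally would only be diagnosed at the end of the translation unit, which is the behaviour the explicit rule avoids for sizeless types.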
>
> Edits to the standards
> ======================
>
> Edits to the C standard
> -----------------------
>
> This section specifies the behavior for sizeless types in C, as an edit
> to the N1570 draft of C11.
>
> 6.2.5 Types
> ~~~~~~~~~~~
>
> In 6.2.5p1, replace:
>
> At various points within a translation unit an object type may be
> *incomplete* …
>
> onwards with:
>
> Object types are further partitioned into *sized* and *sizeless*; all
> basic and derived types defined in this standard are sized, but an
> implementation may provide additional sizeless types.
>
> and add two additional clauses:
>
> * At various points within a translation unit an object type may be
> *indefinite* (lacking sufficient information to construct an object
> of that type) or *definite* (having sufficient information).
> An object type is said to be *complete* if it is both sized and
> definite; all other object types are said to be *incomplete*.
> Complete types have sufficient information to determine the size
> of an object of that type while incomplete types do not.
>
> * Arrays, structures, unions and enumerated types are always sized,
> so for them the term *incomplete* is equivalent to (and used
> interchangeably with) the term *indefinite*.
>
> Change 6.2.5p19 to:
>
> The void type comprises an empty set of values; it is a sized
> indefinite object type that cannot be completed (made definite).
>
> Replace “incomplete” with “indefinite” and “complete” with “definite” in
> 6.2.5p37, which describes how a type's state can change throughout a
> translation unit.
>
> 6.3.2.1 Lvalues, arrays, and function designators
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “incomplete” with “indefinite” in 6.3.2.1p1, so that lvalues
> with sizeless definite type can be modifiable lvalues.
>
> Make the same replacement in 6.3.2.1p2, to prevent undefined behavior
> when lvalues have sizeless definite type.
>
> 6.5.1.1 Generic selection
> ~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “complete object type” with “definite object type” in 6.5.1.1p2,
> so that the type name in a generic association can be a sizeless definite
> type.
>
> 6.5.2.2 Function calls
> ~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “complete object type” with “definite object type” in 6.5.2.2p1,
> so that functions can return sizeless definite types.
>
> Make the same change in 6.5.2.2p4, so that arguments can also have
> sizeless definite type.
>
> 6.5.2.5 Compound literals
> ~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “complete object type” with “definite object type” in 6.5.2.5p1,
> so that compound literals can have sizeless definite type.
>
> 6.7 Declarations
> ~~~~~~~~~~~~~~~~
>
> Insert the following new clause after 6.7p4:
>
> * If an identifier for an object does not have automatic storage duration,
> its type must be sized rather than sizeless.
>
> Replace “complete” with “definite” in 6.7p7, which describes when the
> type of an object becomes definite.
>
> 6.7.6.3 Function declarators (including prototypes)
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “incomplete type” with “indefinite type” in 6.7.6.3p4, so that
> parameters can also have sizeless definite type.
>
> Make the same change in 6.7.6.3p12, which allows even indefinite types
> to be function parameters if no function definition is present.
>
> 6.7.9 Initialization
> ~~~~~~~~~~~~~~~~~~~~
>
> Replace “complete object type” with “definite object type” in 6.7.9p3,
> to allow initialization of identifiers with sizeless definite type.
>
> 6.9.1 Function definitions
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “complete object type” with “definite object type” in 6.9.1p3,
> so that functions can return sizeless definite types.
>
> Make the same change in 6.9.1p7, so that adjusted parameter types can be
> sizeless definite types.
>
> J.2 Undefined behavior
> ~~~~~~~~~~~~~~~~~~~~~~
>
> Update the entries that refer to the clauses above.
>
> Edits to the C++ standard
> -------------------------
>
> This section specifies the behavior for sizeless types in C++,
> as an edit to the N3797 draft of C++14.
>
> 3.1 Declarations and definitions [basic.def]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “incomplete” with “indefinite” in [basic.def]p5, so that definitions
> of an object can give it sizeless definite type. Add a further clause
> after [basic.def]p5:
>
> * A program is ill-formed if any declaration of an object gives it both
> a sizeless type and either static or thread-local storage duration.
>
> 3.9 Types [basic.types]
> ~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace [basic.types]p5 with:
>
> A class that has been declared but not defined, an enumeration type
> in certain contexts (7.2), or an array of unknown size or of
> indefinite element type, is an indefinite object type.45)
> Indefinite object types and the void types are indefinite types (3.9.1).
> Objects shall not be defined to have an indefinite type.
>
> and add three additional clauses:
>
> * Object and void types are further partitioned into *sized* and *sizeless*;
> all basic and derived types defined in this standard are sized, but an
> implementation may provide additional sizeless types.
>
> * An object or void type is said to be *complete* if it is both sized and
> definite; all other object and void types are said to be *incomplete*.
> The term *completely-defined object type* is synonymous with *complete
> object type*.
>
> * Arrays, class types and enumeration types are always sized, so for
> them the term *incomplete* is equivalent to (and used interchangeably
> with) the term *indefinite*.
>
> (Note that the wording of footnote 45 continues to apply as-is.)
>
> Also replace “incomplete” with “indefinite” in the forward reference
> in [basic.types]p7.
>
> 3.9.1 Fundamental Types [basic.fundamental]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> In [basic.fundamental]p9, replace the second sentence with:
>
> The void type is a sized indefinite type that cannot be completed
> (made definite).
>
> leaving the rest of the clause unchanged.
>
> 3.9.2. Compound Types [basic.compound]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> In this part of [basic.compound]p3:
>
> Pointers to incomplete types are allowed although there are
> restrictions on what can be done with them …
>
> add “(including indefinite types)” after “incomplete types”.
>
> 3.10 Lvalues and rvalues [basic.lval]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “complete” with “definite” and “incomplete” with “indefinite” in
> [basic.lval]p4, so that prvalues can have definite type and (in contrast)
> glvalues can have indefinite type.
>
> Replace “incomplete” with “indefinite” and “complete” with “definite” in
> [basic.lval]p7, so that the target of a pointer can be modifiable if it has
> sizeless definite type.
>
> 4.1 Lvalue-to-rvalue conversion [conv.lval]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “incomplete” with “indefinite” in [conv.lval]p1, so that sizeless
> definite glvalues can be converted to prvalues.
>
> 5.2.2 Function call [expr.call]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “completely-defined” with “definite” and “incomplete class type” with
> “indefinite type” in [expr.call]p4, so that parameters can have sizeless
> definite type.
>
> Replace “incomplete” with “indefinite” and “complete” with “definite” in
> [expr.call]p11, so that function call prvalues can have sizeless definite type.
>
> 5.2.3 Explicit type conversion (function notation) [expr.type.conv]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “complete” with “definite” in [expr.type.conv]p2, so that ``T()``
> can be used for sizeless definite T.
>
> 5.3.1 Unary operators [expr.unary.op]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “incomplete” with “indefinite” in [expr.unary.op]p1, so that a
> dereferenced pointer to a sizeless definite object can be converted to
> a prvalue.
>
> 5.3.5 Delete [expr.delete]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> After the first sentence in [expr.delete]p2 (which describes converting an
> operand with class type to a pointer type), add:
>
> The type of the operand must now be a pointer to a sized type,
> otherwise the program is ill-formed.
>
> 7.1.6.2 Simple type specifiers [dcl.type.simple]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “complete” with “definite” in [dcl.type.simple]p5, so that the special
> treatment for decltypes of function calls applies to indefinite rather
> than incomplete return types. This is for consistency with the change
> to [expr.call]p11 above.
>
> 8.3.4 Arrays [dcl.array]
> ~~~~~~~~~~~~~~~~~~~~~~~~
>
> In [dcl.array]p1, add “a sizeless type” to the list of things that array
> element type T cannot be.
>
> 9.4.2 Static data members [class.static.data]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “an incomplete type” with “a sized indefinite type” in
> [class.static.data]p2, to avoid giving the impression that static data
> members can have sizeless type.
>
> Make this explicit by adding the following after [class.static.data]p7:
>
> * A static data member shall not have sizeless type.
>
> 14.3.1 Template type parameters [temp.arg.type]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “incomplete” with “indefinite” in [temp.arg.type]p2, which notes that
> template type parameters need not be fully defined.
>
> 14.7.1 Implicit instantiation [temp.inst]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “completely-defined object type” with “definite object type”
> in [temp.inst]p1 and [temp.inst]p6, so that the language edits do not affect
> the rules for implicit instantiation.
>
> 17.6.4.8 Other functions [res.on.functions]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “incomplete” with “incomplete or indefinite” in [res.on.functions]p2,
> so that the library requires the rest of the program to honor the rules
> for both categories of type.
>
> 20.10.4.3 Type properties [meta.unary.prop]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “complete” with “definite” in [meta.unary.prop]p3 and in the table
> that follows. This specifically includes ``is_destructible``; since sizeless
> definite types can have automatic storage duration, it must be possible
> to destroy them. The changes are redundant but harmless for cases in
> which the completeness rule applies only to class types.
>
> 20.10.6 Relationships between types [meta.rel]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “complete” with “definite” in table 51.
>
> 20.10.7.6 Other transformations [meta.trans.other]
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Replace “complete” with “definite” in table 57.
>
> Notes for Clang developers
> ==========================
>
> Applying the extension to other cases
> -------------------------------------
>
> The summary and standard edits above describe how the sizeless type
> extension interacts with the core parts of the C and C++ standards.
> However, Clang supports many other extensions to the core languages,
> and will support new versions of the core languages as they evolve
> over time. It is therefore necessary to describe how sizeless types
> should interact with future extensions and language developments.
>
> The general principle is that we should keep using the
> distinction between incomplete types and complete types unless there is
> a specific known benefit to doing otherwise. Treating sizeless types as
> incomplete types should be the conservatively correct choice in almost
> all cases. We can later decide to relax specific rules to use the
> distinction between indefinite and definite types once we are sure
> that that is the right thing to do.
>
> Note that no decision needs to be made for any rules that are specific
> to complete or incomplete aggregates (arrays, structs, unions or classes),
> since aggregates are always sized.
>
> Rationale for this extension
> ============================
>
> Requirements
> ------------
>
> The main question that prompted this extension was: how do we add
> scalable vector types to the type system? The key requirements were:
>
> * The approach must work in both C and C++.
>
> * It must be possible to define automatic variables with these types.
>
> * It must be possible to pass and return objects of these types
> (since that is what intrinsics and vector library routines need to do).
>
> * It must be possible to use the types in ``_Generic`` associations
> (since the SVE ACLE uses ``_Generic`` to provide ``tgmath.h``\ -style
> overloads).
>
> * It must be possible to create pointers or references to the types
> (for passing or returning by pointer or reference, and because not
> allowing references would be semantically difficult in C++).
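
To show what the ``_Generic`` requirement in the list above amounts to, here is a sketch of a tgmath.h-style overload in plain C11. The names twice, twice_int, twice_float and twice_double are hypothetical. The SVE ACLE would list sizeless vector types as the association types, which is what the 6.5.1.1p2 edit permits.

```c
#include <assert.h>

int    twice_int(int x)       { return 2 * x; }
float  twice_float(float x)   { return 2.0f * x; }
double twice_double(double x) { return 2.0 * x; }

/* Select an implementation from the static type of the argument. */
#define twice(x) _Generic((x), \
    int:    twice_int,         \
    float:  twice_float,       \
    double: twice_double)(x)

/* The result type follows the selected function; float is not promoted. */
static_assert(sizeof(twice(1.5f)) == sizeof(float),
              "a float argument selects the float overload");
```

The selection never evaluates or measures the controlling expression, so the mechanism itself does not need the association types to have a known size.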
>
> Possible approaches
> -------------------
>
> Any approach to defining scalable types would fall into one of three
> categories:
>
> (1) Limit the types in such a way that there is no concept of size.
>
> (2) Define the size of the types to be variable.
>
> (3) Define the size of the types to be constant, either with the
> constant being large enough for all possible vector lengths or
> with the types pointing to separate memory (as for C++ classes
> like ``std::string``).
>
> \ (2) seemed initially appealing since C already has the concept of
> variable-length arrays. However, variable-length built-in types
> would work in a significantly different way. Arrays often decay to
> pointers (which of course are fixed-length types), whereas vector
> types never would. Unlike arrays, it should be possible to pass
> variable-length vectors to functions, return them from functions,
> and assign them by value.
>
> One particular difficulty is that the semantics of variable-length arrays
> rely on having a point at which the array size is evaluated. It would
> be difficult to extend this approach to built-in types, or to declarations
> of functions that return variable-length types. It would also not be an
> accurate model of how an implementation actually behaves, since the
> implementation would not evaluate the vector lengths at these points and
> would not react to the results of the calculation.
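
The "point at which the array size is evaluated" can be demonstrated with a short sketch. It assumes a compiler that supports variable-length arrays (optional in C11), and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the size in bytes of a VLA declared with n elements,
   measured after n itself has been changed. */
size_t vla_bytes_after_resize(size_t n) {
    assert(n > 0);
    int buf[n];          /* the array size is evaluated here, once */
    n *= 2;              /* later changes to n do not affect buf */
    return sizeof buf;   /* computed at run time from the original n */
}
```

A scalable vector type has no comparable declaration-time expression to evaluate, which is why this model does not carry over.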
>
> As well as the extension itself being relatively complex (especially
> for C++), it might be difficult to define it in a way that interacts
> naturally with other extensions. Also, variable-length arrays were added
> to an early draft of C++14, but were later removed as too controversial and
> did not make it into the final standard. C++17 still requires ``sizeof``
> to be constant and C11 makes variable-length arrays optional.
>
> \ (2) therefore felt like a complicated dead-end.
>
> \ (3) can be divided into two parts:
>
> a) The vector types have a constant size and are large enough for all
> possible vector lengths.
>
> The main problem with this approach is that the maximum SVE vector
> length of 2048 bits is much larger than the minimum of 128 bits. Using
> a fixed size of 2048 bits would be extremely inefficient for smaller
> vector lengths, and of course the whole point of using vectors is to
> make things *more* efficient.
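
To put numbers on the inefficiency, here is a small sketch using only the 128-bit minimum and 2048-bit maximum quoted above. The helper names are made up for the example.

```c
#include <stddef.h>

enum { SVE_MIN_BITS = 128, SVE_MAX_BITS = 2048 };

/* Unused bytes per vector object if every object occupies 2048 bits. */
size_t padding_bytes(unsigned actual_bits) {
    return (SVE_MAX_BITS - actual_bits) / 8;
}

/* How many times larger the fixed-size object is than the real vector. */
unsigned overhead_factor(unsigned actual_bits) {
    return SVE_MAX_BITS / actual_bits;
}
```

On a 128-bit implementation each vector object would carry 240 bytes of padding and be 16 times larger than the data it holds.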
>
> Also, we would need to define the types such that only the bytes
> associated with the actual vector length are significant. This would
> make it possible to pass or return the types in registers and treat
> them as register values when copying. This perhaps has some similarity
> with overaligned structures such as:
>
> .. code-block:: c
>
> struct s { _Alignas(16) int i; };
>
> except that the amount of padding is only known at runtime.
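
For comparison, the padding in the fixed-alignment case is a compile-time fact. This sketch checks it in C11. The helper s_padding is hypothetical, and the 12-byte figure in the comment assumes a 4-byte int.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

struct s { _Alignas(16) int i; };

/* Both properties are known to the compiler. */
static_assert(alignof(struct s) == 16, "member alignment propagates to the struct");
static_assert(sizeof(struct s) == 16, "size is rounded up to the alignment");

/* Bytes of padding after the int: 12 when int is 4 bytes wide. */
size_t s_padding(void) {
    return sizeof(struct s) - sizeof(int);
}
```

With a scalable vector padded to 2048 bits, the equivalent of s_padding() would depend on the hardware the program happens to run on.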
>
> There is also a significant conceptual problem: encoding a fixed size
> goes against the guiding principle of SVE, in which there is no preferred
> vector length. There is nothing particularly magical about the current
> limit of 2048 bits and it would be better to avoid an ABI break if the
> maximum ever did increase in future.
>
> b) The vector types have a constant size and refer to separate storage
> (as for C++ classes like ``std::string``).
>
> This would be difficult to do without C++-style constructor, destructor,
> copy and move semantics, so would not work well in C. And in C++ it would
> be less efficient than the other approaches, since presumably an allocator
> would be needed to allocate the separate storage. It would be difficult
> to map this kind of type to a self-contained register-based ABI type.
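
To illustrate why option (b) is awkward in C, here is a sketch of a fixed-size handle that refers to separately allocated elements. The type fvec and its functions are hypothetical and are not part of the proposal.

```c
#include <stdlib.h>
#include <string.h>

/* sizeof(fvec) is constant; the elements live in heap storage. */
typedef struct {
    size_t len;
    float *data;
} fvec;

/* Allocate zeroed storage for len elements (len > 0 assumed); returns 1 on success. */
int fvec_init(fvec *v, size_t len) {
    v->data = calloc(len, sizeof *v->data);
    v->len = v->data ? len : 0;
    return v->data != NULL;
}

/* A deep copy must be requested explicitly; plain assignment would alias. */
int fvec_copy(fvec *dst, const fvec *src) {
    if (!fvec_init(dst, src->len))
        return 0;
    memcpy(dst->data, src->data, src->len * sizeof *src->data);
    return 1;
}

/* Without destructors, every owner has to remember to call this. */
void fvec_free(fvec *v) {
    free(v->data);
    v->data = NULL;
    v->len = 0;
}
```

Every copy goes through the allocator and every object needs a matching fvec_free call, neither of which fits a value that is meant to live in a vector register.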
>
> These are all negative reasons for choosing (1): they rule out (2) and (3).
> A more positive justification is that (1) seems to meet the requirements
> in the most efficient way possible. The vectors can use their natural
> (native) representation, and the type system prevents uses that would
> make that representation problematic.
>
> Also, the approach of starting with very restricted types and then
> specifically allowing certain things should be more future-proof
> and interact better with other (unseen) language extensions. By default,
> any language extension would treat the new types like other incomplete
> types and choose conservatively-correct behavior. It would then be
> possible to relax the rules if this default behavior turns out to be
> too restrictive.
>
> (That said, treating the types as permanently incomplete will
> not avoid all clashes with other extensions. For example, we need to
> allow objects of automatic storage duration to have certain forms of
> incomplete type, whereas an extension might implicitly assume that all
> such objects must already have complete type. The approach should still
> avoid the worst effects though.)