[cfe-dev] RFC: Adding vscale vector types to C and C++

Richard Sandiford via cfe-dev cfe-dev at lists.llvm.org
Thu Jun 6 08:55:17 PDT 2019

LLVM now supports a "scalable" vector type:

    <vscale x N x ELT>      (e.g. <vscale x 4 x i32>)

that represents a vector of X*N ELTs for some runtime value X
[https://reviews.llvm.org/D32530].  The number of elements is therefore
not known at compile time and can depend on choices made by the execution
environment.  This RFC is about how we can provide C and C++ types that
map to this LLVM type.
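
As a rough illustration of the "X*N ELTs" arithmetic, the sketch below computes the element count of <vscale x 4 x i32> for a few runtime values of X.  The helper names are hypothetical, and the mapping from vector length to vscale (vector bits divided by 128) is an assumption based on the AArch64 SVE convention rather than something this RFC defines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of elements in <vscale x n x ELT>
   for a given runtime value of vscale. */
static size_t scalable_elements(size_t vscale, size_t n)
{
    return vscale * n;
}

/* Assumption for illustration: on SVE, vscale is the vector length
   in bits divided by 128. */
static size_t sve_vscale(size_t vector_bits)
{
    return vector_bits / 128;
}

/* <vscale x 4 x i32> at three possible SVE vector lengths. */
static void check_scalable_elements(void)
{
    assert(scalable_elements(sve_vscale(128), 4) == 4);
    assert(scalable_elements(sve_vscale(256), 4) == 8);
    assert(scalable_elements(sve_vscale(2048), 4) == 64);
}
```

The same source type therefore holds a different number of elements on different hardware, which is why a compile-time "sizeof" has no single answer.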

The main complication is that, because the number of elements isn't
known at compile time, "sizeof" can't work in the same way as it does
for normal vector types.  Our suggested fix for this is to separate the
concept of "complete type" into two:

* does the type have enough information to construct objects of that type?

    For want of a better term, types that have this property are
    "definite" while types that don't are "indefinite".

* will it be possible to measure the size of the type using "sizeof",
  once the type is definite?

    If so, the type is "sized", otherwise it is "sizeless".

"Complete" is then equivalent to "sized and definite".  The new scalable
vectors are definite but sizeless, and so are never complete.

We can then redefine certain rules to use the distinction between
definite and indefinite types rather than complete and incomplete types.
(This is a simple change to make in Clang.)  Things like "sizeof" and
pointer arithmetic continue to require complete types, and so are invalid
for the new types.  See below for a more detailed description.

We're also proposing to treat the new C and C++ types as opaque built-in
types rather than first-class vector types, for two reasons:

(1) It means that we don't need to define what the "vscale" is for
    all targets, or emulate general vscale operations for all targets.
    We can just provide the types that the target supports natively,
    and for which the target already has a defined ABI.

(2) It allows for more abstraction.  For example, SVE has scalable types
    that are logically tuples of 2, 3 or 4 vectors.  Defining them as opaque
    built-in types means that we don't need to treat them as single vectors
    in C and C++, even if that happens to be how LLVM represents them.
    Building tuple types into the compiler also means that we don't need
    to support scalable vectors in structures or arrays.

In case this looks familiar...

This is a refresh of an RFC I sent out last year.
The details are basically the same, except that we're no longer
proposing to support user-defined sizeless types.  The reason for
sending the RFC again is that (unlike last time) LLVM does now support
the underlying scalable vectors.  The patches are therefore less
speculative than they were before.

Those on WG21 might also remember that sizeless types were used as
a possible basis for a proposal to make P0214 support scalable vectors.
It was clear from the committee meeting that modifying P0214 in this
way wasn't acceptable and this message isn't an attempt to revive
that discussion.  All we're trying to do with this RFC is make
Clang support opaque built-in types that map to LLVM vscale types.
(In particular, there's no __sizeless_struct, or any other attempt
to support aggregates of sizeless types.)

Why the extension is needed

We need these scalable types in the AArch64 port so that we can provide
low-level access to the SVE and SVE2 vector extensions.  The only
feature of the extensions that really matters for this RFC is that they have
no fixed or preferred vector length.  Processors that implement SVE can
instead choose from a range of possible vector lengths.  This means that
in many environments, the actual vector length is only known at runtime.

SVE has been designed so that one piece of "length-agnostic" code can
work for all vector lengths.  The new scalable types provide the basis
for writing such code in C and C++.  Specifically:

* As with other vector architectures, it's possible to pass and return
  vectors in registers when calling other functions.  This is particularly
  useful for things like vector libm routines.  We need a C and C++
  representation of the vector types in order to write such functions.

* Again as for other vector architectures, we have a set of intrinsic
  functions that provide low-level access to the architecture, as a
  last line of defence before dropping to assembly.  This again needs
  scalable types that can hold temporary working data and that can be
  passed to and returned from intrinsic functions.
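
To make "length-agnostic" concrete, here is a minimal sketch of a loop that adds two float arrays.  The SVE branch uses intrinsic names from the ACLE header <arm_sve.h> (svcntw, svwhilelt_b32, svld1, svadd_x, svst1); the function name add_arrays and the scalar fallback are assumptions added so that the example also builds on targets without SVE.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical example: dst[i] = a[i] + b[i] for 0 <= i < n. */
static void add_arrays(float *dst, const float *a, const float *b, int64_t n)
{
#if defined(__ARM_FEATURE_SVE)
    /* svcntw() is the number of 32-bit elements per vector; its value
       is only known at run time. */
    for (int64_t i = 0; i < n; i += (int64_t)svcntw()) {
        svbool_t pg = svwhilelt_b32(i, n);    /* lanes i .. n-1 active */
        svfloat32_t va = svld1(pg, &a[i]);    /* sizeless temporaries */
        svfloat32_t vb = svld1(pg, &b[i]);
        svst1(pg, &dst[i], svadd_x(pg, va, vb));
    }
#else
    /* Scalar fallback so that the sketch builds without SVE. */
    for (int64_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
#endif
}

static void check_add_arrays(void)
{
    float a[5] = { 1, 2, 3, 4, 5 };
    float b[5] = { 10, 20, 30, 40, 50 };
    float dst[5];
    add_arrays(dst, a, b, 5);
    for (int i = 0; i < 5; ++i)
        assert(dst[i] == a[i] + b[i]);
}
```

Nothing in the SVE branch mentions a vector length: the variables va and vb hold however many elements the hardware provides, which is the role the new scalable types are meant to play.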

Using intrinsics might seem old-fashioned when there are various
frameworks that express data-parallel algorithms in a more abstract way,
or libraries like P0214 (std::simd) that provide mostly performance-
portable vector interfaces.  But in practice, each vector architecture
has its own quirks and unique features that aren't easy for the compiler
to use automatically and aren't performance-portable enough to be part
of a generic interface.  So even though target-neutral approaches are a
very welcome development, they're not a complete solution.  Intrinsics
are still vital when you really want to hand-optimise a routine for a
particular architecture.  And that's still a common requirement.

For example, Arm has been porting various codebases that already support
AArch64 AdvSIMD intrinsics to SVE2.  Even though AdvSIMD and SVE2 have
some features in common, the routines for the two architectures are
often significantly different from each other (and in ways that can't be
abstracted by interfaces like std::simd).  We need to have direct access
to SVE2 features for this kind of work.


I've uploaded a Clang implementation to Phabricator.  There are three parts:


(1) Adds some SVE types that can be used to test the next two patches.
    This is a respin of Graham's patch [https://reviews.llvm.org/D59245]
    with some minor updates.

    The patch isn't really part of the RFC, but if you have any
    comments about defining the types this way, please let us know!

(2) Adds new type queries isSizeless and isIndefinite.

(3) The Clang support itself, including documentation and testcases.

Criteria for clang extensions

From the list on [http://clang.llvm.org/get_involved.html],
an extension needs:

(1) Evidence of a significant user community

      The extension allows SVE intrinsics to be used in places that
      currently use intrinsics for other vector architectures.  There is
      already one public project that uses the SVE intrinsics[1] and one
      that specifically considered SVE support as part of its design
      philosophy[2].  Arm has patches to add SVE and SVE2 support to
      several other projects, but they're gated on the Clang support.

      [1] https://github.com/nmeyer-ur/Grid
      [2] https://github.com/google/pik/tree/master/pik/simd

(2) A specific need to reside within the Clang tree

      The extension involves (small) changes to the core type system.
      It's also part of supporting target-specific intrinsics, which
      would normally be part of Clang even without the scalable types.

(3) A complete specification

      See the documentation and language edits in the patch for
      the specification (also copied below for inline replies).

(4) Representation within the appropriate governing organization

      It doesn't seem appropriate to try to standardise the extension
      at this stage, since the only way to use the extension is through
      target-specific interfaces.  The extension doesn't provide any
      benefit that's independent of those interfaces.

      So at the moment this is really in the realm of target-specific
      language extensions rather than generic language extensions.
      This may of course change later.

(5) A long-term support plan

      Arm is very much committed to supporting this.

(6) A high-quality implementation

      I'd like feedback on whether the current patch qualifies. :-)

(7) A proper test suite

      The tests in the patch cover each functional change to the source,
      except as noted in the patch description.  The implementation of the
      SVE ACLE will provide further coverage.

Following a suggestion from Renato in a different context, I've now
put the main discussion and justification in the documentation part
of the patch.  I've copied it below as well for inline replies.


Sizeless types

As an extension, Clang supports the concept of “sizeless” object types in
both C and C++.  The types are so called because it is an error to measure
their size directly using ``sizeof`` or indirectly via operations like
pointer arithmetic.

Forbidding ``sizeof`` and related operations means that the amount of
data that the types contain does not need to be a compile-time constant.
It can instead depend on runtime properties, and for example can adapt
to different hardware configurations.

Sizeless types are only intended for objects that hold temporary working
data, such as “scalable” or variable-length vectors.  They are not
intended for long-term storage and cannot be used in aggregates.

At present, the only sizeless types that Clang provides are:

AArch64 SVE vector types
  These vector types are built into the compiler under names like
  ``__SVInt8_t``, as required by the `Procedure Call Standard for the
  Arm® 64-bit Architecture`_.  They represent the longest vector of a
  particular element type that can be stored in an SVE vector register.
  Functions can pass and return these vectors in registers.

  The header file ``<arm_sve.h>`` makes the types available under more
  user-friendly names like ``svint8_t``.  It also provides a set of
  intrinsic functions for operating on the types.  See the `ARM C
  Language Extensions for SVE`_ for more information about these types
  and intrinsics.

  .. _Procedure Call Standard for the Arm® 64-bit Architecture:
  .. _ARM C Language Extensions for SVE:

`ARM C Language Extensions for SVE`_ contains the original specification of
sizeless types, but the description below is intended to be self-contained.

Outline of the type system changes

C and C++ classify object types as “complete” (the size of objects
of that type can be calculated) or “incomplete” (the size of objects
of that type cannot be calculated).  There is very little you can do with
a type until it becomes complete.
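
For readers less familiar with the standard rule, the following sketch shows it with an ordinary structure type.  The names (struct buffer, buffer_length) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct buffer;      /* declared but not defined: incomplete here */

/* Allowed while the type is incomplete: pointers to it, and
   declarations of functions that take such pointers. */
static size_t buffer_length(const struct buffer *b);

/* Not allowed while the type is incomplete:
     struct buffer tmp;        (cannot create an object)
     sizeof(struct buffer)     (the size is not known)
     b + 1                     (pointer arithmetic needs the size) */

struct buffer {     /* the type is complete from this point on */
    size_t length;
    char data[16];
};

static size_t buffer_length(const struct buffer *b)
{
    return b->length;
}

static void check_buffer(void)
{
    struct buffer b = { 3, "abc" };   /* objects can now be created */
    assert(buffer_length(&b) == 3);
    assert(sizeof b >= sizeof b.length + sizeof b.data);
}
```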

This categorization implicitly ties together two concepts: whether it is possible
to manipulate objects of a particular type, and whether it is possible
to measure their size (which in C++ must be constant).  The key idea
behind the sizeless type extension is to split these concepts apart.

To do this, the extension classifies types as:

* “indefinite” (lacking sufficient information to create an object of
  that type) or “definite” (having sufficient information)

* “sized” (will have a measurable size when definite) or “sizeless”
  (will never have a measurable size)

* “incomplete” (lacking sufficient information to determine the size of
  objects of that type) or “complete” (having sufficient information)

where the wording for the final bullet is taken verbatim from the
C standard.  All standard types are “sized” (even ``void``, although
it is always indefinite).

The idea is that “definite” types are as fully-defined as they
ever can be, even if their size is still not known at compile time.
“Complete” is then equivalent to “sized and definite”.
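
The classification can be modelled as two independent flags.  This is only a sketch of the terminology, not of Clang's implementation; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two independent properties. */
struct type_props {
    const char *name;
    bool sized;      /* will have a measurable size once definite */
    bool definite;   /* enough information to create an object */
};

/* "Complete" is equivalent to "sized and definite". */
static bool is_complete(struct type_props t)
{
    return t.sized && t.definite;
}

static const struct type_props example_int  = { "int", true, true };
static const struct type_props example_void = { "void", true, false };
static const struct type_props example_fwd  = { "struct declared but not defined", true, false };
static const struct type_props example_sve  = { "__SVInt8_t", false, true };

static void check_classification(void)
{
    assert(is_complete(example_int));
    assert(!is_complete(example_void));
    assert(!is_complete(example_fwd));
    /* Definite but sizeless: objects can exist, but the type is
       never complete, so sizeof stays invalid. */
    assert(example_sve.definite);
    assert(!is_complete(example_sve));
}
```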

On its own, this puts sizeless types into a similar position
to incomplete structure types, which is conservatively correct
but severely limits what the types can do.

The next step is to relax certain rules so that they use the distinction
between “indefinite” and “definite” rather than “incomplete” and “complete”.
The goal of this process is to allow:

* automatic variables with sizeless type
* function parameters and return values with sizeless type
* use of sizeless types with ``_Generic``
* pointers to sizeless types
* applying ``typeid`` to a sizeless type
* use of sizeless types with C++ type traits

In contrast, the following must remain invalid, by keeping the usual rules
for incomplete types unchanged:

* using ``sizeof``, ``_Alignof`` and ``alignof`` with a sizeless type
  (or object of sizeless type)
* creating or accessing arrays that have sizeless type
* doing pointer arithmetic on pointers to sizeless types
* unions or structures with sizeless members
* applying ``_Atomic`` to a sizeless type
* throwing or catching objects of sizeless type
* capturing sizeless objects by value in lambda expressions

There is also an extra restriction:

* variables with sizeless type must not have static or thread-local
  storage duration

In practice it is impossible to *define* such variables with incomplete type,
but having an explicit rule means that things like:

.. code-block:: c

   extern __SVInt8_t foo;

are outright invalid rather than simply useless (because no other
translation unit could ever define ``foo``).  Similarly, without an
explicit rule:

.. code-block:: c

   __SVInt8_t foo;

would be a valid tentative definition at the point it occurs and only
become invalid at the end of the translation unit, because ``__SVInt8_t``
is never completed.
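
The tentative-definition behavior can be seen with a standard structure type.  In the sketch below (names are illustrative), the file-scope declaration is accepted while ``struct point`` is still incomplete, and it is valid only because the type is completed before the end of the translation unit.  A sizeless type could never be completed in this way.

```c
#include <assert.h>

struct point;           /* incomplete at this point */

struct point origin;    /* tentative definition with incomplete type */

struct point {          /* completed later in the same translation unit */
    int x, y;
};

static int origin_is_zero(void)
{
    /* With no other definition, the tentative definition acts like
       a definition with a zero initializer. */
    return origin.x == 0 && origin.y == 0;
}

static void check_origin(void)
{
    assert(origin_is_zero());
}
```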

Edits to the standards

Edits to the C standard

This section specifies the behavior for sizeless types in C, as an edit
to the N1570 draft of C11.

6.2.5 Types

In 6.2.5p1, replace:

    At various points within a translation unit an object type may be
    *incomplete* …

onwards with:

    Object types are further partitioned into *sized* and *sizeless*; all
    basic and derived types defined in this standard are sized, but an
    implementation may provide additional sizeless types.

and add two additional clauses:

* At various points within a translation unit an object type may be
  *indefinite* (lacking sufficient information to construct an object
  of that type) or *definite* (having sufficient information).
  An object type is said to be *complete* if it is both sized and
  definite; all other object types are said to be *incomplete*.
  Complete types have sufficient information to determine the size
  of an object of that type while incomplete types do not.

* Arrays, structures, unions and enumerated types are always sized,
  so for them the term *incomplete* is equivalent to (and used
  interchangeably with) the term *indefinite*.

Change 6.2.5p19 to:

    The void type comprises an empty set of values; it is a sized
    indefinite object type that cannot be completed (made definite).

Replace “incomplete” with “indefinite” and “complete” with “definite” in
6.2.5p37, which describes how a type's state can change throughout a
translation unit.

Lvalues, arrays, and function designators

Replace “incomplete” with “indefinite” in, so that sizeless
definite types are modifiable lvalues.

Make the same replacement in, to prevent undefined behavior
when lvalues have sizeless definite type.

Generic selection

Replace “complete object type” with “definite object type” in,
so that the type name in a generic association can be a sizeless definite
type.

Function calls

Replace “complete object type” with “definite object type” in,
so that functions can return sizeless definite types.

Make the same change in, so that arguments can also have
sizeless definite type.

Compound literals

Replace “complete object type” with “definite object type” in,
so that compound literals can have sizeless definite type.

6.7 Declarations

Insert the following new clause after 6.7p4:

* If an identifier for an object does not have automatic storage duration,
  its type must be sized rather than sizeless.

Replace “complete” with “definite” in 6.7p7, which describes when the
type of an object becomes definite.

Function declarators (including prototypes)

Replace “incomplete type” with “indefinite type” in, so that
parameters can also have sizeless definite type.

Make the same change in, which allows even indefinite types
to be function parameters if no function definition is present.

6.7.9 Initialization

Replace “complete object type” with “definite object type” in 6.7.9p3,
to allow initialization of identifiers with sizeless definite type.

6.9.1 Function definitions

Replace “complete object type” with “definite object type” in 6.9.1p3,
so that functions can return sizeless definite types.

Make the same change in 6.9.1p7, so that adjusted parameter types can be
sizeless definite types.

J.2 Undefined behavior

Update the entries that refer to the clauses above.

Edits to the C++ standard

This section specifies the behavior for sizeless types in C++,
as an edit to the N3797 draft of C++14.

3.1 Declarations and definitions [basic.def]

Replace “incomplete” with “indefinite” in [basic.def]p5, so that definitions
of an object can give it sizeless definite type.  Add a further clause
after [basic.def]p5:

* A program is ill-formed if any declaration of an object gives it both
  a sizeless type and either static or thread-local storage duration.

3.9 Types [basic.types]

Replace [basic.types]p5 with:

    A class that has been declared but not defined, an enumeration type
    in certain contexts (7.2), or an array of unknown size or of
    indefinite element type, is an indefinite object type.45)
    Indefinite object types and the void types are indefinite types (3.9.1).
    Objects shall not be defined to have an indefinite type.

and add three additional clauses:

* Object and void types are further partitioned into *sized* and *sizeless*;
  all basic and derived types defined in this standard are sized, but an
  implementation may provide additional sizeless types.

* An object or void type is said to be *complete* if it is both sized and
  definite; all other object and void types are said to be *incomplete*.
  The term *completely-defined object type* is synonymous with *complete
  object type*.

* Arrays, class types and enumeration types are always sized, so for
  them the term *incomplete* is equivalent to (and used interchangeably
  with) the term *indefinite*.

(Note that the wording of footnote 45 continues to apply as-is.)

Also replace “incomplete” with “indefinite” in the forward reference
in [basic.types]p7.

3.9.1 Fundamental Types [basic.fundamental]

In [basic.fundamental]p9, replace the second sentence with:

    The void type is a sized indefinite type that cannot be completed
    (made definite).

leaving the rest of the clause unchanged.

3.9.2 Compound Types [basic.compound]

In this part of [basic.compound]p3:

    Pointers to incomplete types are allowed although there are
    restrictions on what can be done with them …

add “(including indefinite types)” after “incomplete types”.

3.10 Lvalues and rvalues [basic.lval]

Replace “complete” with “definite” and “incomplete” with “indefinite” in
[basic.lval]p4, so that prvalues can have definite type and (in contrast)
glvalues can have indefinite type.

Replace “incomplete” with “indefinite” and “complete” with “definite” in
[basic.lval]p7, so that the target of a pointer can be modifiable if it has
sizeless definite type.

4.1 Lvalue-to-rvalue conversion [conv.lval]

Replace “incomplete” with “indefinite” in [conv.lval]p1, so that sizeless
definite glvalues can be converted to prvalues.

5.2.2 Function call [expr.call]

Replace “completely-defined” with “definite” and “incomplete class type” with
“indefinite type” in [expr.call]p4, so that parameters can have sizeless
definite type.

Replace “incomplete” with “indefinite” and “complete” with “definite” in
[expr.call]p11, so that function call prvalues can have sizeless definite type.

5.2.3 Explicit type conversion (function notation) [expr.type.conv]

Replace “complete” with “definite” in [expr.type.conv]p2, so that ``T()``
can be used for sizeless definite T.

5.3.1 Unary operators [expr.unary.op]

Replace “incomplete” with “indefinite” in [expr.unary.op]p1, so that a
dereferenced pointer to a sizeless definite object can be converted to
a prvalue.

5.3.5 Delete [expr.delete]

After the first sentence in [expr.delete]p2 (which describes converting an
operand with class type to a pointer type), add:

    The type of the operand must now be a pointer to a sized type,
    otherwise the program is ill-formed.

Simple type specifiers [dcl.type.simple]

Replace “complete” with “definite” in [dcl.type.simple]p5, so that the special
treatment for decltypes of function calls applies to indefinite rather
than incomplete return types.  This is for consistency with the change
to [expr.call]p11 above.

8.3.4 Arrays [dcl.array]

In [dcl.array]p1, add “a sizeless type” to the list of things that array
element type T cannot be.

9.4.2 Static data members [class.static.data]

Replace “an incomplete type” with “a sized indefinite type” in
[class.static.data]p2, to avoid giving the impression that static data
members can have sizeless type.

Make this explicit by adding the following after [class.static.data]p7:

* A static data member shall not have sizeless type.

14.3.1 Template type parameters [temp.arg.type]

Replace “incomplete” with “indefinite” in [temp.arg.type]p2, which notes that
template type parameters need not be fully defined.

14.7.1 Implicit instantiation [temp.inst]

Replace “completely-defined object type” with “definite object type”
in [temp.inst]p1 and [temp.inst]p6, so that the language edits do not affect
the rules for implicit instantiation.

Other functions [res.on.functions]

Replace “incomplete” with “incomplete or indefinite” in [res.on.functions]p2,
so that the library requires the rest of the program to honor the rules
for both categories of type.

Type properties [meta.unary.prop]

Replace “complete” with “definite” in [meta.unary.prop]p3 and in the table
that follows.  This specifically includes ``is_destructible``; since sizeless
definite types can have automatic storage duration, it must be possible
to destroy them.  The changes are redundant but harmless for cases in
which the completeness rule applies only to class types.

20.10.6 Relationships between types [meta.rel]

Replace “complete” with “definite” in table 51.

Other transformations [meta.trans.other]

Replace “complete” with “definite” in table 57.

Notes for Clang developers

Applying the extension to other cases

The summary and standard edits above describe how the sizeless type
extension interacts with the core parts of the C and C++ standards.
However, Clang supports many other extensions to the core languages,
and will support new versions of the core languages as they evolve
over time.  It is therefore necessary to describe how sizeless types
should interact with future extensions and language developments.

The general principle is that we should continue to keep using the
distinction between incomplete types and complete types unless there is
a specific known benefit to doing otherwise.  Treating sizeless types as
incomplete types should be the conservatively correct choice in almost
all cases.  We can later decide to relax specific rules to use the
distinction between indefinite and definite types once we are sure
that that is the right thing to do.

Note that no decision needs to be made for any rules that are specific
to complete or incomplete aggregates (arrays, structs, unions or classes),
since aggregates are always sized.

Rationale for this extension


The main question that prompted this extension was: how do we add
scalable vector types to the type system?  The key requirements were:

* The approach must work in both C and C++.

* It must be possible to define automatic variables with these types.

* It must be possible to pass and return objects of these types
  (since that is what intrinsics and vector library routines need to do).

* It must be possible to use the types in ``_Generic`` associations
  (since the SVE ACLE uses ``_Generic`` to provide ``tgmath.h``\ -style
  overloads).

* It must be possible to create pointers or references to the types
  (for passing or returning by pointer or reference, and because not
  allowing references would be semantically difficult in C++).
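
The ``_Generic`` requirement is easiest to see with standard types.  The sketch below builds a ``tgmath.h``\ -style wrapper over two ordinary functions; the names ``add``, ``add_int``, ``add_float`` and ``type_tag`` are hypothetical.  With the extension, an association could equally name a sizeless type such as ``svint8_t``.

```c
#include <assert.h>

static int   add_int(int a, int b)       { return a + b; }
static float add_float(float a, float b) { return a + b; }

/* Hypothetical type-generic wrapper: selects a function based on
   the type of the first argument, in the style of <tgmath.h>. */
#define add(a, b) _Generic((a), int: add_int, float: add_float)(a, b)

/* Reports which association _Generic picks for an expression. */
#define type_tag(x) _Generic((x), int: 'i', float: 'f', default: '?')

static void check_generic(void)
{
    assert(add(2, 3) == 5);
    assert(add(1.5f, 2.0f) == 3.5f);
    assert(type_tag(1) == 'i');
    assert(type_tag(1.0f) == 'f');
    assert(type_tag(1.0) == '?');   /* double has no association */
}
```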

Possible approaches

Any approach to defining scalable types would fall into one of three
categories:

(1) Limit the types in such a way that there is no concept of size.

(2) Define the size of the types to be variable.

(3) Define the size of the types to be constant, either with the
    constant being large enough for all possible vector lengths or
    with the types pointing to separate memory (as for C++ classes
    like ``std::string``).

\ (2) seemed initially appealing since C already has the concept of
variable-length arrays.  However, variable-length built-in types
would work in a significantly different way.  Arrays often decay to
pointers (which of course are fixed-length types), whereas vector
types never would.  Unlike arrays, it should be possible to pass
variable-length vectors to functions, return them from functions,
and assign them by value.

One particular difficulty is that the semantics of variable-length arrays
rely on having a point at which the array size is evaluated.  It would
be difficult to extend this approach to built-in types, or to declarations
of functions that return variable-length types.  It would also not be an
accurate model of how an implementation actually behaves, since the
implementation would not evaluate the vector lengths at these points and
would not react to the results of the calculation.
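
The "point of evaluation" is visible in standard C.  In this sketch (the function name is illustrative, and it assumes a compiler that supports variable-length arrays, which C11 makes optional), the array size is fixed when the declaration is reached and later changes to ``n`` have no effect.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the size in bytes of a VLA with n elements (n > 0).
   The bound is evaluated when the declaration is reached. */
static size_t vla_bytes(size_t n)
{
    int values[n];          /* size fixed here, from the current n */
    n *= 2;                 /* does not change the array type */
    (void)n;
    return sizeof values;   /* computed at run time from the original n */
}

static void check_vla(void)
{
    assert(vla_bytes(3) == 3 * sizeof(int));
    assert(vla_bytes(8) == 8 * sizeof(int));
}
```

A built-in vector type has no comparable declaration at which a length expression could be evaluated, which is part of why this model fits poorly.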

As well as the extension itself being relatively complex (especially
for C++), it might be difficult to define it in a way that interacts
naturally with other extensions.  Also, variable-length arrays were added
to an early draft of C++14, but were later removed as too controversial and
did not make it into the final standard.  C++17 still requires ``sizeof``
to be constant and C11 makes variable-length arrays optional.

\ (2) therefore felt like a complicated dead-end.

\ (3) can be divided into two parts:

a) The vector types have a constant size and are large enough for all
   possible vector lengths.

   The main problem with this approach is that the maximum SVE vector
   length of 2048 bits is much larger than the minimum of 128 bits.  Using
   a fixed size of 2048 bits would be extremely inefficient for smaller
   vector lengths, and of course the whole point of using vectors is to
   make things *more* efficient.

   Also, we would need to define the types such that only the bytes
   associated with the actual vector length are significant.  This would
   make it possible to pass or return the types in registers and treat
   them as register values when copying.  This perhaps has some similarity
   with overaligned structures such as:

   .. code-block:: c

      struct s { _Alignas(16) int i; };

   except that the amount of padding is only known at runtime.

   There is also a significant conceptual problem: encoding a fixed size
   goes against the guiding principle of SVE, in which there is no preferred
   vector length.  There is nothing particularly magical about the current
   limit of 2048 bits and it would be better to avoid an ABI break if the
   maximum ever did increase in future.

b) The vector types have a constant size and refer to separate storage
   (as for C++ classes like ``std::string``).

   This would be difficult to do without C++-style constructor, destructor,
   copy and move semantics, so would not work well in C.  And in C++ it would
   be less efficient than the other approaches, since presumably an allocator
   would be needed to allocate the separate storage.  It would be difficult
   to map this kind of type to a self-contained register-based ABI type.
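
For option (a), the comparison with overaligned structures and the cost of a fixed 2048-bit size can both be checked with a few lines of standard C.  The helper names are hypothetical, and the figures assume a 4-byte ``int``.

```c
#include <assert.h>
#include <stddef.h>

struct s { _Alignas(16) int i; };

/* Bytes of tail padding in struct s; a compile-time constant here,
   unlike the run-time padding described for option (a). */
static size_t padding_in_s(void)
{
    return sizeof(struct s) - sizeof(int);
}

/* Hypothetical: bytes left unused per vector object if every object
   reserved the 2048-bit maximum but the hardware used vector_bits. */
static size_t unused_bytes(size_t vector_bits)
{
    return (2048 - vector_bits) / 8;
}

static void check_padding(void)
{
    assert(_Alignof(struct s) == 16);
    assert(sizeof(struct s) == 16);
    assert(padding_in_s() == 16 - sizeof(int));
    assert(unused_bytes(128) == 240);    /* 16 useful bytes out of 256 */
    assert(unused_bytes(2048) == 0);
}
```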

These are all negative reasons for (1) being the best approach.
A more positive justification is that (1) seems to meet the requirements
in the most efficient way possible.  The vectors can use their natural
(native) representation, and the type system prevents uses that would
make that representation problematic.

Also, the approach of starting with very restricted types and then
specifically allowing certain things should be more future-proof
and interact better with other (unseen) language extensions.  By default,
any language extension would treat the new types like other incomplete
types and choose conservatively-correct behavior.  It would then be
possible to relax the rules if this default behavior turns out to be
too restrictive.

(That said, treating the types as permanently incomplete will
not avoid all clashes with other extensions.  For example, we need to
allow objects of automatic storage duration to have certain forms of
incomplete type, whereas an extension might implicitly assume that all
such objects must already have complete type.  The approach should still
avoid the worst effects though.)
