[cfe-dev] Matrix Support in Clang

Thu Jan 16 15:06:07 PST 2020

On Fri, 20 Dec 2019 at 10:32, Florian Hahn via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hello,
>
> This is a Clang-focused follow up to the original proposal on llvm-dev (
> http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html). On
> the LLVM side, we recently landed the first commit adding matrix intrinsics
> as proposed.
>
> On the Clang side, we would like to propose adding support for matrix math
> operations to Clang. This includes adding a new matrix type (similar to
> ext_vector_type) and a set of builtins to operate on values of the matrix
> type.
>
> Our main motivation for the matrix support in Clang is to give users a way
> to
>
>    - Guarantee generation of high-quality code for matrix operations. For
>    isolated operations, we can guarantee vector code generation suitable for
>    the target. For trees of operations, the proposed value type helps with
>    eliminating temporary loads & stores.
>    - Make use of specialized matrix ISA extensions, like the new matrix
>    instructions in ARM v8.6 or various proprietary matrix accelerators, in
>    their C/C++ code.
>    - Move optimizations from matrix wrapper libraries into the compiler. We
>    use it internally to simplify an Eigen-style matrix library, by relying on
>    LLVM for generating tiled & fused loops for matrix operations.
>
> The rest of this RFC is structured as follows: First we propose a draft
> specification for the matrix type and accompanying builtins. Next we show
> an example of how matrix operations will be lowered by Clang, followed by a
> discussion of the contributing criteria for new extensions.  We wrap up the
> RFC by discussing possible extensions to the matrix type.
> Draft SpecificationMatrix TYPE AttributeThe *attribute-token* matrix_type is
> used to declare a matrix type. It shall appear at most once in each
> *attribute-list*. The attribute shall only appertain to a *typedef-name* of
> a typedef of a non-volatile type that is a *signed integer type*, an *unsigned
> integer type*, or a *floating-point type*. An *attribute-argument-clause* must
> be present and it shall have the form:
>
> (*constant-expression*, *constant-expression*)
>
> Both *constant-expression*s shall be a positive non-zero integral
> constant expressions. The maximum of the product of the constants is
> implementation defined. If that implementation defined limit is exceeded,
> the program is ill-formed.
>
> An *attribute* of the form matrix_type(*R*, *C*) forms a matrix type with
> an element type of the cv-qualified type the attribute appertains to and
> *R* rows and *C* columns.
>
> If a declaration of a *typedef-name* has a *matrix_type* attribute, then
> all declaration of that *typedef-name* shall have a *matrix_type* attribute
> with the same element type, number of rows, and number of columns.
>
> Matrix TypeA matrix type has an underlying *element type*, a constant
> number of rows, and a constant number of columns. Matrix types with the
> same element type, rows, and columns are the same type. A value of a matrix
> type contains rows * columns values of the *element type* laid out in
> column-major order without padding in a way compatible with an array of at
> least that many elements of the underlying *element type*.
>
> A matrix type is a *scalar type *with the same alignment as its
> underlying element type, but objects of matrix type are not usable in
> constant expressions.
>
> TODO: Allow reinterpret_cast from pointer to element type. Make aliasing
> work.
> Future Work: Initialization syntax.
> Future Work: Access syntax. m[col][row].
> Future Work: Conversions between matrix types with const qualified and
> unqualified element types.
> Future Work: Conversions between matrix types with different element types.
>
> Matrix Type builtin OperationsEach matrix type supports a collection of
> builtin expressions that look like function calls but do not form an
> overload set. Here they are described as function declarations with rules
> for how to construct the argument list types and return type and the
> library description elements from
> [library.description.structure.specifications]/3 in the C++ standard.
>
> Definitions:
>
>    - *M, M1, M2, M3* - Matrix types
>    - *T* - Element type
>    - row, col - Row and column arguments respectively.
>
> All operations on matrix types match the behavior of the underlying
> element type with respect to signed overflows.
>

Do you anticipate providing builtin operators for matrices? If not, then
the utility of a dedicated type and `matrix_type` attribute seems greatly
diminished: the builtin matrix operators could instead -- in principle --
operate on a suitable vector type (either as a flat vector, matching the
LLVM IR model, or as a vector of vectors, to support two-dimensional
indexing). I think your proposal should express why those would be inferior
choices (eg, do matrix types have different calling conventions, alignment
requirements, ... on some target? Do you intend to provide matrix x matrix
multiplication and matrix x vector multiplication via the * operator in the
future?). Adding *only* builtin functions and no new matrix types would be
a substantial simplification in the proposal.

*Element Operations*
>
> Preconditions: row and col are in the ranges [0, rows in *M*) and [0,
> columns in *M*) respectively.
>
> *M* __builtin_matrix_insert(*M* matrix, int row, int col, *T* elt)
>
> Remarks: The return type and the type *T* are inferred from the
> cv-unqualified type of the matrix argument and its cv-unqualified *element
> type* respectively.
>
> Returns: a copy of matrix with the element at the specified row and
> column set to elt.
>
>
> *T* __builtin_matrix_extract(*M* matrix, int row, int col)
>
> The return type is inferred from the cv-unqualified type of the matrix
>  argument’s *element type*.
>
> Returns: a copy of the element at the specified row and column.
>
>
> *Simple Binary Operations*
>
> For the following binary operations matrix1 and matrix2 shall be matrix
> values of the same cv-unqualified type, and the return type is the
> cv-unqualified version of that type.
>
> *M* __builtin_matrix_add(*M* matrix1, *M* matrix2)
>
> Returns: A matrix Res equivalent to the code below, where col refers to
> the number of columns of M, row to the number of rows of M and EltTy to
> the element type of M.
>
> *M* Res;
> for (int C = 0; C < col; ++C) {
>   for (int R = 0; R < row; ++R) {
>     EltTy Elt = __builtin_matrix_extract(matrix1, R, C) +
>                      __builtin_matrix_extract(matrix2, R, C)
>     Res = __builtin_matrix_insert(Res, R, C, Elt);
>   }
> }
>
>
>
> *M* __builtin_matrix_sub(*M* matrix1, *M* matrix2)
>
> Returns: A matrix Res equivalent to the code below, where col refers to
> the number of columns of M, row to the number of rows of M and EltTy to
> the element type of M.
>
> *M* Res;
> for (int C = 0; C < col; ++C) {
>   for (int R = 0; R < row; ++R) {
>     EltTy Elt = __builtin_matrix_extract(matrix1, R, C) -
>                      __builtin_matrix_extract(matrix2, R, C)
>     Res = __builtin_matrix_insert(Res, R, C, Elt);
>   }
> }
>
>
>
> *Other Operations*
>
> *M3* __builtin_matrix_multiply(*M1* matrix1, *M2* matrix2)
>
> Mandates: *M1* and *M2* shall be matrix types with the same
> cv-unqualified element type and *M1*’s number of columns matching *M2*’s
> number of row.
>
> Remarks: The return type is a cv-unqualified matrix type with the same
> element type as *M1* and *M2* if both *M1* and *M2*’s element type is
> const, or the cv-unqualified element type otherwise, and with the same
> number of rows as *M1* and the same number of columns as *M2*.
>
> Returns: A matrix Res equivalent to the code below, where col refers to
> the number of columns of M, row to the number of rows of M, EltTy to the
> element type of M and inner refers to the number of columns of M1.
>
> *M* Res;
> for (int C = 0; C < col; ++C) {
>   for (int R = 0; R < row; ++R) {
>     EltTy Elt = 0;
>     for (int K = 0; K < inner; ++K) {
>       Elt += __builtin_matrix_extract(matrix1, R, K) *
>                  __builtin_matrix_extract(matrix2, K, C)
>   }
>   Res = __builtin_matrix_insert(Res, R, C, Elt);
> }
>
> Remark: With respect to rounding errors, the operation preserves the
> behavior of the separate multiply and add operations by default. We propose
> to provide a Clang option to override this behavior and allow contraction
> of those operations (e.g. -ffp-contract=matrix).
>

The above seem like they would be better if provided as operators rather
than as builtin functions. We don't provide builtins for these kinds of
operations for vector types, because we expect all code to use the operator
syntax instead.

*M2* __builtin_matrix_transpose(*M1* matrix)
>
> Remarks: The return type is a cv-unqualified matrix type that has the same
> element type as *M1* and has the the same number of rows as *M1* has
> columns and the same number of columns as *M1* has rows.
>
> Returns: A matrix Res equivalent to the code below, where col refers to
> the number of columns of M, and row to the number of rows of M.
>
> *M* Res;
> for (int C = 0; C < col; ++C) {
>   for (int R = 0; R < row; ++R) {
>     EltTy Elt = __builtin_matrix_extract(matrix, R, C);
>     Res = __builtin_matrix_insert(Res, C, R, Elt);
>   }
> }
>
> Maybe it's a bit cute, but have you considered using an operator such as
prefix ~ for this, or perhaps a posfix .T? (This is in some sense a
swizzle, and we use member-access-like syntax for those already.)

*M* __builtin_matrix_column_load(*T* *ptr, int row, int col, int stride)
>
> Mandates: row and col shall be integral constants greater than 0.
>
> Preconditions: stride >= row.
>
> Remarks: The return type is a cv-unqualified matrix type with an element
> type of the cv-unqualified version of *T* and a number of rows and
> columns equal to row and col respectively.
>
> Returns: A matrix Res equivalent to:
>
> *M* Res;
> for (int C = 0; C < col; ++C) {
>   for (int R = 0; R < row; ++K)
>     Res = __builtin_matrix_insert(Res, R, C, ptr[R]);
>   ptr += stride
> }
>
>
>
> void __builtin_matrix_column_store(*M* matrix, *T* *ptr, int stride)
>
> Preconditions: stride is greater than or equal to the number of rows in
> *M*.
>
> Effects: Equivalent to:
>
> for (int C = 0; C < *columns **in** M*; ++C) {
>   for (int R = 0; R < *rows **in M*; ++K)
>     ptr[R] = __builtin_matrix_extract(matrix, R, C);
>   ptr += stride
> }
>
> Remarks: The type *T* is the const-unqualified version of the matrix
>  argument’s *element type*.
>

Presumably these would be unnecessary if we permitted casting between an M*
and a T* and treating the M* as a suitably-sized array of T? (Again, we
don't have anything like this for vector types, for which we do guarantee
that you can cast a vector* to a T* and access the vector elements
directly.)

*M* __builtin_matrix_scalar_multiply(*M* matrix, *T* scalar)
>
> Returns: A matrix Res equivalent to the code below, where col refers to
> the number of columns of M, and row to the number of rows of M.
>
> *M* Res;
> for (int C = 0; C < col; ++C) {
>   for (int R = 0; R < row; ++R) {
>     EltTy Elt = __builtin_matrix_extract(matrix, R, C) * scalar;
>     Res = __builtin_matrix_insert(Res, R, C, Elt);
>   }
> }
>
> Remarks: The return type and the type *T* are the cv-unqualified type of
> the matrix argument and its cv-unqualified *element type* respectively.
>

(As with the above operators, using the * operator for this seems more
appropriate to me.)

> Example This code performs a matrix-multiply of two 4x4 matrices followed
> by an matrix addition:
>
> typedef float m4x4_t __attribute__((matrix_type(4, 4)));
> void f(m4x4_t *a, m4x4_t *b, m4x4_t *c, m4x4_t *r) {
>   *r = __builtin_matrix_add(__builtin_matrix_multiply(*a, *b), *c);
> }
>
> This will get lowered by Clang to the LLVM IR below. In our current
> implementation, we use LLVM’s array type as storage type for the matrix
> data. Before accessing the data, we cast the array to a vector type. This
> allows us to use the element width as alignment, without running into
> issues with LLVM’s large default alignment for vector types, which is
> problematic in structs.
>
> define void @f([16 x float]* %a, [16 x float]* %b, [16 x float]* %c, [16 x float]* %r) #0 {
> entry:
>   %a.addr = alloca [16 x float]*, align 8
>   %b.addr = alloca [16 x float]*, align 8
>   %c.addr = alloca [16 x float]*, align 8
>   %r.addr = alloca [16 x float]*, align 8
>   store [16 x float]* %a, [16 x float]** %a.addr, align 8
>   store [16 x float]* %b, [16 x float]** %b.addr, align 8
>   store [16 x float]* %c, [16 x float]** %c.addr, align 8
>   store [16 x float]* %r, [16 x float]** %r.addr, align 8
>   %0 = load [16 x float]*, [16 x float]** %a.addr, align 8
>   %1 = bitcast [16 x float]* %0 to <16 x float>*
>   %2 = load <16 x float>, <16 x float>* %1, align 4
>   %3 = load [16 x float]*, [16 x float]** %b.addr, align 8
>   %4 = bitcast [16 x float]* %3 to <16 x float>*
>   %5 = load <16 x float>, <16 x float>* %4, align 4
>   %6 = call <16 x float> @llvm.matrix.multiply.v16f32.v16f32.v16f32(<16 x float> %2, <16 x float> %5, i32 4, i32 4, i32 4)
>   %7 = load [16 x float]*, [16 x float]** %c.addr, align 8
>   %8 = bitcast [16 x float]* %7 to <16 x float>*
>   %9 = load <16 x float>, <16 x float>* %8, align 4
>   %10 = fadd <16 x float> %6, %9
>   %11 = load [16 x float]*, [16 x float]** %r.addr, align 8
>   %12 = bitcast [16 x float]* %11 to <16 x float>*
>   store <16 x float> %10, <16 x float>* %12, align 4
>   ret void
> }
> declare <16 x float> @llvm.matrix.multiply.v16f32.v16f32.v16f32(<16 x float>, <16 x float>, i32 immarg, i32 immarg, i32 immarg)
>
>
> Contributing Criteria
>
> *Evidence of a significant user community: This is based on a number of
> factors, including an existing user community, the perceived likelihood
> that users would adopt such a feature if it were available, and any
> secondary effects that come from, e.g., a library adopting the feature and
> providing benefits to its users.*Currently this is part of one of our
> compiler toolchains and used on a few large internal codebases. The matrix
> type can be used by matrix libraries like Eigen, to offload some of the
> optimization responsibility from the library to the compiler.
>

Have the Eigen developers indicated they would consider using this if it
were available to them?
Have you reached out to the GCC developers to see if they would also be
likely to support this extension?
We should be aiming to build critical mass behind this feature so that it
gets adopted; it would be a waste of resources if a different technology
ends up being adopted in this space and we're left maintaining a system
that no-one outside Apple uses.

It would also be suitable target for implementing a standard matrix
> library. It also provides functionality similar to various libraries for
> matrix math on small matrixes, like
> https://developer.apple.com/documentation/accelerate/working_with_matrices,
> with more flexibility (supports any combination of input dimensions).
>
>
> *A specific need to reside within the Clang tree: There are some
> extensions that would be better expressed as a separate tool, and should
> remain as separate tools even if they end up being hosted as part of the
> LLVM umbrella project.*We want to expose this feature at the C/C++ level.
> For that, it needs to be part of Clang.
>
> *A specification: The specification must be sufficient to understand the
> design of the feature as well as interpret the meaning of specific
> examples. The specification should be detailed enough that another compiler
> vendor could implement the feature.*
> We currently have the design above and will work on a more comprehensive
> spec.
>

Do you anticipate the various psABIs being updated to specify the calling
convention for matrix parameters and return values? If not, you'll need to
include that in your specification too.
Similarly, you will need to specify a mangling to use for these types in
both the Itanium and MS C++ ABIs.

> *Representation within the appropriate governing organization: For
> extensions to a language governed by a standards committee (C, C++,
> OpenCL), the extension itself must have an active proposal and proponent
> within that committee and have a reasonable chance of acceptance. Clang
> should drive the standard, not diverge from it. This criterion does not
> apply to all extensions, since some extensions fall outside of the realm of
> the standards bodies.*We think this extension would fall outside of the
> realm of the standards bodies. It is an implementation detail used to
> implement matrix math libraries and such, much like the vector extensions
> are an implementation detail for SIMD libraries.
>
>
> *A long-term support plan: increasingly large or complex extensions to
> Clang need matching commitments to supporting them over time, including
> improving their implementation and specification as Clang evolves. The
> capacity of the contributor to make that commitment is as important as the
> commitment itself.*We are using this internally and adding this feature
> to Clang upstream means we intend to support it as part of our ongoing
> Clang work.
>
>
> *A high-quality implementation: The implementation must fit well into
> Clang's architecture, follow LLVM's coding conventions, and meet Clang's
> quality standards, including diagnostics and complete AST representations.
> This is particularly important for language extensions, because users will
> learn how those extensions work through the behavior of the compiler.*Will
> we provide a series of patches to implement the extension soon and look
> forward to any feedback to make sure the patches meet the quality
> requirement.
>
> *A test suite: Extensive testing is crucial to ensure that the language
> extension is not broken by ongoing maintenance in Clang. The test suite
> should be complete enough that another compiler vendor could conceivably
> validate their implementation of the feature against it*
> We will provide this as part of Clang’s unit tests and test-suite.
>
>
> ExtensionsInitially we want to focus on 2D matrixes without padding in
> column-major layout as a concrete use case. This is similar to the defaults
> for the Matrix type in Eigen, for example. But our proposed type can be
> extended naturally to
>
>    - Support N (known constant) dimensions by turning matrix_type
>    attribute into a variadic attribute.
>
> Hmm. "matrix" wouldn't really be the right name for the generalized
attribute. Presumably matrix_type(N) would mean the same thing as
ext_vector_type(N)? Are there realistic use cases for this? (I expect it's
not worth planning for this eventuality until we actually have such a use
case.)

>
>    -
>    - Support column/row-wise padding, by adding a column_padding clause
>    to the attribute.
>    Dealing with the padding could be exclusively handled on the frontend
>    side, by emitting additional shufflevector instructions to extract the
>    data. If there is a desire to exploit the padding more on the LLVM side, we
>    can add a set of intrinsics for that.
>    - Support row & column major layouts, by adding a layout clause to the
>    attribute.
>    Again, this naively could be handled while lowering to LLVM IR in
>    Clang using shufflevector to produce flattened vectors with the required
>    layout. For better optimisations, the LLVM intrinsics relying on
>    shape/layout information can be extended to take the layout as additional
>    argument. Through propagating the layout information similar to the
>    dimensions, we should be able to optimise the points where we need to
>    transform the layout of the underlying matrixes.
>
> In all cases, we require known integer constants as dimensions and we do
> not plan to support dynamic dimensions for now, as the main optimization
> potential comes from the fact that we know the dimensions. Supporting
> dynamic dimensions should be fairly straight forward, but means we lose the
> ability to type check matrix expressions at compile time and we also have
> to rely on dynamic dimension during code generation.
>
> Cheers,
>  Florian
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200116/973a22bb/attachment-0001.html>