[cfe-dev] Matrix Support in Clang
Florian Hahn via cfe-dev
cfe-dev at lists.llvm.org
Tue Feb 25 13:02:28 PST 2020
Hi,
We updated the proposal to use operators for most operations (with a lot of help from Michael Spencer!).I’ve added the updated section inline and the complete updated proposal can be found at the end of the mail.
> On Jan 18, 2020, at 12:01, Florian Hahn via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>
> Thanks for the feedback! I’ve responded inline.
>
>> On 16 Jan 2020, at 23:06, Richard Smith <richard at metafoo.co.uk <mailto:richard at metafoo.co.uk>> wrote:
>>
>> Do you anticipate providing builtin operators for matrices? If not, then the utility of a dedicated type and `matrix_type` attribute seems greatly diminished: the builtin matrix operators could instead -- in principle -- operate on a suitable vector type (either as a flat vector, matching the LLVM IR model, or as a vector of vectors, to support two-dimensional indexing). I think your proposal should express why those would be inferior choices (eg, do matrix types have different calling conventions, alignment requirements, ... on some target? Do you intend to provide matrix x matrix multiplication and matrix x vector multiplication via the * operator in the future?). Adding *only* builtin functions and no new matrix types would be a substantial simplification in the proposal.
>
> I think it would make sense to provide builtin operators instead of the proposed builtins for math operations. Same for element insertion/extraction. However I am not sure how to provide the strided matrix load/store as operators. Would it be OK to just have builtins for those? The reason we went for builtins initially was that we thought that might make the proposal a bit more lightweight, but it sounds like builtin operators would be preferred with the type.
>
> I do not think ext_vector_type would be suitable for our proposal, as it matches LLVM’s vector alignment and the matrix type should match the alignment of the underlying data type, to allow easy interaction with existing matrix libraries.
>
> A vector of vectors should work in principle, as long as we could fix both dimensions on a type level. Not having the dimensions guaranteed by the type would have a negative impact on the user-experience I think, as we would, for example, loose the ability to type-check if the dimensions match the operators and users would have to provide the dimensions for certain operations. Also, it would make supporting 3+ dimensions a bit more tricky.
Below is a proposal for matrix type element access & binary operators:
Matrix Type Element Access Operator
An expression of the form `postfix-expression [expression][expression]` where the postfix-expression is of matrix type is a matrix element access expression. expression shall not be a comma expression, and shall be a prvalue of unscoped enumeration or integral type. Given the expression E1[E2][E3], the result is an lvalue of the same type as the underlying element type of the matrix that refers to the value at E2 row and E3 column in the matrix. The expression E1 is sequenced before E2 and E3. The expressions E2 and E3 are unsequenced.
Note: We thought about providing an expression of the form `postfix-expression [expression]` to access columns of a matrix. We think that such an expression would be problematic once both column and row major matrixes are supported: depending on the memory layout, either accessing columns or rows can be done efficiently, but not both. Instead, we propose to provide builtins to extract rows and columns from a matrix. This makes the operations more explicit.
Matrix Type Binary Operators
Each matrix type supports the following binary operators: +, -, *, /, and %. The * operator provides matrix multiplication, while +, -, /, and % are performed element-wise. There are also scalar versions of the operators, which take a matrix type and the underlying element type. The operation is applied to all elements of the matrix using the scalar value.
The operands of +, -, *, and / shall have either matrix type, arithmetic or unscoped enumeration type. The operands of % shall have either matrix type with an element type of integral type, integral type or unscoped enumeration type. At least one of the operands shall be of matrix type.
For BIN_OP in +, -, *, /, and %, given the expression M1 BIN_OP M2 where, for *, one of M1 or M2 is of arithmetic type:
* The usual arithmetic conversions are applied to M1 and M2. [ Note: if M1 or M2 are of arithmetic type, they are broadcast to matrices here. — end note ]
* The matrix types of M1 and M2 shall have the same number of rows and columns.
* The result is equivalent to Res in the following where col is the number of columns and row is the number of rows in the matrix type:
decltype(M1) Res;
for (int C = 0; C < col; ++C)
for (int R = 0; R < row; ++R)
Res[R][C] = M1[R][C] BIN_OP M2[R][C];
Given the expression M1 * M2 where M1 and M2 are of matrix type:
* The usual arithmetic conversions are applied to M1 and M2
* The type of M1 shall have the same number of columns as the type of M2 has rows.
* The resulting type, MTy, is the result of applying the usual arithmetic conversions to M1 and M2, but with the same number of rows as M1’s matrix type and the same number of columns as M2’s matrix type.
* The result is equivalent to Res in the following where col is the number of columns and row is the number of rows in MTy:
MTy Res;
for (int C = 0; C < col; ++C) {
for (int R = 0; R < row; ++R) {
EltTy Elt = 0;
for (int K = 0; K < inner; ++K) {
Elt += M1[R][K] * M2[K][C];
}
Res[R][C] = Elt;
}
All operations on matrix types match the behavior of the underlying element type with respect to signed overflows.
With respect to rounding errors, the the * operator preserves the behavior of the separate multiply and add operations by default. We propose to provide a Clang option to override this behavior and allow contraction of those operations (e.g. -ffp-contract=matrix).
>> Have the Eigen developers indicated they would consider using this if it were available to them?
>> Have you reached out to the GCC developers to see if they would also be likely to support this extension?
>> We should be aiming to build critical mass behind this feature so that it gets adopted; it would be a waste of resources if a different technology ends up being adopted in this space and we're left maintaining a system that no-one outside Apple uses.
>
> We hoped to get some initial feedback before reaching out, to make sure the proposal is in reasonably good shape. I plan to reach out to them early next week.
I reached out on the Eigen mailing list and got an encouraging response: https://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2020/02/msg00000.html <https://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2020/02/msg00000.html>
I’ve also added people involved in GCC to the WIP patch adding the type.
Cheers,
Florian
Draft Specification
Matrix Type Attribute
The attribute-token matrix_type is used to declare a matrix type. It shall appear at most once in each attribute-list. The attribute shall only appertain to a typedef-name of a typedef of a non-volatile type that is a signed integer type, an unsigned integer type, or a floating-point type. An attribute-argument-clause must be present and it shall have the form:
(constant-expression, constant-expression)
Both constant-expressions shall be a positive non-zero integral constant expressions. The maximum of the product of the constants is implementation defined. If that implementation defined limit is exceeded, the program is ill-formed.
An attribute of the form matrix_type(R, C) forms a matrix type with an element type of the cv-qualified type the attribute appertains to and R rows and C columns.
If a declaration of a typedef-name has a matrix_type attribute, then all declaration of that typedef-name shall have a matrix_type attribute with the same element type, number of rows, and number of columns.
Matrix Type
A matrix type has an underlying element type, a constant number of rows, and a constant number of columns. Matrix types with the same element type, rows, and columns are the same type. A value of a matrix type contains rows * columns values of the element type laid out in column-major order without padding in a way compatible with an array of at least that many elements of the underlying element type.
A matrix type is a scalar type with the same alignment as its underlying element type, but objects of matrix type are not usable in constant expressions.
TODO: Allow reinterpret_cast from pointer to element type. Make aliasing work.
TODO: Does it make sense to allow M::element_type, M::rows, and M::columns where M is a matrix type? We don’t support this anywhere else, but it’s convenient. The alternative is using template deduction to extract this information.
Future Work: Initialization syntax.
Future Work: Conversions between matrix types with const qualified and unqualified element types.
Standard Conversions
The standard conversions are extended as follows.
For integral promotions, floating-point promotion, integral conversions, floating-point conversions, and floating-integral conversions: apply the rules to the underlying type of the matrix type. The resulting type is a matrix type with that underlying element type. The resulting value is as follows:
* If the original value was of matrix type, each element is converted element by element.
* If the original value was not of matrix type, each element takes the value of the original value.
Arithmetic Conversions
The usual arithmetic conversions are extended as follows.
Insert at the start:
* If either operand is of matrix type, apply the usual arithmetic conversions using its underlying element type. The resulting type is a matrix type with that underlying element type.
Matrix Type Element Access Operator
An expression of the form `postfix-expression [expression][expression]` where the postfix-expression is of matrix type is a matrix element access expression. expression shall not be a comma expression, and shall be a prvalue of unscoped enumeration or integral type. Given the expression E1[E2][E3], the result is an lvalue of the same type as the underlying element type of the matrix that refers to the value at E2 row and E3 column in the matrix. The expression E1 is sequenced before E2 and E3. The expressions E2 and E3 are unsequenced.
Note: We thought about providing an expression of the form `postfix-expression [expression]` to access columns of a matrix. We think that such an expression would be problematic once both column and row major matrixes are supported: depending on the memory layout, either accessing columns or rows can be done efficiently, but not both. Instead, we propose to provide builtins to extract rows and columns from a matrix. This makes the operations more explicit.
Matrix Type Binary Operators
Each matrix type supports the following binary operators: +, -, *, /, and %. The * operator provides matrix multiplication, while +, -, /, and % are performed element-wise. There are also scalar versions of the operators, which take a matrix type and the underlying element type. The operation is applied to all elements of the matrix using the scalar value.
The operands of +, -, *, and / shall have either matrix type, arithmetic or unscoped enumeration type. The operands of % shall have either matrix type with an element type of integral type, integral type or unscoped enumeration type. At least one of the operands shall be of matrix type.
For BIN_OP in +, -, *, /, and %, given the expression M1 BIN_OP M2 where, for *, one of M1 or M2 is of arithmetic type:
* The usual arithmetic conversions are applied to M1 and M2. [ Note: if M1 or M2 are of arithmetic type, they are broadcast to matrices here. — end note ]
* The matrix types of M1 and M2 shall have the same number of rows and columns.
* The result is equivalent to Res in the following where col is the number of columns and row is the number of rows in the matrix type:
decltype(M1) Res;
for (int C = 0; C < col; ++C)
for (int R = 0; R < row; ++R)
Res[R][C] = M1[R][C] BIN_OP M2[R][C];
Given the expression M1 * M2 where M1 and M2 are of matrix type:
* The usual arithmetic conversions are applied to M1 and M2
* The type of M1 shall have the same number of columns as the type of M2 has rows.
* The resulting type, MTy, is the result of applying the usual arithmetic conversions to M1 and M2, but with the same number of rows as M1’s matrix type and the same number of columns as M2’s matrix type.
* The result is equivalent to Res in the following where col is the number of columns and row is the number of rows in MTy:
MTy Res;
for (int C = 0; C < col; ++C) {
for (int R = 0; R < row; ++R) {
EltTy Elt = 0;
for (int K = 0; K < inner; ++K) {
Elt += M1[R][K] * M2[K][C];
}
Res[R][C] = Elt;
}
All operations on matrix types match the behavior of the underlying element type with respect to signed overflows.
With respect to rounding errors, the the * operator preserves the behavior of the separate multiply and add operations by default. We propose to provide a Clang option to override this behavior and allow contraction of those operations (e.g. -ffp-contract=matrix).
Matrix Type Builtin Operations
Each matrix type supports a collection of builtin expressions that look like function calls but do not form an overload set. Here they are described as function declarations with rules for how to construct the argument list types and return type and the library description elements from [library.description.structure.specifications]/3 in the C++ standard.
Definitions:
* M, M1, M2, M3 - Matrix types
* T - Element type
* row, col - Row and column arguments respectively.
M2 __builtin_matrix_transpose(M1 matrix)
Remarks: The return type is a cv-unqualified matrix type that has the same element type as M1 and has the the same number of rows as M1 has columns and the same number of columns as M1 has rows.
Returns: A matrix Res equivalent to the code below, where col refers to the number of columns of M, and row to the number of rows of M.
M Res;
for (int C = 0; C < col; ++C)
for (int R = 0; R < row; ++R)
Res[C][R] = matrix[R][C];
M __builtin_matrix_column_load(T *ptr, int row, int col, int stride)
Mandates: row and col shall be integral constants greater than 0.
Preconditions: stride >= row.
Remarks: The return type is a cv-unqualified matrix type with an element type of the cv-unqualified version of T and a number of rows and columns equal to row and col respectively.
Returns: A matrix Res equivalent to:
M Res;
for (int C = 0; C < col; ++C) {
for (int R = 0; R < row; ++K)
Res[R][C] = ptr[R];
ptr += stride
}
void __builtin_matrix_column_store(M matrix, T *ptr, int stride)
Preconditions: stride is greater than or equal to the number of rows in M.
Effects: Equivalent to:
for (int C = 0; C < columns in M; ++C) {
for (int R = 0; R < rows in M; ++K)
ptr[R] = matrix[R][C];
ptr += stride
}
Remarks: The type T is the const-unqualified version of the matrix argument’s element type.
Example
This code performs a matrix-multiply of two 4x4 matrices followed by an matrix addition:
typedef float m4x4_t __attribute__((matrix_type(4, 4)));
void f(m4x4_t *a, m4x4_t *b, m4x4_t *c, m4x4_t *r) {
*r = *a + (*b * *c);
}
This will get lowered by Clang to the LLVM IR below. In our current implementation, we use LLVM’s array type as storage type for the matrix data. Before accessing the data, we cast the array to a vector type. This allows us to use the element width as alignment, without running into issues with LLVM’s large default alignment for vector types, which is problematic in structs.
define void @f([16 x float]* %a, [16 x float]* %b, [16 x float]* %c, [16 x float]* %r) #0 {
entry:
%a.addr = alloca [16 x float]*, align 8
%b.addr = alloca [16 x float]*, align 8
%c.addr = alloca [16 x float]*, align 8
%r.addr = alloca [16 x float]*, align 8
store [16 x float]* %a, [16 x float]** %a.addr, align 8
store [16 x float]* %b, [16 x float]** %b.addr, align 8
store [16 x float]* %c, [16 x float]** %c.addr, align 8
store [16 x float]* %r, [16 x float]** %r.addr, align 8
%0 = load [16 x float]*, [16 x float]** %a.addr, align 8
%1 = bitcast [16 x float]* %0 to <16 x float>*
%2 = load <16 x float>, <16 x float>* %1, align 4
%3 = load [16 x float]*, [16 x float]** %b.addr, align 8
%4 = bitcast [16 x float]* %3 to <16 x float>*
%5 = load <16 x float>, <16 x float>* %4, align 4
%6 = call <16 x float> @llvm.matrix.multiply.v16f32.v16f32.v16f32(<16 x float> %2, <16 x float> %5, i32 4, i32 4, i32 4)
%7 = load [16 x float]*, [16 x float]** %c.addr, align 8
%8 = bitcast [16 x float]* %7 to <16 x float>*
%9 = load <16 x float>, <16 x float>* %8, align 4
%10 = fadd <16 x float> %6, %9
%11 = load [16 x float]*, [16 x float]** %r.addr, align 8
%12 = bitcast [16 x float]* %11 to <16 x float>*
store <16 x float> %10, <16 x float>* %12, align 4
ret void
}
; Function Attrs: nounwind readnone speculatable willreturn
declare <16 x float> @llvm.matrix.multiply.v16f32.v16f32.v16f32(<16 x float>, <16 x float>, i32 immarg, i32 immarg, i32 immarg)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200225/8c5ba1c5/attachment-0001.html>
More information about the cfe-dev
mailing list