<div dir="ltr"><div dir="ltr">On Tue, 25 Feb 2020 at 13:02, Florian Hahn via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;">Hi,<br><br>We updated the proposal to use operators for most operations (with a lot of help from Michael Spencer!).I’ve added the updated section inline and the complete updated proposal can be found at the end of the mail.<br></div></blockquote><div><br></div><div>Thanks. I'm concerned about the inconsistency between * and the / and % operators, but other than that I think this proposal looks good.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div><blockquote type="cite"><div>On Jan 18, 2020, at 12:01, Florian Hahn via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>> wrote:</div><br><div><div dir="auto" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><div dir="ltr"><div dir="ltr"><span>Thanks for the feedback! I’ve responded inline.<br></span></div><div dir="ltr"><br><blockquote type="cite">On 16 Jan 2020, at 23:06, Richard Smith <<a href="mailto:richard@metafoo.co.uk" target="_blank">richard@metafoo.co.uk</a>> wrote:<br><br></blockquote></div><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div>Do you anticipate providing builtin operators for matrices? If not, then the utility of a dedicated type and `matrix_type` attribute seems greatly diminished: the builtin matrix operators could instead -- in principle -- operate on a suitable vector type (either as a flat vector, matching the LLVM IR model, or as a vector of vectors, to support two-dimensional indexing). I think your proposal should express why those would be inferior choices (eg, do matrix types have different calling conventions, alignment requirements, ... on some target? Do you intend to provide matrix x matrix multiplication and matrix x vector multiplication via the * operator in the future?). Adding *only* builtin functions and no new matrix types would be a substantial simplification in the proposal.</div></div></div></div></blockquote><div dir="ltr"><br></div>I think it would make sense to provide builtin operators instead of the proposed builtins for math operations. Same for element insertion/extraction. However I am not sure how to provide the strided matrix load/store as operators. Would it be OK to just have builtins for those? The reason we went for builtins initially was that we thought that might make the proposal a bit more lightweight, but it sounds like builtin operators would be preferred with the type.<br><br>I do not think ext_vector_type would be suitable for our proposal, as it matches LLVM’s vector alignment and the matrix type should match the alignment of the underlying data type, to allow easy interaction with existing matrix libraries. </div><div dir="ltr"><br></div><div dir="ltr">A vector of vectors should work in principle, as long as we could fix both dimensions on a type level. 
Not having the dimensions guaranteed by the type would have a negative impact on the user experience, I think, as we would, for example, lose the ability to type-check whether the dimensions match the operators, and users would have to provide the dimensions for certain operations. Also, it would make supporting 3+ dimensions a bit more tricky.</div></div></div></blockquote><div><br></div><div><br></div><div><div>Below is a proposal for matrix type element access & binary operators:</div><div><br></div><div><b>Matrix Type Element Access Operator<br></b><br>An expression of the form `<i>postfix-expression [expression][expression]</i>` where the <i>postfix-expression</i> is of matrix type is a matrix element access expression. <i>expression</i> shall not be a comma expression, and shall be a prvalue of unscoped enumeration or integral type. Given the expression E1[E2][E3], the result is an lvalue of the same type as the underlying element type of the matrix that refers to the value at row E2 and column E3 in the matrix. The expression E1 is sequenced before E2 and E3. The expressions E2 and E3 are unsequenced.<br><br>Note: We thought about providing an expression of the form `<i>postfix-expression [expression]</i>` to access columns of a matrix. We think that such an expression would be problematic once both column-major and row-major matrices are supported: depending on the memory layout, either accessing columns or rows can be done efficiently, but not both. Instead, we propose to provide builtins to extract rows and columns from a matrix. This makes the operations more explicit.<br></div></div></div></div></blockquote><div><br></div><div>Are you aware of the OpenMP array sections extension? (<a href="https://www.openmp.org/spec-html/5.0/openmpsu21.html">https://www.openmp.org/spec-html/5.0/openmpsu21.html</a>) Inspired by that, you could use something like E1[:][E2] and E1[E2][:] to form column and row vectors. Just a thought; I think using builtins for these is also fine.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div><div><div><b>Matrix Type Binary Operators<br></b><br>Each matrix type supports the following binary operators: +, -, *, /, and %. The * operator provides matrix multiplication, while +, -, /, and % are performed element-wise. There are also scalar versions of the operators, which take a matrix type and the underlying element type. The operation is applied to all elements of the matrix using the scalar value.<br></div></div></div></div></blockquote><div><br></div><div>Supporting elementwise matrix1 / matrix2 seems likely to be confusing if * means matrix multiplication. Is this an important operation? (I'd imagine not, if you're willing to give up elementwise *.)</div><div><br></div><div>I think the following would all be OK:</div><div><br></div><div>1) +, -, and * all follow normal math conventions (* is matrix multiplication or matrix/vector or matrix/scalar multiplication, + and - are elementwise and require the same type on both sides); / and % are not provided</div><div>2) +, -, *, /, and % are elementwise; matrix multiplication is done a different way.</div><div>3) + and - are elementwise; *, /, and % are not provided and those operations are provided a different way</div><div><br></div><div>Of those, I think (1) is the cleanest approach, even though it's inconsistent with how we treat * for two operands of vector type. (2) would be consistent with our approach for vectors, and might be a nice approach if we can find a good alternative syntax for matrix multiplication (infix x is likely unambiguous, but would certainly be novel).</div><div><br></div>
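<div>To make option (1) concrete, here is a rough sketch of what user code might look like if * were matrix multiplication (including matrix/scalar) and + elementwise, using the matrix_type attribute from the draft below (the typedef names are invented for this example):</div><div><br></div><div>typedef float m4x4_t __attribute__((matrix_type(4, 4)));<br>typedef float m4x2_t __attribute__((matrix_type(4, 2)));<br><br>m4x2_t example(m4x4_t A, m4x4_t B, m4x2_t C, float S) {<br> m4x4_t Sum = A + B; // elementwise; both operands must have the same shape<br> return (Sum * C) * S; // 4x4 times 4x2 matrix multiply, then scale by S<br>}</div><div><br></div>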
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div><div><div>The operands of +, -, *, and / shall have either matrix type, arithmetic type, or unscoped enumeration type. The operands of % shall have either matrix type with an element type of integral type, integral type, or unscoped enumeration type. At least one of the operands shall be of matrix type.<br><br>For <i>BIN_OP</i> in +, -, *, /, and %, given the expression <i>M1 BIN_OP M2 </i>where, for *, one of M1 or M2 is of arithmetic type:<br><br></div><div>* The usual arithmetic conversions are applied to M1 and M2. [ Note: if M1 or M2 are of arithmetic type, they are broadcast to matrices here. — end note ]<br><br>* The matrix types of M1 and M2 shall have the same number of rows and columns.<br><br>* The result is equivalent to Res in the following, where col is the number of columns and row is the number of rows in the matrix type:<br><br>decltype(M1) Res;<br>for (int C = 0; C < col; ++C)<br> for (int R = 0; R < row; ++R)<br> Res[R][C] = M1[R][C] BIN_OP M2[R][C];<br><br><br>Given the expression <i>M1 * M2</i> where M1 and M2 are of matrix type:<br><br> * The usual arithmetic conversions are applied to M1 and M2.<br><br> * The type of M1 shall have the same number of columns as the type of M2 has rows.<br><br> * The resulting type, MTy, is the result of applying the usual arithmetic conversions to M1 and M2, but with the same number of rows as M1’s matrix type and the same number of columns as M2’s matrix type.<br><br> * The result is equivalent to Res in the following, where col is the number of columns and row is the number of rows in MTy:</div><div><br>MTy Res;<br>for (int C = 0; C < col; ++C) {<br> for (int R = 0; R < row; ++R) {<br> EltTy Elt = 0;<br></div></div></div></div></blockquote><div><br></div><div>Is EltTy the matrix element type or the promoted matrix element type? (Based on what you wrote below, I assume the former, just checking...)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div><div><div> for (int K = 0; K < inner; ++K) {<br> Elt += M1[R][K] * M2[K][C];<br> }<br> Res[R][C] = Elt;<br> }<br>}<br><br><br>All operations on matrix types match the behavior of the underlying element type with respect to signed overflows. </div></div></div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div><div><div>With respect to rounding errors, the * operator preserves the behavior of the separate multiply and add operations by default. We propose to provide a Clang option to override this behavior and allow contraction of those operations (e.g. -ffp-contract=matrix).<br></div></div></div></div></blockquote><div><br></div><div>OK, so in particular you guarantee to perform the additions in the order given by the above rewrite (assuming no -ffp-contract option)? (So, for example, you get the precision and rounding behavior of doing the adds in that order, and for a signed integer type, you get signed overflow if and only if the intermediate Elt value overflows even if the final Elt value for any (R,C) is back in range.) Seems OK, if that's consistent with your goals.</div><div><br></div>
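<div>For what it's worth, a small sketch of the distinction being discussed (an illustration only, not part of the proposal; the function names are made up): written as separate operations, each multiply and each add in the reduction rounds on its own, whereas contraction permits fusing a multiply with the following add into a single rounding, as a fused multiply-add does:</div><div><br></div><div>#include <math.h><br><br>// Written as separate multiply and add: each operation rounds (assuming no contraction).<br>float dot_separate(const float *A, const float *B, int N) {<br> float Acc = 0;<br> for (int I = 0; I < N; ++I)<br> Acc = Acc + A[I] * B[I];<br> return Acc;<br>}<br><br>// What contraction would permit: one rounding per step via a fused multiply-add.<br>float dot_contracted(const float *A, const float *B, int N) {<br> float Acc = 0;<br> for (int I = 0; I < N; ++I)<br> Acc = fmaf(A[I], B[I], Acc);<br> return Acc;<br>}</div><div><br></div>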
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;"><div><div><div></div></div><blockquote type="cite"><div><div dir="auto" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none"><div dir="ltr"><blockquote type="cite"><div dir="ltr"><div dir="ltr"><div class="gmail_quote"><div>Have the Eigen developers indicated they would consider using this if it were available to them?</div><div>Have you reached out to the GCC developers to see if they would also be likely to support this extension?</div><div>We should be aiming to build critical mass behind this feature so that it gets adopted; it would be a waste of resources if a different technology ends up being adopted in this space and we're left maintaining a system that no-one outside Apple uses.</div></div></div></div></blockquote><div dir="ltr"><br></div>We hoped to get some initial feedback before reaching out, to make sure the proposal is in reasonably good shape. I plan to reach out to them early next week.</div></div></div></blockquote></div><div><br></div>I reached out on the Eigen mailing list and got an encouraging response: <a href="https://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2020/02/msg00000.html" target="_blank">https://listengine.tuxfamily.org/lists.tuxfamily.org/eigen/2020/02/msg00000.html</a> <div>I’ve also added people involved in GCC to the WIP patch adding the type.<br><br></div><div><br></div><div>Cheers,</div><div>Florian</div><div><br></div><div><br></div><div><br></div><div><font size="2"><b>Draft Specification<br></b></font><br><b>Matrix Type Attribute<br></b><br>The attribute-token <i>matrix_type</i> is used to declare a matrix type. It shall appear at most once in each attribute-list. The attribute shall only appertain to a<i> typedef-name</i> of a typedef of a non-volatile type that is a<i> signed integer type</i>, an <i>unsigned integer type</i>, or a <i>floating-point type</i>. An <i>attribute-argument-clause</i> must be present and it shall have the form:<br><br>(constant-expression, constant-expression)<br><br>Both constant-expressions shall be positive non-zero integral constant expressions. The maximum of the product of the constants is implementation-defined. If that implementation-defined limit is exceeded, the program is ill-formed.<br><br>An <i>attribute</i> of the form<i> matrix_type(R, C)</i> forms a matrix type with an element type of the cv-qualified type the attribute appertains to and <i>R</i> rows and <i>C</i> columns.<br><br>If a declaration of a <i>typedef-name</i> has a <i>matrix_type</i> attribute, then all declarations of that <i>typedef-name</i> shall have a <i>matrix_type</i> attribute with the same element type, number of rows, and number of columns.<br><br><br>
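<div>As an illustration of the rules above, the following shows a couple of well-formed declarations and, as comments, declarations the wording would reject (the typedef names are invented):</div><div><br></div><div>typedef float f3x2_t __attribute__((matrix_type(3, 2))); // 3 rows, 2 columns of float<br>typedef unsigned u4x4_t __attribute__((matrix_type(4, 4))); // 4x4 matrix of unsigned int<br><br>// Ill-formed under the wording above: the element type must be a non-volatile<br>// integer or floating-point type, and both dimensions must be positive.<br>// typedef float *p2x2_t __attribute__((matrix_type(2, 2)));<br>// typedef float z2_t __attribute__((matrix_type(0, 2)));</div><div><br></div>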
<b>Matrix Type<br></b><br>A matrix type has an underlying <i>element type</i>, a constant number of rows, and a constant number of columns. Matrix types with the same element type, rows, and columns are the same type. A value of a matrix type contains <i>rows * columns</i> values of the element type laid out in column-major order without padding in a way compatible with an array of at least that many elements of the underlying <i>element type</i>.<br><br>A matrix type is a scalar type with the same alignment as its underlying element type, but objects of matrix type are not usable in constant expressions.<br><br>TODO: Allow reinterpret_cast from pointer to element type. Make aliasing work.<br>TODO: Does it make sense to allow M::element_type, M::rows, and M::columns where M is a matrix type? We don’t support this anywhere else, but it’s convenient. The alternative is using template deduction to extract this information.<br>Future Work: Initialization syntax.<br>Future Work: Conversions between matrix types with const-qualified and unqualified element types.<br><br><br><b>Standard Conversions<br></b><br>The standard conversions are extended as follows.<br><br>For integral promotions, floating-point promotion, integral conversions, floating-point conversions, and floating-integral conversions: apply the rules to the underlying element type of the matrix type. The resulting type is a matrix type with that underlying element type. The resulting value is as follows:<div><br></div><div> * If the original value was of matrix type, each element is converted element by element.<br> * If the original value was not of matrix type, each element takes the value of the original value.<br><br><br><b>Arithmetic Conversions<br></b><br>The usual arithmetic conversions are extended as follows.<br><br>Insert at the start:<br><br> * If either operand is of matrix type, apply the usual arithmetic conversions using its underlying element type. The resulting type is a matrix type with that underlying element type.<br><br><br><b>Matrix Type Element Access Operator<br></b><br>An expression of the form `<i>postfix-expression [expression][expression]</i>` where the <i>postfix-expression</i> is of matrix type is a matrix element access expression. <i>expression</i> shall not be a comma expression, and shall be a prvalue of unscoped enumeration or integral type. Given the expression E1[E2][E3], the result is an lvalue of the same type as the underlying element type of the matrix that refers to the value at row E2 and column E3 in the matrix. The expression E1 is sequenced before E2 and E3. The expressions E2 and E3 are unsequenced.<br><br>Note: We thought about providing an expression of the form `<i>postfix-expression [expression]</i>` to access columns of a matrix. We think that such an expression would be problematic once both column-major and row-major matrices are supported: depending on the memory layout, either accessing columns or rows can be done efficiently, but not both. Instead, we propose to provide builtins to extract rows and columns from a matrix. This makes the operations more explicit.<br><br><br>
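<div>A short sketch of how the subscript operator above might be used (the type name is invented for the example):</div><div><br></div><div>typedef double d3x3_t __attribute__((matrix_type(3, 3)));<br><br>double trace(d3x3_t M) {<br> double T = 0;<br> for (int I = 0; I < 3; ++I)<br> T += M[I][I]; // M[row][column]; the result is an lvalue of the element type<br> return T;<br>}</div><div><br></div>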
<b>Matrix Type Binary Operators<br></b><br>Each matrix type supports the following binary operators: +, -, *, /, and %. The * operator provides matrix multiplication, while +, -, /, and % are performed element-wise. There are also scalar versions of the operators, which take a matrix type and the underlying element type. The operation is applied to all elements of the matrix using the scalar value.<br><br>The operands of +, -, *, and / shall have either matrix type, arithmetic type, or unscoped enumeration type. The operands of % shall have either matrix type with an element type of integral type, integral type, or unscoped enumeration type. At least one of the operands shall be of matrix type.<br><br>For <i>BIN_OP</i> in +, -, *, /, and %, given the expression <i>M1 BIN_OP M2 </i>where, for *, one of M1 or M2 is of arithmetic type:<br><br></div><div>* The usual arithmetic conversions are applied to M1 and M2. [ Note: if M1 or M2 are of arithmetic type, they are broadcast to matrices here. — end note ]<br><br>* The matrix types of M1 and M2 shall have the same number of rows and columns.<br><br>* The result is equivalent to Res in the following, where col is the number of columns and row is the number of rows in the matrix type:<br><br>decltype(M1) Res;<br>for (int C = 0; C < col; ++C)<br> for (int R = 0; R < row; ++R)<br> Res[R][C] = M1[R][C] BIN_OP M2[R][C];<br><br><br>Given the expression <i>M1 * M2</i> where M1 and M2 are of matrix type:<br><br> * The usual arithmetic conversions are applied to M1 and M2.<br><br> * The type of M1 shall have the same number of columns as the type of M2 has rows.<br><br> * The resulting type, MTy, is the result of applying the usual arithmetic conversions to M1 and M2, but with the same number of rows as M1’s matrix type and the same number of columns as M2’s matrix type.<br><br> * The result is equivalent to Res in the following, where col is the number of columns and row is the number of rows in MTy:</div><div><br>MTy Res;<br>for (int C = 0; C < col; ++C) {<br> for (int R = 0; R < row; ++R) {<br> EltTy Elt = 0;<br> for (int K = 0; K < inner; ++K) {<br> Elt += M1[R][K] * M2[K][C];<br> }<br> Res[R][C] = Elt;<br> }<br>}<br><br><br>All operations on matrix types match the behavior of the underlying element type with respect to signed overflows.<br><br>With respect to rounding errors, the * operator preserves the behavior of the separate multiply and add operations by default. We propose to provide a Clang option to override this behavior and allow contraction of those operations (e.g. -ffp-contract=matrix).<br><br><br><b>Matrix Type Builtin Operations<br></b><br>Each matrix type supports a collection of builtin expressions that look like function calls but do not form an overload set. Here they are described as function declarations with rules for how to construct the argument list types and return type, using the library description elements from [library.description.structure.specifications]/3 in the C++ standard. <br><br>Definitions:<br><br><i> * M, M1, M2, M3</i> - Matrix types<br><br> * <i>T</i> - Element type<br><br> *<i> row, col</i> - Row and column arguments, respectively.<br><br><br><br><i>M2 __builtin_matrix_transpose(M1 matrix)<br></i><br>Remarks: The return type is a cv-unqualified matrix type that has the same element type as M1 and has the same number of rows as M1 has columns and the same number of columns as M1 has rows.<br><br>Returns: A matrix Res equivalent to the code below, where col refers to the number of columns of M1, and row to the number of rows of M1.<br><br>M2 Res;<br>for (int C = 0; C < col; ++C)<br> for (int R = 0; R < row; ++R)<br> Res[C][R] = matrix[R][C];<br><br><br><br>
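<div>A quick sketch of how the transpose builtin composes with the proposed types (typedef names invented):</div><div><br></div><div>typedef float f2x3_t __attribute__((matrix_type(2, 3)));<br>typedef float f3x2_t __attribute__((matrix_type(3, 2)));<br><br>f3x2_t flip(f2x3_t M) {<br> // The result has M's columns as rows and M's rows as columns.<br> return __builtin_matrix_transpose(M);<br>}</div><div><br></div>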
<i>M __builtin_matrix_column_load(T *ptr, int row, int col, int stride)<br></i><br>Mandates: row and col shall be integral constants greater than 0.<br><br>Preconditions: stride >= row.<br><br>Remarks: The return type is a cv-unqualified matrix type with an element type of the cv-unqualified version of T and a number of rows and columns equal to row and col, respectively.<br><br>Returns: A matrix Res equivalent to:<br><br>M Res;<br>for (int C = 0; C < col; ++C) {<br> for (int R = 0; R < row; ++R)<br> Res[R][C] = ptr[R];<br> ptr += stride;<br>}<br><br><br><br><i>void __builtin_matrix_column_store(M matrix, T *ptr, int stride)<br></i><br>Preconditions: stride is greater than or equal to the number of rows in M.<br><br><div>Effects: Equivalent to:</div><div><br>for (int C = 0; C < columns in M; ++C) {<br> for (int R = 0; R < rows in M; ++R)<br> ptr[R] = matrix[R][C];<br> ptr += stride;<br>}</div><div><br>Remarks: The type T is the const-unqualified version of the matrix argument’s element type.<br><br><br><b>Example</b> <br><br>This code performs a matrix-multiply of two 4x4 matrices followed by a matrix addition:<br><br>typedef float m4x4_t __attribute__((matrix_type(4, 4)));<br><br>void f(m4x4_t *a, m4x4_t *b, m4x4_t *c, m4x4_t *r) {<br> *r = *a + (*b * *c);<br>}<div><br><br>This will get lowered by Clang to the LLVM IR below. In our current implementation, we use LLVM’s array type as the storage type for the matrix data. Before accessing the data, we cast the array to a vector type. This allows us to use the element width as the alignment, without running into issues with LLVM’s large default alignment for vector types, which is problematic in structs.<br><br>define void @f([16 x float]* %a, [16 x float]* %b, [16 x float]* %c, [16 x float]* %r) #0 {<br>entry:<br> %a.addr = alloca [16 x float]*, align 8<br> %b.addr = alloca [16 x float]*, align 8<br> %c.addr = alloca [16 x float]*, align 8<br> %r.addr = alloca [16 x float]*, align 8<br> store [16 x float]* %a, [16 x float]** %a.addr, align 8<br> store [16 x float]* %b, [16 x float]** %b.addr, align 8<br> store [16 x float]* %c, [16 x float]** %c.addr, align 8<br> store [16 x float]* %r, [16 x float]** %r.addr, align 8<br> %0 = load [16 x float]*, [16 x float]** %a.addr, align 8<br> %1 = bitcast [16 x float]* %0 to <16 x float>*<br> %2 = load <16 x float>, <16 x float>* %1, align 4<br> %3 = load [16 x float]*, [16 x float]** %b.addr, align 8<br> %4 = bitcast [16 x float]* %3 to <16 x float>*<br> %5 = load <16 x float>, <16 x float>* %4, align 4<br> %6 = call <16 x float> @llvm.matrix.multiply.v16f32.v16f32.v16f32(<16 x float> %2, <16 x float> %5, i32 4, i32 4, i32 4)<br> %7 = load [16 x float]*, [16 x float]** %c.addr, align 8<br> %8 = bitcast [16 x float]* %7 to <16 x float>*<br> %9 = load <16 x float>, <16 x float>* %8, align 4<br> %10 = fadd <16 x float> %6, %9<br> %11 = load [16 x float]*, [16 x float]** %r.addr, align 8<br> %12 = bitcast [16 x float]* %11 to <16 x float>*<br> store <16 x float> %10, <16 x float>* %12, align 4<br><br> ret void<br><br>}<br><br>; Function Attrs: nounwind readnone speculatable willreturn<br><br>declare <16 x float> @llvm.matrix.multiply.v16f32.v16f32.v16f32(<16 x float>, <16 x float>, i32 immarg, i32 immarg, i32 immarg)<br><br></div></div></div></div></div>_______________________________________________<br>
cfe-dev mailing list<br>
<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
</blockquote></div></div>