[cfe-dev] RFC: First-class Matrix type

Richard Smith via cfe-dev cfe-dev at lists.llvm.org
Thu Oct 11 15:42:25 PDT 2018


On Wed, 10 Oct 2018 at 23:10, Adam Nemet via cfe-dev <cfe-dev at lists.llvm.org>
wrote:

> Hi,
>
> We are proposing first-class type support for a new matrix type.  This is
> a natural extension of the current vector type with an extra dimension.
> For example, this is what the IR for a matrix multiply would look like for
> a 4x4 matrix with element type float:
>
> %0 = load <4 x 4 x float>, <4 x 4 x float>* %a, align 16
> %1 = load <4 x 4 x float>, <4 x 4 x float>* %b, align 16
> %2 = call <4 x 4 x float> @llvm.matrix.multiply.m4_4f32.m4_4f32.m4_4f32(<4
> x 4 x float> %0, <4 x 4 x float> %1)
> store <4 x 4 x float> %2, <4 x 4 x float>* %c, align 16
>
>
> Currently we support element-wise binary operations, matrix multiply,
> matrix-scalar multiply, matrix transpose, extract/insert of an element.
> Besides the regular full-matrix load and store, we also support loading and
> storing a matrix as a submatrix of a larger matrix in memory.  We are also
> planning to implement vector-extract/insert and matrix-vector multiply.
>
> All of these are currently implemented as intrinsics.  Where applicable we
> also plan to support these operations with native IR instructions (e.g.
> add/fadd).
>
> These are exposed in clang via builtins.  E.g. the above operations looks
> like this in C/C++:
>
> typedef float mf4x4_t __attribute__((matrix_type(4, 4)));
>
> mf4x4_t add(mf4x4_t a, mf4x4_t b) {
>   return __builtin_matrix_multiply(a, b);
> }
>
>
This seems to me to be proposing two distinct things -- built-in matrix
support in the frontend as a language extension, and support for matrices
in LLVM IR -- and I think it makes sense to discuss them at least somewhat
separately.


On the Clang side: our general policy for frontend language extensions is
described here: http://clang.llvm.org/get_involved.html

I'm totally happy to assume that you can cover most of those points, but
point 4 seems likely to be a potential sticking point. Have you talked to
WG14 about adding a matrix extension to C? (I'd note that they don't even
have a vector language extension yet, but as noted on that page, we should
be driving the relevant standards, not diverging from them, so perhaps we
should be pushing for that too). Have you talked to the people working on
adding vector types to C++ about this (in particular, Matthias Kretz and
Tim Shen; see
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0214r9.pdf and
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4454.pdf for
current / prior work in this area)?


On the LLVM IR side, I'm personally unconvinced that we should model
matrices in the IR directly as a new first-class type, unless there's some
target out there that has matrix operations in hardware / matrix registers,
but IR is not really my area of expertise so give that opinion as much or
little weight as you see fit. However, I do wonder: how is this different
from, say, complex numbers, for which we don't have a native IR
representation? (Maybe the answer is that we should have native IR support
for complex numbers too.) How would you expect the frontend to lower (eg)
matrix multiplication for matrices of complex numbers?

** Benefits **
>
> Having matrices represented as IR values allows for the usual algebraic
> and redundancy optimizations.  But most importantly, by lifting memory
> aliasing concerns, we can guarantee vectorization to target-specific
> vectors.  Having a matrix-multiply intrinsic also allows using FMA
> regardless of the optimization level which is the usual sticking point with
> adopting FP-contraction.
>
> Adding a new dedicated first-class type has several advantages over
> mapping them directly to existing IR types like vectors in the front end.
> Matrices have the unique requirement that both rows and columns need to be
> accessed efficiently.  By retaining the original type, we can analyze
> entire chains of operations (after inlining) and determine the most
> efficient *intermediate layout* for the matrix values between ABI
> observable points (calls, memory operations).
>
> The resulting intermediate layout could be something like a single vector
> spanning the entire matrix or a set of vectors and scalars representing
> individual rows/columns.  This is beneficial for example because
> rows/columns would be aligned to the HW vector boundary (e.g. for a 3x3
> matrix).
>
> The layout could also be made up of tiles/submatrices of the matrix.  This
> is an important case for us to fight register pressure.  Rather than
> loading entire matrices into registers it lets us load only parts of the
> input matrix at a time in order to compute some part of the output matrix.
> Since this transformation reorders memory operations, we may also need to
> emit run-time alias checks.
>
> Having a dedicated first-class type also allows for dedicated
> target-specific *ABIs* for matrixes.  This is a pretty rich area for
> matrices.  It includes whether the matrix is stored row-major or
> column-major order.  Whether there is padding between rows/columns.  When
> and how matrices are passed in argument registers.  Providing flexibility
> on the ABI side was critical for the adoption of the new type at Apple.
>
> Having all this knowledge at the IR level means that *front-ends* are
> able to share the complexities of the implementation.  They just map their
> matrix type to the IR type and the builtins to intrinsics.
>
> At Apple, we also need to support *interoperability* between row-major
> and column-major layout.  Since conversion between the two layouts is
> costly, they should be separate types requiring explicit instructions to
> convert between them.  Extending the new type to include the order makes
> tracking the format easy and allows finding optimal conversion points.
>
> ** ABI **
>
> We currently default to column-major order with no padding between the
> columns in memory.  We have plans to also support row-major order and we
> would probably have to support padding at some point for targets where
> unaligned accesses are slow.  In order to make the IR self-contained I am
> planning to make the defaults explicit in the DataLayout string.
>
> For function parameters and return values, matrices are currently placed
> in memory.  Moving forward, we should pass small matrices in vector
> registers.  Treating matrices as structures of vectors seems a natural
> default.   This works well for AArch64, since Homogenous Short-Vector
> Aggregates (HVA) can use all 8 SIMD argument registers.  Thus we could pass
> for example two 4 x 4 x float matrices in registers.  However on X86, we
> can only pass “four eightbytes”, thus limiting us to two 2 x 2 x float
> matrices.
>
> Alternatively, we could treat a matrix as if its rows/columns were passed
> as separate vector arguments.  This would allow using all 8 vector argument
> registers on X86 too.
>
> Alignment of the matrix type is the same as the alignment of its first
> row/column vector.
>
> ** Flow **
>
> Clang at this point mostly just forwards everything to LLVM.  Then in
> LLVM, we have an IR function pass that lowers the matrices to
> target-supported vectors.  As with vectors, matrices can be of any static
> size with any of the primitive types as the element type.
>
> After the lowering pass, we only have matrix function arguments and
> instructions building up and splitting matrix values from and to vectors.
> CodeGen then lowers the arguments and forwards the vector values.  CodeGen
> is already capable of further lowering vectors of any size to scalars if
> the target does not support vectors.
>
> The lowering pass is also run at -O0 rather than legitimizing the matrix
> type during CodeGen like it’s done for structure values or invalid
> vectors.  I don’t really see a big value of duplicating this logic across
> the IR and CodeGen.  We just need a lighter mode in the pass at -O0.
>
> ** Roll-out and Maintenance **
>
> Since this will be experimental for some time, I am planning to put this
> behind a flag: -fenable-experimental-matrix-type.  ABI and intrinsic
> compatibility won’t be guaranteed initially until we lift the experimental
> status.
>
> We are obviously interested in maintaining and improving this code in the
> future.
>
> Looking forward to comments and suggestions.
>
> Thanks,
> Adam
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20181011/27ea356e/attachment.html>


More information about the cfe-dev mailing list