[llvm-dev] [cfe-dev] RFC: Matrix math support
Chris Lattner via llvm-dev
llvm-dev at lists.llvm.org
Mon Oct 28 22:09:08 PDT 2019
Hi Florian, thanks for the update!
> On Oct 28, 2019, at 9:07 AM, Florian Hahn via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> After the initial feedback, we evaluated using ‘flattened’ vectors to hold matrix values, instead of adding a new matrix type, as suggested originally. This was option #1 suggested in  <http://lists.llvm.org/pipermail/llvm-dev/2018-October/126982.html>.
> With that in mind, we propose to add a set of experimental intrinsics for matrix operations that require information about the shape/layout of the underlying matrix. The suggested intrinsics take the shape information as additional (constant) integer arguments and column major layout is assumed initially. With the new proposal, there are very few intrinsics that actually care about the memory layout and they can be easily extended to also support row-major layouts.
> Initially, we would like to propose the following intrinsics:
> <C x Ty> matrix_transpose(<C x Ty> %in, i32 <M>, i32 <N>)
> Treat %in as containing a matrix with M rows and N columns and transpose it.
> <C x Ty> matrix_multiply(<A x Ty> %X, <B x Ty> %Y, i32 <M>, i32 <N>, i32 <K>)
> Treat %X as a matrix with M rows and K columns, %Y as a matrix with K rows and N columns and multiply them.
> <C x Ty> matrix_columnwise_load(Ty* %Ptr, i64 %Stride, i32 <M>, i32 <N>)
> Load a matrix with M rows and N columns, using a stride of %Stride between columns. This allows for convenient loading of sub matrixes.
> void matrix_columnwise_store(<C x Ty> %MatrixVal, Ty* %Ptr, i64 %Stride, i32 <M>, i32 <N>)
> Store a matrix with M rows and N columns, using a stride of %Stride between columns. This allows for convenient storing of sub matrixes.
This all seems reasonable to me - a constrained extension that can start out as experimental. I personally don’t have any significant objections to this.
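To make the quoted semantics concrete, here is a minimal reference sketch in C++ of what the transpose, multiply and column-wise load intrinsics compute on a flattened, column-major vector. The function names are hypothetical and are not LLVM APIs. The sketch also assumes the stride is counted in elements, which the proposal text above does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers, not LLVM APIs. An M x N matrix is stored column-major
// in a flat vector: element (r, c) lives at index c * M + r.

// Reference semantics for matrix_transpose: input is M x N, result is N x M.
std::vector<double> transposeFlat(const std::vector<double> &in,
                                  std::size_t M, std::size_t N) {
  assert(in.size() == M * N && "flat vector must hold M * N elements");
  std::vector<double> out(M * N);
  for (std::size_t c = 0; c < N; ++c)
    for (std::size_t r = 0; r < M; ++r)
      out[r * N + c] = in[c * M + r]; // (r, c) moves to (c, r) of the result
  return out;
}

// Reference semantics for matrix_multiply: X is M x K, Y is K x N,
// and the result is M x N.
std::vector<double> multiplyFlat(const std::vector<double> &X,
                                 const std::vector<double> &Y,
                                 std::size_t M, std::size_t N, std::size_t K) {
  assert(X.size() == M * K && Y.size() == K * N && "operand shape mismatch");
  std::vector<double> out(M * N, 0.0);
  for (std::size_t c = 0; c < N; ++c)
    for (std::size_t k = 0; k < K; ++k)
      for (std::size_t r = 0; r < M; ++r)
        out[c * M + r] += X[k * M + r] * Y[c * K + k];
  return out;
}

// Reference semantics for matrix_columnwise_load: read an M x N matrix whose
// columns start `stride` elements apart (assumption: stride is in elements).
std::vector<double> columnwiseLoad(const double *ptr, std::size_t stride,
                                   std::size_t M, std::size_t N) {
  assert(stride >= M && "columns must not overlap");
  std::vector<double> out(M * N);
  for (std::size_t c = 0; c < N; ++c)
    for (std::size_t r = 0; r < M; ++r)
      out[c * M + r] = ptr[c * stride + r];
  return out;
}
```

With a stride larger than M, columnwiseLoad reads a sub-matrix out of a larger column-major buffer, which is the use case the quoted description mentions.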
> The floating point versions of the intrinsics also take fast-math flags, which can be used to opt-in to FMA generation and/or constant folding opportunities via NoInfs and NoNaNs. We plan to add them to the lowered instructions and rely on InstCombine & Co for related optimisations.
What is your thought on the FP semantics of matmul? There is a lot of room for interpretation and various ‘fast’ approximations that are occasionally interesting. Should this use the existing fast math flags or are they different?
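One concrete reason the question matters: a fused multiply-add rounds once, while a separate multiply and add round twice, so permitting FMA formation in a lowered matmul can change results. The sketch below is only an illustration in C++ with hypothetical function names; it says nothing about how the proposed lowering would actually behave.

```cpp
#include <cmath>

// Multiply, then add, with two roundings. The volatile temporary keeps the
// compiler from contracting the expression into a single fused operation.
double mulThenAdd(double a, double b, double c) {
  volatile double product = a * b; // rounded to double here
  return product + c;              // rounded a second time
}

// Fused multiply-add: a * b + c is computed with a single rounding.
double fusedMulAdd(double a, double b, double c) {
  return std::fma(a, b, c);
}
```

For a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact product is 1 - 2^-54, which rounds to 1.0 as a double, so the two-step version returns 0 while the fused version returns -2^-54.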
> The intrinsics will be lowered to regular LLVM vector operations in an IR lowering pass. This means that by default, we can lower the builtins on all targets.
> Before we do the actual lowering, we propagate the shape information from intrinsics to connected instructions. This allows us to improve the code we generate for regular IR operations on matrixes embedded in a flattened vector. In the example above, we propagate the information that we are loading a matrix with 2 rows and 4 columns to `%a = load <8 x double>, <8 x double>* %A, align 16` and lower it to a series of `load <2 x double>, <2 x double>*`, which helps with avoiding a large number of shufflevector instructions to get column vectors. Please note that propagating the shape information allows us to improve the code we generate during lowering, but is not required for correctness. Without propagating shape information, we would just need additional shuffles to extract the rows/columns at the point where we lower a matrix multiply for example.
I’m very glad to hear that this only affects performance but not correctness. This ensures that the extension is correct in the face of outlining and various code motion passes that can introduce weird phi nodes that may (in unusual cases) escape the limits of your reasoning.
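A small sketch of why knowing the shape helps performance without being needed for correctness: in a flattened column-major vector, a column is a contiguous slice (comparable to a plain narrower load), while a row has to be gathered with a stride (comparable to a shuffle). The helper names are hypothetical and the code is plain C++, not the lowering pass itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column c of an M x N column-major matrix is the contiguous range
// [c * M, c * M + M) of the flat vector.
std::vector<double> extractColumn(const std::vector<double> &flat,
                                  std::size_t M, std::size_t N,
                                  std::size_t c) {
  assert(flat.size() == M * N && c < N);
  return std::vector<double>(flat.begin() + c * M,
                             flat.begin() + (c + 1) * M);
}

// Row r picks every M-th element starting at index r, a strided gather.
std::vector<double> extractRow(const std::vector<double> &flat,
                               std::size_t M, std::size_t N, std::size_t r) {
  assert(flat.size() == M * N && r < M);
  std::vector<double> row(N);
  for (std::size_t c = 0; c < N; ++c)
    row[c] = flat[c * M + r];
  return row;
}
```

Both functions return the right elements regardless of how the flat vector was produced; only the access pattern differs.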
> As future work, we are also evaluating adding additional operations including clamping a vector or matrix, min/max of matrixes, inverting a matrix and computing matrix determinants. Some of those might require additional intrinsics, which we can discuss on a case by case basis.
+1. It would be interesting to extend the set of target independent vector intrinsics in general.
> Potential Future Extensions
> For small matrixes, it might be desirable to automatically add additional padding to columns/rows, e.g. add 1 element padding to each column in a 3 x 3 matrix, to allow for using vector instructions operating on power-of-2 number of elements or satisfy an alignment requirement by a target. This allows for additional optimizations, but is not required for lowering the intrinsics. We also haven't seen this being an issue so far. We should be able to iterate on that, once it becomes an issue. (Earlier discussion  <http://lists.llvm.org/pipermail/llvm-dev/2018-December/128331.html> )
Random thought, but would it be enough to model these as undef elements if they became important?
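To illustrate the layout being discussed, the following C++ sketch pads each column of a flattened column-major matrix to a larger row count, for example a 3 x 3 matrix stored with 4 elements per column. The function name is hypothetical. In IR the padding lanes could be left undefined; C++ has no equivalent, so this sketch fills them with a caller-chosen value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy an M x N column-major matrix into a buffer whose columns are
// paddedM elements apart. Padding lanes receive `fill`; their value is
// never meant to be read by matrix operations.
std::vector<double> padColumns(const std::vector<double> &flat,
                               std::size_t M, std::size_t N,
                               std::size_t paddedM, double fill) {
  assert(flat.size() == M * N && paddedM >= M);
  std::vector<double> out(paddedM * N, fill);
  for (std::size_t c = 0; c < N; ++c)
    for (std::size_t r = 0; r < M; ++r)
      out[c * paddedM + r] = flat[c * M + r];
  return out;
}
```

With M = 3 and paddedM = 4, every column occupies a power-of-2 number of lanes, which is the situation the quoted paragraph describes.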
> We propose adding a new matrix value type, that can be declared via a `matrix_type` attribute. Alternatively we could also generalise the existing ext_vector_type attribute ( <https://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors>), if that is preferred.
I don’t care *strongly* about this at all, but I have a weak preference for introducing a new attribute for this. There is effectively no cost to doing so, and this would make the documentation in the clang extensions manual much easier to understand.
> In all cases, we require known constants as dimensions and we do not plan to support dynamic dimensions for now.
I’m sure the clang folks will want to know what you mean by ‘constants’ in terms of ICE, constexpr, integer template arguments, etc.
> We think our current proposal addresses the concerns raised previously, especially the concerns around the high cost of adding a new IR type, adding too many new intrinsics and generalising the approach to N dimensions. Unless there are any additional major concerns, we should be able to share patches for review soon.
+1, sounds like a very promising direction, thank you!