[PATCH] D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass.

Thu Nov 21 14:48:09 PST 2019

fhahn marked an inline comment as done.
fhahn added inline comments.

================
Comment at: llvm/docs/LangRef.rst:14429
+
+      declare vectorty @llvm.matrix.transpose.*(vectorty %in, i32 <rows>, i32 <cols>)
+
----------------
dmgreen wrote:
> Does the size of the vector need to be rows * cols? Can it be larger? I presume it would be a problem if it was smaller.
Good catch! I think being bigger would mean we only process the first rows * columns, but I think we should restrict the vector size to match rows * columns. Not sure if we can do that on the intrinsic level, but I'll add asserts.

================
Comment at: llvm/docs/LangRef.rst:14479
+The '``llvm.matrix.columnwise.load.*``' intrinsic loads a matrix with <Rows>
+rows and <Cols> columns, using a stride of %Stride between columns. The result
+matrix is returned embedded in the result vector. This allows for convenient
----------------
dmgreen wrote:
> The returned matrix is expected to be packed, the memory layout is padded with Stride, correct?
No, the packed matrix in the flat vector will only contain the elements in the columns/rows, as described by the dimensions. The 'stride' elements are not loaded. I've tried to clarify the meaning of the stride argument. Please let me know if that makes more sense now.

================
Comment at: llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp:254
+  ///
+  /// The intrinsic loads a matrix from an array with an offset from the
+  /// initial element and a stride between columns
----------------
dmgreen wrote:
> What does "offset from the initial element" refer to?
That was a left-over from an earlier version, where the intrinsic took an extra offset parameter. I've updated the comment (same for LowerMatrixStore).

================
Comment at: llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp:258
+  /// See VisitLoadInst for handling actual load instructions
+  void LowerMatrixLoad(CallInst *Inst) {
+    IRBuilder<> Builder(Inst);
----------------
dmgreen wrote:
> How big should we expect these to be? Will stamping out all the loads be OK for the larger cases or would it in be better to start creating loops? (Probably not a question that needs fixing in this commit. Just a question in general).
There is no upper bound to the dimensions, other than limits imposed by the frontend (and the number of vector elements for LLVM's vector type), so for larger matrixes this won't generate great code. For our initial use cases, we focused on smaller matrixes (<= 16x16) and we tried to keep things simple for the initial patch.

As follow ups, we are planning on adding patches that generate loops for operations on larger matrixes, e.g. generating a fused load/multiply/add/store loops.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70456/new/

https://reviews.llvm.org/D70456