[PATCH] D70456: [Matrix] Add first set of matrix intrinsics and initial lowering pass.

Thu Nov 21 07:58:10 PST 2019

dmgreen added inline comments.

================
Comment at: llvm/docs/LangRef.rst:14429
+
+      declare vectorty @llvm.matrix.transpose.*(vectorty %in, i32 <rows>, i32 <cols>)
+
----------------
Does the size of the vector need to be rows * cols? Can it be larger? I presume it would be a problem if it was smaller.

================
Comment at: llvm/docs/LangRef.rst:14457
+The '``llvm.matrix.multiply.*``' intrinsic treats %A as matrix with <M> rows and <K> columns, %B as
+matrix with <K> rows and <N> columns and multiply them. The result matrix is returned embedded in the
+result vector.
----------------
Minor wording:
"The '``llvm.matrix.multiply.*``' intrinsic treats %A as a matrix with <M> rows and <K> columns, %B as a matrix with <K> rows and <N> columns and multiplies them."

================
Comment at: llvm/docs/LangRef.rst:14478
+
+The '``llvm.matrix.columnwise.load.*``' intrinsic loads a matrix with <Rows>
+rows and <Cols> columns, using a stride of %Stride between columns. The result
----------------
Can Stride be less that Cols?

================
Comment at: llvm/docs/LangRef.rst:14479
+The '``llvm.matrix.columnwise.load.*``' intrinsic loads a matrix with <Rows>
+rows and <Cols> columns, using a stride of %Stride between columns. The result
+matrix is returned embedded in the result vector. This allows for convenient
----------------
The returned matrix is expected to be packed, the memory layout is padded with Stride, correct?

================
Comment at: llvm/include/llvm/Transforms/Scalar.h:362

+Pass *createLowerMatrixIntrinsicsPass();
+
----------------
I guess this should have an extra comment, to be consistent.

================
Comment at: llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp:254
+  ///
+  /// The intrinsic loads a matrix from an array with an offset from the
+  /// initial element and a stride between columns
----------------
What does "offset from the initial element" refer to?

================
Comment at: llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp:258
+  /// See VisitLoadInst for handling actual load instructions
+  void LowerMatrixLoad(CallInst *Inst) {
+    IRBuilder<> Builder(Inst);
----------------
How big should we expect these to be? Will stamping out all the loads be OK for the larger cases or would it in be better to start creating loops? (Probably not a question that needs fixing in this commit. Just a question in general).

================
Comment at: llvm/test/Transforms/LowerMatrixIntrinsics/transpose.ll:36
+  %a = load <8 x double>, <8 x double> *%Ptr.A, align 16
+  %c  = call <8 x double> @llvm.matrix.transpose(<8 x double> %a, i32 2, i32 4)
+
----------------
llvm.matrix.transpose.v8f64

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70456/new/

https://reviews.llvm.org/D70456