[PATCH] D102733: [Matrix] Factor and distribute transposes across multiplies

Thu May 20 06:28:33 PDT 2021

fhahn added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp:679
+  /// Try moving transposes in order to fold them away or into multiplies.
+  void optimizeTransposes() {
+    // First sink all transposes inside matmuls, hoping that we end up with NN,
----------------
For the later transforms, we collect a worklist containing all matrix instructions once. Could we reuse that worklist here, to avoid iterating over the whole function again?
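
To illustrate the idea, here is a rough standalone sketch. It is not the pass's actual code and does not use LLVM's APIs; `Inst`, `Block`, `Function`, `collectMatrixWorklist` and `countOpcode` are hypothetical toy names. The function is walked once, and later phases only look at the collected worklist.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins for IR constructs (hypothetical, for illustration only).
struct Inst {
  std::string Opcode;
  bool IsMatrixOp;
};
using Block = std::vector<Inst>;
using Function = std::vector<Block>;

// Walks every block of the function exactly once and records pointers to
// the matrix instructions it finds. The pointers stay valid only while F
// is alive and unmodified.
std::vector<const Inst *> collectMatrixWorklist(const Function &F) {
  std::vector<const Inst *> WorkList;
  for (const Block &BB : F)
    for (const Inst &I : BB)
      if (I.IsMatrixOp)
        WorkList.push_back(&I);
  return WorkList;
}

// Example of a later phase: it iterates over the worklist only, not over
// the whole function again.
std::size_t countOpcode(const std::vector<const Inst *> &WorkList,
                        const std::string &Opcode) {
  std::size_t Count = 0;
  for (const Inst *I : WorkList) {
    assert(I && "worklist entries must be non-null");
    if (I->Opcode == Opcode)
      ++Count;
  }
  return Count;
}
```

A rewrite that erases or creates instructions would have to keep such a list up to date, which this sketch ignores.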

================
Comment at: llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp:748
+
+    // If we have a TT matmul, lift the transpose until we have a non-TT situation.
+    for (BasicBlock &BB: Func) {
----------------
> If we have a TT matmul, lift the transpose until we have a non-TT situation.

Is this comment accurate? IIUC we only convert TT multiplies to versions where we can fold one transpose into the multiply?
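
For reference, the identity that rewriting a TT multiply relies on is A^T * B^T = (B * A)^T. The sketch below is a standalone check of that identity on non-square operands. It is not taken from the patch and does not use LLVM's APIs; `Matrix`, `makeSequential`, `sameMatrix` and `transposeIdentityHolds` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix helper, for illustration only.
struct Matrix {
  std::size_t Rows, Cols;
  std::vector<double> Data;
  Matrix(std::size_t R, std::size_t C) : Rows(R), Cols(C), Data(R * C, 0.0) {}
  double &at(std::size_t R, std::size_t C) { return Data[R * Cols + C]; }
  double at(std::size_t R, std::size_t C) const { return Data[R * Cols + C]; }
};

Matrix transpose(const Matrix &M) {
  Matrix T(M.Cols, M.Rows);
  for (std::size_t R = 0; R < M.Rows; ++R)
    for (std::size_t C = 0; C < M.Cols; ++C)
      T.at(C, R) = M.at(R, C);
  return T;
}

Matrix multiply(const Matrix &A, const Matrix &B) {
  assert(A.Cols == B.Rows && "inner dimensions must match");
  Matrix P(A.Rows, B.Cols);
  for (std::size_t R = 0; R < A.Rows; ++R)
    for (std::size_t C = 0; C < B.Cols; ++C)
      for (std::size_t K = 0; K < A.Cols; ++K)
        P.at(R, C) += A.at(R, K) * B.at(K, C);
  return P;
}

// Fills a matrix with consecutive integers starting at Start, so that all
// products and sums below are exact in double precision.
Matrix makeSequential(std::size_t Rows, std::size_t Cols, double Start) {
  Matrix M(Rows, Cols);
  for (std::size_t I = 0; I < M.Data.size(); ++I)
    M.Data[I] = Start + static_cast<double>(I);
  return M;
}

bool sameMatrix(const Matrix &X, const Matrix &Y) {
  return X.Rows == Y.Rows && X.Cols == Y.Cols && X.Data == Y.Data;
}

// Checks A^T * B^T == (B * A)^T for a 2x3 matrix A and a 4x2 matrix B.
bool transposeIdentityHolds() {
  Matrix A = makeSequential(2, 3, 1.0);
  Matrix B = makeSequential(4, 2, 7.0);
  Matrix TT = multiply(transpose(A), transpose(B)); // (3x2) * (2x4) = 3x4
  Matrix Lifted = transpose(multiply(B, A));        // (4x2) * (2x3) = 4x3, transposed to 3x4
  return sameMatrix(TT, Lifted);
}
```

Note that the operands swap places on the right-hand side, so only one transpose remains, applied to the result of a plain multiply.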

================
Comment at: llvm/test/Transforms/LowerMatrixIntrinsics/transpose-and-multiply-fold.ll:11
+
+define void @double_transpose(<9 x double>* %A, <9 x double>* %B) {
+; CHECK:      Pass:            lower-matrix-intrinsics
----------------
I think it would also be good to have tests that check the generated IR, together with some combinations with non-square matrices.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102733/new/

https://reviews.llvm.org/D102733