[PATCH] D102733: [Matrix] Factor and distribute transposes across multiplies
Adam Nemet via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri May 21 08:29:57 PDT 2021
anemet marked 3 inline comments as done.
anemet added inline comments.
================
Comment at: llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp:679
+ /// Try moving transposes in order to fold them away or into multiplies.
+ void optimizeTransposes() {
+ // First sink all transposes inside matmuls, hoping that we end up with NN,
----------------
fhahn wrote:
> For the later transforms, we collect a worklist once which contains all matrix instructions. Could we use the same here to avoid having to iterate over each function again?
Unless we really think this is a performance issue, I'd like to avoid the extra bookkeeping and just represent everything in the IR, with no on-the-side data structure that needs updating. As I was saying offline, I think we already have too much bookkeeping going on (e.g. for the remarks), so it's hard to know what to update at times. Having a backward and a forward matrix algebraic simplification pass (which is what optimizeTransposes is) that is logically separated from the lowering pass makes good sense to me in terms of "separation of concerns". What do you think?
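For illustration, here is a minimal sketch of what a forward sweep looks like when everything is kept in the IR. The function name, the particular t(t(A)) fold shown, and the cleanup strategy are illustrative assumptions, not the exact code in this patch:

  // Sketch: walk the function once, match matrix intrinsics directly,
  // and rewrite in place; the IR itself is the only "data structure".
  #include "llvm/IR/Function.h"
  #include "llvm/IR/IntrinsicInst.h"
  #include "llvm/IR/PatternMatch.h"
  using namespace llvm;
  using namespace llvm::PatternMatch;

  static void sinkTransposesSketch(Function &Func) {
    for (BasicBlock &BB : Func) {
      for (Instruction &I : BB) {
        Value *A;
        // t(t(A)) == A: a transpose of a transpose can be dropped entirely.
        // The now-dead transposes are left for a later DCE-style cleanup,
        // so no worklist or side map has to be kept in sync.
        if (match(&I, m_Intrinsic<Intrinsic::matrix_transpose>(
                          m_Intrinsic<Intrinsic::matrix_transpose>(
                              m_Value(A)))))
          I.replaceAllUsesWith(A);
      }
    }
  }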
================
Comment at: llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp:748
+
+ // If we have a TT matmul, lift the transpose until we have a non-TT situation.
+ for (BasicBlock &BB: Func) {
----------------
fhahn wrote:
> > If we have a TT matmul, lift the transpose until we have a non-TT situation.
>
> Is this comment accurate? IIUC we only convert TT multiplies to versions where we can fold one transpose into the multiply?
Rephrased the comment.
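For reference, the TT rewrite rests on the identity transpose(A) * transpose(B) == transpose(B * A): a multiply whose operands are both transposes can be rewritten as a single transpose of a multiply with the operands swapped. Below is a hedged sketch of the match and rewrite, assuming the MatrixBuilder helpers from llvm/IR/MatrixBuilder.h; the shape-info bookkeeping and dead-code cleanup done by the real patch are omitted:

  // Given multiply(t(A), t(B), R, K, C), where A is K x R and B is C x K,
  // rewrite to t(multiply(B, A, C, K, R)), which computes the same R x C
  // result with a single transpose left above the multiply.
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/MatrixBuilder.h"
  #include "llvm/IR/PatternMatch.h"
  using namespace llvm;
  using namespace llvm::PatternMatch;

  static void liftTransposeSketch(Instruction &I) {
    Value *A, *B;
    uint64_t R, K, C;
    if (match(&I, m_Intrinsic<Intrinsic::matrix_multiply>(
                      m_Intrinsic<Intrinsic::matrix_transpose>(m_Value(A)),
                      m_Intrinsic<Intrinsic::matrix_transpose>(m_Value(B)),
                      m_ConstantInt(R), m_ConstantInt(K),
                      m_ConstantInt(C)))) {
      IRBuilder<> IB(&I);
      MatrixBuilder<IRBuilder<>> MB(IB);
      // multiply(B, A): (C x K) * (K x R) -> C x R.
      Value *Mul = MB.CreateMatrixMultiply(B, A, C, K, R);
      // Transposing the C x R product yields the original R x C result.
      Value *T = MB.CreateMatrixTranspose(Mul, C, R);
      I.replaceAllUsesWith(T);
    }
  }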
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D102733/new/
https://reviews.llvm.org/D102733