[PATCH] D124678: [AMDGPU] Allow for MFMA Inst Clustering

Fri Apr 29 10:18:40 PDT 2022

rampitec added a comment.

One thing to mention: transferring all successors and predecessors technically does not guarantee that an independent instruction will not be scheduled in between. I see that you are adding SDep::Cluster, but I remember it didn't use to work in the very similar scenario before. Not sure it is still so. It can happen that in a bigger program the cluster may be broken.

Also we probably need to consider dropping MFMA clustering along with load clustering during GCNScheduleDAGMILive::UnclusteredReschedule stage.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:898
+        if (MFMAOpa->NodeNum > MFMAOpb->NodeNum)
+          std::swap(MFMAOpa, MFMAOpb);
+
----------------
What if one feeds another and you swap them?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:900
+
+        DAG->addEdge(MFMAOpb, SDep(MFMAOpa, SDep::Cluster));
+
----------------
I believe you need to check the result of addEgde.

================
Comment at: llvm/test/CodeGen/AMDGPU/mfma-cluster.mir:3
+# RUN: llc -march=amdgcn -mcpu=gfx90a -start-before=machine-scheduler %s -o - -amdgpu-mfma-cluster=1 -amdgpu-mfma-cluster-size=2 --misched-bottomup --debug-only=amdgpu-subtarget,machine-scheduler  2>&1 | FileCheck -check-prefix=TWOLIMIT %s
+
+
----------------
You need '; REQUIRES: asserts' if you want to inspect debug output.

================
Comment at: llvm/test/CodeGen/AMDGPU/mfma-cluster.mir:33
+# TWOLIMIT-NEXT: Copy Pred SU(6)
+
+---
----------------
You also need to check resulting MIR.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124678/new/

https://reviews.llvm.org/D124678