[PATCH] D124678: [AMDGPU] Allow for MFMA Inst Clustering

Fri Apr 29 10:15:04 PDT 2022

kerbowa added a comment.

Would be nice to have some tests that show the results of the clustering as well.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:846
+namespace {
+struct MFMAClusterDAGMutation : ScheduleDAGMutation {
+  const SIInstrInfo *TII;
----------------
Should this be moved to a new file, similar to AMDGPUMacroFusion and AMDGPUExportClustering?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:856
+      if (!TII->isMAI(MAI) ||
+          MAI.getOpcode() == AMDGPU::V_ACCVGPR_WRITE_B32_e64 ||
+          MAI.getOpcode() == AMDGPU::V_ACCVGPR_READ_B32_e64)
----------------
arsenm wrote:
> What about copies before they are lowered to accvgpr_write?
I guess isMAI above would handle that?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp:882-886
+      while (NextIdx < End) {
+        if (ClusterSize >= MFMAClusterSize)
+          break;
+
+        for (; NextIdx < End; ++NextIdx) {
----------------
Could these two loops be combined?
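
To illustrate what combining would look like, here is a minimal standalone sketch. It does not use the patch's code: the loop bodies are not quoted above, so `canCluster`, `collectNested`, `collectCombined`, and `checkEquivalent` are hypothetical names, and the inner-loop body (take the first match, then break) is an assumption. If the real inner loop does more than scan for the next candidate, the combined form may not apply directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the patch's candidate check; not from the patch.
static bool canCluster(int Candidate) { return Candidate % 2 == 0; }

// Nested form, mirroring the shape of the quoted loops: the outer loop
// checks the size limit, the inner loop scans for the next candidate.
std::vector<int> collectNested(const std::vector<int> &Items,
                               std::size_t MaxClusterSize) {
  std::vector<int> Cluster;
  std::size_t NextIdx = 0, End = Items.size();
  while (NextIdx < End) {
    if (Cluster.size() >= MaxClusterSize)
      break;

    for (; NextIdx < End; ++NextIdx) {
      if (canCluster(Items[NextIdx])) {
        Cluster.push_back(Items[NextIdx]);
        ++NextIdx; // Resume after the element just taken.
        break;
      }
    }
  }
  return Cluster;
}

// Combined form: a single loop whose condition carries both the bounds
// check and the cluster size limit.
std::vector<int> collectCombined(const std::vector<int> &Items,
                                 std::size_t MaxClusterSize) {
  std::vector<int> Cluster;
  for (std::size_t Idx = 0, End = Items.size();
       Idx < End && Cluster.size() < MaxClusterSize; ++Idx) {
    if (canCluster(Items[Idx]))
      Cluster.push_back(Items[Idx]);
  }
  return Cluster;
}

// Sanity check: both forms should pick the same elements.
void checkEquivalent(const std::vector<int> &Items, std::size_t Max) {
  assert(collectNested(Items, Max) == collectCombined(Items, Max));
}
```

Under these assumptions the single loop stops at the same point as the nested version, because the size limit is rechecked before every element rather than only between scans.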

================
Comment at: llvm/test/CodeGen/AMDGPU/mfma-cluster.mir:1
+# RUN: llc -march=amdgcn -mcpu=gfx90a -start-before=machine-scheduler %s -o - -amdgpu-mfma-cluster=1 --debug-only=amdgpu-subtarget,machine-scheduler  2>&1 | FileCheck -check-prefix=DEFAULT %s
+# RUN: llc -march=amdgcn -mcpu=gfx90a -start-before=machine-scheduler %s -o - -amdgpu-mfma-cluster=1 -amdgpu-mfma-cluster-size=2 --misched-bottomup --debug-only=amdgpu-subtarget,machine-scheduler  2>&1 | FileCheck -check-prefix=TWOLIMIT %s
----------------
Add `# REQUIRES: asserts` to the top of this test, since `--debug-only` is only available in builds with assertions enabled.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D124678