[PATCH] D149773: [AMDGPU][IGLP] Add iglp_opt(1) strategy for single wave gemms

Sun Jun 11 23:32:42 PDT 2023

kerbowa added a comment.

Looks correct to me, my only concerns are about the performance of the rule implementations but, I'm not sure how critical that is if this is only targeting a few small kernels.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:1006
+        for (auto &Elt : SyncPipe[0].DAG->SUnits) {
+          if (TII->isMFMAorWMMA(*Elt.getInstr())) {
+            if (std::any_of(
----------------
Will these rules be called relatively often, would it help to cache the first 4 mfma in cases like this?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:1007
+          if (TII->isMFMAorWMMA(*Elt.getInstr())) {
+            if (std::any_of(
+                    Elt.Preds.begin(), Elt.Preds.end(),
----------------
Does it need to be a direct pred of the MFMA or does it only need to be able to be scheduled before it, could you use reachability in the DAG here instead of searching the preds like this?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:1064
+        // VALU already in the group
+        return std::any_of(
+            Collection.begin(), Collection.end(), [&SU, &TII](SUnit *Elt) {
----------------
Again, would caching these results help, or is it unlikely to be a bottleneck?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149773/new/

https://reviews.llvm.org/D149773