[llvm] [AMDGPU] IGLP: Fix static variables (PR #137549)

Wed Apr 30 06:58:42 PDT 2025

================
@@ -1353,17 +1344,25 @@ bool MFMAExpInterleaveOpt::analyzeDAG(const SIInstrInfo *TII) {
 
   auto isAdd = [](unsigned Opc) { return Opc == AMDGPU::V_ADD_F32_e32; };
 
+  // Heuristic helper function (see below)
+  auto IsFMACDataDep = [](SDep &Dep) {
+    return Dep.getKind() == SDep::Kind::Data &&
+           Dep.getSUnit()->getInstr()->getOpcode() == AMDGPU::V_FMAC_F32_e32;
+  };
+
   AddPipeCount = 0;
   for (SUnit &SU : DAG->SUnits) {
     auto Opc = SU.getInstr()->getOpcode();
     if (TII->isTRANS(Opc)) {
       // Avoid counting a potential bonus V_EXP which all the MFMA depend on
-      if (SU.Succs.size() >= 7)
+      // FIXME: This heuristic needs improvement/clarification!
+      // In general, the pipeline seems to look like this:
+      //   fma_f32 -> exp_f32 -> cvt_f16_f32 -> v_pack_b32_f16 -> mfma_.._f16
+      //   (with potential arithmetic between exp and cvt)
+      //   see
+      //   https://github.com/llvm/llvm-project/pull/80370#discussion_r1483660378
+      if (SU.Succs.size() >= 7 && any_of(SU.Succs, IsFMACDataDep))
----------------
ro-i wrote:

I forgot to mention: the problem may be the following (using `llvm.amdgcn.iglp.opt.exp.small.mir` as an example).
Take a look at the following snippet from the initial MIR:
```
  undef %247.sub0:vreg_64_align2 = afn nofpexcept V_EXP_F32_e32 %246:vgpr_32, implicit $mode, implicit $exec
[...]
  %0:vgpr_32 = contract nofpexcept V_FMAC_F32_e32 %1:vgpr_32, %247.sub0:vreg_64_align2, %0:vgpr_32, implicit $mode, implicit $exec
  %259.sub0_sub1:vreg_512_align2 = contract nofpexcept V_PK_MUL_F32 8, %259.sub0_sub1:vreg_512_align2, 0, %247:vreg_64_align2, 0, 0, 0, 0, 0, implicit $mode, implicit $exec
[... a lot more V_PK_MULs ... ]
```
If I'm not mistaken, this means that only the lower 32 bits of `%247` (`sub0`) are actually defined. Now take a look at the following snippet, taken from directly before postmisched runs:
```
  renamable $vgpr16 = afn nofpexcept V_MUL_F32_e32 1069066811, killed $vgpr16, implicit $mode, implicit $exec
  renamable $vgpr17 = afn nofpexcept V_MUL_F32_e32 1069066811, killed $vgpr17, implicit $mode, implicit $exec
[... vgpr16 and vgpr17 are used and vgpr16 is also reassigned ... ]
  renamable $vgpr16 = afn nofpexcept V_EXP_F32_e32 killed $vgpr24, implicit $mode, implicit $exec
[...]
  renamable $vgpr48_vgpr49 = contract nofpexcept V_PK_MUL_F32 8, $vgpr48_vgpr49, 0, $vgpr16_vgpr17, 0, 0, 0, 0, 0, implicit $mode, implicit $exec
[... a lot more V_PK_MULs ... ]
[...]
  dead renamable $vgpr2 = contract nofpexcept V_FMAC_F32_e32 killed $vgpr4, killed $vgpr16, killed $vgpr2(tied-def 0), implicit $mode, implicit $exec
```
I think that `$vgpr17` is only "used" by `V_PK_MUL_F32` because this instruction takes a register pair (two dwords) as its operand. Its value is never actually defined, which is why the `V_EXP_F32` is not detected as having at least 7 successors in the runs before RA.

https://github.com/llvm/llvm-project/pull/137549