[llvm] [AMDGPU] IGLP: Fix static variables (PR #137549)
Robert Imschweiler via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 30 06:58:42 PDT 2025
================
@@ -1353,17 +1344,25 @@ bool MFMAExpInterleaveOpt::analyzeDAG(const SIInstrInfo *TII) {
auto isAdd = [](unsigned Opc) { return Opc == AMDGPU::V_ADD_F32_e32; };
+ // Heuristic helper function (see below)
+ auto IsFMACDataDep = [](SDep &Dep) {
+ return Dep.getKind() == SDep::Kind::Data &&
+ Dep.getSUnit()->getInstr()->getOpcode() == AMDGPU::V_FMAC_F32_e32;
+ };
+
AddPipeCount = 0;
for (SUnit &SU : DAG->SUnits) {
auto Opc = SU.getInstr()->getOpcode();
if (TII->isTRANS(Opc)) {
// Avoid counting a potential bonus V_EXP which all the MFMA depend on
- if (SU.Succs.size() >= 7)
+ // FIXME: This heuristic needs improvement/clarification!
+ // In general, the pipeline seems to look like this:
+ // fma_f32 -> exp_f32 -> cvt_f16_f32 -> v_pack_b32_f16 -> mfma_.._f16
+ // (with potential arithmetic between exp and cvt)
+ // see
+ // https://github.com/llvm/llvm-project/pull/80370#discussion_r1483660378
+ if (SU.Succs.size() >= 7 && any_of(SU.Succs, IsFMACDataDep))
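A minimal, self-contained sketch of the guard this hunk introduces. It assumes that the guarded branch skips the instruction, as the comment "Avoid counting a potential bonus V_EXP" suggests. `Node`, `Dep`, `Opcode`, `DepKind`, `isTrans`, and `countTrans` are hypothetical stand-ins for `SUnit`, `SDep`, the real opcodes, `TII->isTRANS()`, and the surrounding loop in `analyzeDAG`. This is not the LLVM API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-ins for the LLVM types used in the hunk.
// Only the opcodes needed for the illustration are modeled.
enum class Opcode { V_EXP_F32, V_FMAC_F32, V_PK_MUL_F32, V_CVT_F16_F32 };

// Mirrors the four dependence kinds of llvm::SDep.
enum class DepKind { Data, Anti, Output, Order };

struct Node;

// One outgoing edge: Succ depends on the node that owns this edge.
struct Dep {
  DepKind Kind;
  const Node *Succ;
};

// Stand-in for an SUnit: an opcode plus its successor edges.
struct Node {
  Opcode Opc;
  std::vector<Dep> Succs;
};

// Hypothetical replacement for TII->isTRANS(): only V_EXP_F32 is TRANS here.
static bool isTrans(Opcode Opc) { return Opc == Opcode::V_EXP_F32; }

// Same test as the IsFMACDataDep lambda: a data edge into a V_FMAC_F32.
static bool isFMACDataDep(const Dep &D) {
  assert(D.Succ && "dependence edge without a successor");
  return D.Kind == DepKind::Data && D.Succ->Opc == Opcode::V_FMAC_F32;
}

// Counts TRANS nodes, skipping a likely "bonus" V_EXP: one with at least
// seven successors, at least one of which is a V_FMAC_F32 data user.
static std::size_t countTrans(const std::vector<Node> &Nodes) {
  std::size_t Count = 0;
  for (const Node &N : Nodes) {
    if (!isTrans(N.Opc))
      continue;
    if (N.Succs.size() >= 7 &&
        std::any_of(N.Succs.begin(), N.Succs.end(), isFMACDataDep))
      continue; // assumed to mirror the skip in analyzeDAG
    ++Count;
  }
  return Count;
}
```

With only the old `SU.Succs.size() >= 7` check, every TRANS instruction with seven or more successors would be skipped. The added `any_of(SU.Succs, IsFMACDataDep)` condition keeps counting those that do not feed a V_FMAC_F32 through a data edge.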
----------------
ro-i wrote:
I forgot to mention: the underlying problem may be the following (using `llvm.amdgcn.iglp.opt.exp.small.mir` as an example).
Take a look at the following snippet from the initial MIR:
```
undef %247.sub0:vreg_64_align2 = afn nofpexcept V_EXP_F32_e32 %246:vgpr_32, implicit $mode, implicit $exec
[...]
%0:vgpr_32 = contract nofpexcept V_FMAC_F32_e32 %1:vgpr_32, %247.sub0:vreg_64_align2, %0:vgpr_32, implicit $mode, implicit $exec
%259.sub0_sub1:vreg_512_align2 = contract nofpexcept V_PK_MUL_F32 8, %259.sub0_sub1:vreg_512_align2, 0, %247:vreg_64_align2, 0, 0, 0, 0, 0, implicit $mode, implicit $exec
[... a lot more V_PK_MULs ... ]
```
If I'm not mistaken, this means that only the lower 32 bits of `%247` (`sub0`) are actually defined. Now take a look at the following snippet, taken from directly before postmisched runs:
```
renamable $vgpr16 = afn nofpexcept V_MUL_F32_e32 1069066811, killed $vgpr16, implicit $mode, implicit $exec
renamable $vgpr17 = afn nofpexcept V_MUL_F32_e32 1069066811, killed $vgpr17, implicit $mode, implicit $exec
[... vgpr16 and vgpr17 are used and vgpr16 is also reassigned ... ]
renamable $vgpr16 = afn nofpexcept V_EXP_F32_e32 killed $vgpr24, implicit $mode, implicit $exec
[...]
renamable $vgpr48_vgpr49 = contract nofpexcept V_PK_MUL_F32 8, $vgpr48_vgpr49, 0, $vgpr16_vgpr17, 0, 0, 0, 0, 0, implicit $mode, implicit $exec
[... a lot more V_PK_MULs ... ]
[...]
dead renamable $vgpr2 = contract nofpexcept V_FMAC_F32_e32 killed $vgpr4, killed $vgpr16, killed $vgpr2(tied-def 0), implicit $mode, implicit $exec
```
I think that `$vgpr17` is only "used" by V_PK_MUL_F32 because this instruction takes a two-dword (64-bit) operand. Its value is never actually defined, which is why the V_EXP is not detected as having at least 7 successors in the runs before RA.
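To illustrate the register-overlap argument, here is a simplified model (not LLVM's `ScheduleDAGInstrs` implementation) in which every 32-bit VGPR is one register unit, and a later instruction counts as a data successor if it reads any unit the defining instruction wrote before that unit is overwritten. `RegRange`, `Instr`, `readsAnyUnit`, and `countDataSuccs` are hypothetical names, and the destination registers of the extra V_PK_MULs in the usage below are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: each 32-bit VGPR is one register unit, and a register
// tuple such as vgpr16_vgpr17 is a run of consecutive units.
struct RegRange {
  unsigned First; // index of the first VGPR, e.g. 16 for vgpr16
  unsigned Count; // number of 32-bit VGPRs covered, e.g. 2 for a pair
};

struct Instr {
  std::string Name;
  std::vector<RegRange> Defs;
  std::vector<RegRange> Uses;
};

// True if any register read by I covers one of the given units.
static bool readsAnyUnit(const Instr &I, const std::set<unsigned> &Units) {
  for (const RegRange &R : I.Uses)
    for (unsigned U = R.First; U < R.First + R.Count; ++U)
      if (Units.count(U) != 0)
        return true;
  return false;
}

// Counts later instructions that read a unit written by Prog[DefIdx] before
// that unit is overwritten: the data successors in this simplified model.
static std::size_t countDataSuccs(const std::vector<Instr> &Prog,
                                  std::size_t DefIdx) {
  assert(DefIdx < Prog.size() && "DefIdx out of range");
  std::set<unsigned> Live;
  for (const RegRange &R : Prog[DefIdx].Defs)
    for (unsigned U = R.First; U < R.First + R.Count; ++U)
      Live.insert(U);

  std::size_t Succs = 0;
  for (std::size_t I = DefIdx + 1; I < Prog.size() && !Live.empty(); ++I) {
    // Reads happen before writes, so a tied operand still counts as a use.
    if (readsAnyUnit(Prog[I], Live))
      ++Succs;
    // A later def of a unit ends the reach of our value in that unit.
    for (const RegRange &R : Prog[I].Defs)
      for (unsigned U = R.First; U < R.First + R.Count; ++U)
        Live.erase(U);
  }
  return Succs;
}
```

In this model, a V_EXP writing `$vgpr16`, followed by seven V_PK_MUL_F32 reading `$vgpr16_vgpr17` and the V_FMAC_F32 reading `$vgpr16`, ends up with eight data successors. Every edge comes from the overlap on `$vgpr16`; an instruction that only reads `$vgpr17` gets no edge from the V_EXP.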
https://github.com/llvm/llvm-project/pull/137549