[llvm] [AMDGPU] Add scheduling stage to rewrite MFMA from VGPR to AGPR (PR #149367)

Fri Jul 18 06:44:21 PDT 2025

lucas-rami wrote:

Regarding the heuristic: instead of relying on cycle depth, how about using block frequencies together with latency estimates for a cross-class copy versus a spill save/restore to determine how much copying we can afford without increasing latency? This is what I am doing to estimate rematerialization benefit in my upcoming scoring system for remat candidates ([branch](https://github.com/lucas-rami/llvm-project/blob/remat-score-system)), so I think the cost of computing block frequencies could even be shared among the scheduler's stages.
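
To make the idea concrete, here is a rough, standalone sketch of the comparison I have in mind. It is not code from the PR or from my branch. `BlockEffect`, `LatencyModel`, `netBenefit`, and `isRewriteProfitable` are hypothetical names, and the latencies are placeholder inputs that would have to come from the real scheduling model. It assumes we already know, per block, a relative frequency, how many cross-class copies the rewrite adds, and how many spill save/restore pairs it avoids.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-block summary of what the VGPR -> AGPR rewrite would do.
struct BlockEffect {
  double Freq;                // relative block frequency (e.g. entry = 1.0)
  unsigned CopiesAdded;       // cross-class copies introduced in this block
  unsigned SpillPairsRemoved; // spill save/restore pairs avoided in this block
};

// Placeholder latency estimates in cycles; assumptions, not measured values.
struct LatencyModel {
  double CopyCycles;      // one cross-class copy
  double SpillPairCycles; // one spill save plus its restore
};

// Frequency-weighted latency saved by avoided spills minus the
// frequency-weighted latency added by the new copies.
double netBenefit(const std::vector<BlockEffect> &Blocks,
                  const LatencyModel &LM) {
  double CopyCost = 0.0;
  double SpillSavings = 0.0;
  for (const BlockEffect &B : Blocks) {
    assert(B.Freq >= 0.0 && "block frequency must be non-negative");
    CopyCost += B.Freq * B.CopiesAdded * LM.CopyCycles;
    SpillSavings += B.Freq * B.SpillPairsRemoved * LM.SpillPairCycles;
  }
  return SpillSavings - CopyCost;
}

// The rewrite is considered affordable when it does not increase the
// estimated latency.
bool isRewriteProfitable(const std::vector<BlockEffect> &Blocks,
                         const LatencyModel &LM) {
  return netBenefit(Blocks, LM) >= 0.0;
}
```

For example, with a copy estimated at 4 cycles and a spill pair at 40, adding two copies in a block of frequency 1 to avoid one spill pair in a block of frequency 100 gives a net benefit of 4000 - 8 = 3992, whereas the reverse placement (copies in the hot block, spill in the cold one) gives 40 - 800 = -760 and would be rejected. Unlike a depth-based cutoff, this weighs where the copies land against where the spills would have been.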

https://github.com/llvm/llvm-project/pull/149367