[llvm] [AMDGPU][Scheduler] Consistent occupancy calculation during rematerialization (PR #149224)

Sun Aug 3 16:49:05 PDT 2025

================
@@ -412,16 +411,19 @@ bool GCNRPTarget::isSaveBeneficial(Register Reg,
     return RP.getSGPRNum() > MaxSGPRs;
   unsigned NumVGPRs =
       SRI->isAGPRClass(RC) ? RP.getAGPRNum() : RP.getArchVGPRNum();
-  return isVGPRBankSaveBeneficial(NumVGPRs);
+  // The addressable limit must always be respected.
+  if (NumVGPRs > MaxVGPRs)
+    return true;
+  // For unified RFs, combined VGPR usage limit must be respected as well.
+  return UnifiedRF && RP.getVGPRNum(true) > MaxUnifiedVGPRs;
----------------
lucas-rami wrote:

> By reducing cross RC pressure any time we're over the MaxUnifiedVGPRs, we are telling the rematerializer to issue cross RC copies to increase occupancy.

Apologies, I am not sure I understand.

I think we agree on the spilling case ($MaxVGPRs=256 \wedge MaxUnifiedVGPRs=512$). There, $NumVGPRsInRC \leq MaxVGPRs \wedge RP.getVGPRNum(true) > MaxUnifiedVGPRs \Longrightarrow NumVGPRsInOtherRC > MaxVGPRs$ (modulo the VGPR allocation granule in the unified computation), i.e., we only do cross-RC saves when the other RC has more excess VGPRs than can be absorbed through copies into the current RC.

For the occupancy increase case ($0<MaxVGPRs=MaxUnifiedVGPRs\leq256$) we always have $NumVGPRsInRC<256$ and $NumVGPRsInOtherRC<256$; otherwise the stage would be trying to reduce spilling. If $NumVGPRsInRC \leq MaxVGPRs \wedge RP.getVGPRNum(true) > MaxUnifiedVGPRs$, isn't any VGPR/AGPR save beneficial? Is there a chance we increase the number of cross-RC copies by always saving there?
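
To make the two cases concrete, here is a minimal standalone sketch of the VGPR part of the check as I read it from the diff. It is not the actual LLVM code: `VGPRTarget` and `isVGPRSaveBeneficial` are hypothetical names, and the unified usage is assumed to be the plain sum of both classes, ignoring the allocation granule.

```cpp
#include <cassert>

// Hypothetical stand-in for the limits held by GCNRPTarget.
struct VGPRTarget {
  unsigned MaxVGPRs;        // addressable limit per register class
  unsigned MaxUnifiedVGPRs; // combined ArchVGPR + AGPR limit
  bool UnifiedRF;           // true when both classes share one register file
};

// NumVGPRsInRC: pressure in the class of the register being considered.
// NumVGPRsInOtherRC: pressure in the other VGPR class.
// Assumption: unified usage is approximated as the plain sum of both classes.
bool isVGPRSaveBeneficial(const VGPRTarget &T, unsigned NumVGPRsInRC,
                          unsigned NumVGPRsInOtherRC) {
  // The addressable limit must always be respected.
  if (NumVGPRsInRC > T.MaxVGPRs)
    return true;
  // For unified RFs, the combined limit must be respected as well.
  return T.UnifiedRF &&
         NumVGPRsInRC + NumVGPRsInOtherRC > T.MaxUnifiedVGPRs;
}
```

Under these assumptions, with limits 256/512 the second condition can only fire when the other class alone exceeds 256, whereas with limits 128/128 it fires for a register of either class as soon as the combined usage exceeds 128.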

https://github.com/llvm/llvm-project/pull/149224