[llvm] [AMDGPU][Scheduler] Consistent occupancy calculation during rematerialization (PR #149224)

Thu Aug 7 10:07:54 PDT 2025

================
@@ -412,16 +411,19 @@ bool GCNRPTarget::isSaveBeneficial(Register Reg,
     return RP.getSGPRNum() > MaxSGPRs;
   unsigned NumVGPRs =
       SRI->isAGPRClass(RC) ? RP.getAGPRNum() : RP.getArchVGPRNum();
-  return isVGPRBankSaveBeneficial(NumVGPRs);
+  // The addressable limit must always be respected.
+  if (NumVGPRs > MaxVGPRs)
+    return true;
+  // For unified RFs, combined VGPR usage limit must be respected as well.
+  return UnifiedRF && RP.getVGPRNum(true) > MaxUnifiedVGPRs;
----------------
jrbyrnes wrote:

Yeah I think the way you've implemented it is fine actually. 

I was looking at a case where we use the waves-per-eu attribute. Unless amdgpu-agpr-alloc is also specified, we will just split the unified RF in half during RA. In that case, with the current implementation, we would be be inserted cross RC copies to increase occupancy. I'm not sure how concerned we should be with this case, since for target's with UnifiedRF, we should be using pure vgpr unless in occupancy 1 case.

However, in the default case (no attributes), RA can allocate up to the maxNumVGPRs for each RC, which should encourage a more optimal split while honoring the unified limit.

https://github.com/llvm/llvm-project/pull/149224