[llvm] [SystemZ] Add a SystemZ specific pre-RA scheduling strategy. (PR #135076)
Jonas Paulsson via llvm-commits
llvm-commits at lists.llvm.org
Mon Sep 8 05:25:26 PDT 2025
JonPsson1 wrote:
> As to register pressure - you're now using the pressure difference per SU computed by common code, but not the actual pressure itself at this point (which is also already computed by common code). Wouldn't that be important to consider? It doesn't really matter if an instruction would use one more vector register if we still have 20 unused at this point, does it?
First of all, per the comment in RegisterPressure.cpp: "The register tracker is unaware of global liveness so ignores normal live-thru ranges...", there is (currently) no truly accurate awareness of the number of used / free registers.
The out-of-order (OOO) reasoning I use to motivate e.g. pulling a vector load (VL) down right before its user is that this shouldn't matter as long as there are many (~30) instructions above the VL in a big region (`HasDistToTop`). And in the case where the scheduled latency is not increased by the VL (because another SU at least as high was already scheduled below it), this is also less likely to matter (`PreservesSchedLat`).
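For illustration, a minimal sketch of those two conditions, with hypothetical helper names and threshold (the exact combination used in the patch may differ):
```
// Sketch only (hypothetical names/threshold): the two conditions under which
// pulling a load down next to its user is assumed not to hurt on an OOO core.
bool mayScheduleLoadLow(unsigned InstrsAboveLoad, unsigned LoadHeight,
                        unsigned MaxHeightScheduledBelow) {
  // With enough instructions above the load in a big region, the OOO core
  // has plenty of other work to issue while the load completes.
  bool HasDistToTop = InstrsAboveLoad >= 30;
  // If another SU at least as high was already scheduled below the load,
  // the scheduled latency of the region is not increased by the load.
  bool PreservesSchedLat = MaxHeightScheduledBelow >= LoadHeight;
  return HasDistToTop && PreservesSchedLat;
}
```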
I have now experimented with your idea, in two versions: the first checks for 2 free registers, the second checks that the pressure is below half of the limit (the SU is not scheduled "low" if the register pressure is ok). If the current pressure were perfectly estimated, at least the second version should give no extra spilling. I did get a few more spills/reloads with both versions:
```
patch <> "Pressure + 2 regs < Lim": +3K spills/reloads
Improvements
0.988: i525.x264_r
0.991: f538.imagick_r
0.991: f519.lbm_r
Regressions
1.035: f544.nab_r
1.009: i500.perlbench_r
1.006: f507.cactuBSSN_r
```
```
patch <> "Pressure <= Lim / 2": +1K spills/reloads
Improvements
0.990: f519.lbm_r
Regressions
1.036: f544.nab_r
1.007: i505.mcf_r
1.006: i523.xalancbmk_r
```
It doesn't seem to give much one way or the other performance-wise... (I also tried another version applying the "2 regs rule" only in cases where latency reduction is not enabled, with similarly mixed results.) Both of these show f544.nab_r regressing. I reran this benchmark with the check applied only when latency reduction is *not* enabled, and the regression disappeared. However, there were only 5 more spills/reloads behind the 3.6% regression, so it's likely not a spilling problem, and I am not sure exactly what the cause is.
I agree that this should in theory not hurt if done correctly, and maybe GenericScheduler would do better if it also set up the global liveness. At least it shouldn't cause a heavy increase in spilling, given that it checks the pressure 'Excess' very early. Another related problem in GenericScheduler, however, is that the RegPressureTracker::getUpwardPressureDelta() it uses seems to track just one PressureSet that exceeds its limit. That means e.g. GRX32Bit could be tracked while VR16Bit is ignored, IIUC. I think it would have to be able to check any Excess that the PressureDiff of an SU would affect. So with global liveness and better handling of the pressure Excess it should do better, but then again this will not matter in many interesting cases where there will be Excess anyway and where it will always be a matter of reducing it as much as possible. @atrick - have you tried this?
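To make the limitation concrete, here is a minimal sketch (assumed pressure/limit arrays, not GenericScheduler's actual code) of checking every pressure set that an SU's PressureDiff touches, rather than a single tracked set:
```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/CodeGen/RegisterPressure.h"
using namespace llvm;

// Sketch only: check the Excess of every set the SU's PressureDiff would
// change. CurrPressure and Limits are assumed to be indexed by pressure-set
// ID, with Limits already including any live-thru adjustment.
static bool worsensAnyExcess(const PressureDiff &PDiff,
                             ArrayRef<unsigned> CurrPressure,
                             ArrayRef<unsigned> Limits) {
  for (const PressureChange &PC : PDiff) {
    if (!PC.isValid())
      break;
    unsigned PSet = PC.getPSet();
    int NewPressure = int(CurrPressure[PSet]) + PC.getUnitInc();
    // This SU pushes PSet (further) over its limit.
    if (PC.getUnitInc() > 0 && NewPressure > int(Limits[PSet]))
      return true;
  }
  return false;
}
```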
@atrick has described the problem I saw in GenericScheduler as "backing itself into a corner". So the idea for me has been to not be so confident about the overall picture, but rather to act more as the opportunity arises: first reduce register pressure as much as possible, and only after that do latency reduction. For bigger regions (like in cactus) there will always be some spilling, so in those cases it always makes sense to close the live range, and doing that under the given checks seems to give good performance.
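As a rough sketch of that ordering (hypothetical structure and names, not the PR's actual tryCandidate code), pressure strictly dominates and latency only breaks ties:
```
// Sketch only: candidate comparison with register pressure strictly before
// latency. Scores and heights are assumed to be precomputed per candidate.
struct CandSketch {
  int LivenessScore; // Higher is better: closes live ranges / reduces pressure.
  unsigned Height;   // SU height; used for latency reduction.
};

// Returns true if Try should be picked over Cand.
static bool tryCandidateSketch(const CandSketch &Try, const CandSketch &Cand,
                               bool LatencyReductionEnabled) {
  if (Try.LivenessScore != Cand.LivenessScore)
    return Try.LivenessScore > Cand.LivenessScore;
  if (LatencyReductionEnabled && Try.Height != Cand.Height)
    return Try.Height > Cand.Height;
  return false;
}
```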
```
EXPERIMENT:
@@ -256,6 +258,9 @@ int SystemZPreRASchedStrategy::computeSULivenessScore(
UsesLiveAll = !PrioKill && !GPRKill;
StoreKill = (PrioKill || (!HasPrioUse && GPRKill));
} else if (MO0.isReg() && MO0.getReg().isVirtual()) {
+ const RegPressureTracker &RPTracker = DAG->getBotRPTracker();
+ ArrayRef<unsigned> Pressure = RPTracker.getRegSetPressureAtPos();
+ bool InFPReg = false;
int PrioPressureChange = 0;
int GPRPressureChange = 0;
const PressureDiff &PDiff = DAG->getPressureDiff(SU);
@@ -266,12 +271,40 @@ int SystemZPreRASchedStrategy::computeSULivenessScore(
PrioPressureChange += PC.getUnitInc();
else if (PC.getPSet() == GPRPressureSet)
GPRPressureChange += PC.getUnitInc();
+ if (PC.getPSet() == SystemZ::FP16Bit)
+ InFPReg = true;
}
if (IsLoad) {
bool PrioDefNoKill = PrioPressureChange == -RegWeight;
bool GPRDefNoKill = GPRPressureChange == -RegWeight;
+
+ unsigned PSet = ~0;
+ unsigned RegFactor = 1;
+ if (PrioDefNoKill)
+ PSet = InFPReg ? SystemZ::FP16Bit : SystemZ::VR16Bit;
+ else if (GPRDefNoKill) {
+ PSet = SystemZ::GRX32Bit;
+ RegFactor = 2;
+ }
+ if (PSet != ~0) {
+ unsigned Lim = RPTracker.getRCI()->getRegPressureSetLimit(PSet);
+ if (!RPTracker.getLiveThru().empty())
+ Lim += RPTracker.getLiveThru()[PSet];
+
+ // EXPERIMENTAL:
+ // Guard against some other SU being scheduled instead that could
+ // cause 2 registers to become live. Skip this if it would still be
+ // below the limit.
+ // if (Pressure[PSet] + (2 * RegFactor) < Lim)
+ // return 0;
+ // ~ or ~
+ // Alternatively, skip if pressure is low.
+ // if (Pressure[PSet] <= Lim / 2)
+ // return 0;
+ }
+
UsesLivePrio =
```
https://github.com/llvm/llvm-project/pull/135076
More information about the llvm-commits mailing list