[llvm] [SystemZ] Add a SystemZ specific pre-RA scheduling strategy. (PR #135076)
Jonas Paulsson via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 19 15:46:55 PST 2025
JonPsson1 wrote:
Some experiments with liveness reduction (not committed to the branch), looking at when to apply it and when not to in the case where all uses are live:
- Trying different values of "topcycles", which affects both the HighSUs set and the HasDistToTop margin: increasing it from the default of 2 to 3 or 4 gave worse perf results, 5 and 6 were better again, and 7 was worse again. 5 was the best of these, with a +0.74% average change across all benchmarks compared to 2. Conclusion: 2 seems best.
- Replacing HasDistToTop with "SubsumedByRemLat". The idea is to be more precise and to treat regions containing long-latency instructions differently, instead of just counting the number of remaining instructions as a crude margin to the top. Looking at SU and its (closest) data successor as a unit, there is the remaining latency of (other) unscheduled nodes to consider.
```
| D |
Succ --> SU
Succ ---------> SU
------------> RemLat
Placing SU closer to Succ means D more (decoding) cycles are added to SU.
```
This is computed as "the decoding cycles for scheduling SU next, plus its latency, are less than the remaining latency of the successor":
`NumLeft / IssueWidth + SU->Latency < remaining latency of (closest) data successor`
```
Counting the number of spill/reload (and copy) instructions in SPEC output:
main "With tiny regions limit"
Spill|Reload : 532477 528520 -3957 // -0.75%
Copies : 886962 886644 -318
main "HighSUs instead of TinyRegion" (performance ref below)
Spill|Reload : 532477 530071 -2406
Copies : 886962 886732 -230
main "HighSUs, but with Pres/Subsumes" (similar performance)
Spill|Reload : 532477 528575 -3902
Copies : 886962 887234 +272
main "HighSUs, but with Subsumes only" (slightly worse perf)
Spill|Reload : 532477 528468 -4009
Copies : 886962 886944 -18
main "No liveness reduction" (slightly worse perf)
Spill|Reload : 532477 532461 -16
Copies : 886962 886785 -177
```
Conclusion: the liveness reduction heuristic reduces spilling a bit, but performance does not follow the spill count alone, which shows that it is important to also consider other things such as latencies while helping liveness. SubsumedByRemLat is more involved but doesn't give any performance improvement.
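For illustration only, the SubsumedByRemLat condition quoted above could be written as a small standalone predicate. This is a minimal sketch, not the code from the experiment: the function name `subsumedByRemLat`, its parameter list, and the use of integer division for `NumLeft / IssueWidth` are assumptions, and `NumLeft` is taken to mean the number of instructions still unscheduled in the region.

```cpp
#include <cassert>

// Hypothetical helper, not part of LLVM or of the PR branch.
// NumLeft:    instructions still to be scheduled (assumed meaning).
// IssueWidth: instructions decoded per cycle.
// SULatency:  latency of the candidate SU.
// SuccRemLat: remaining latency of SU's closest data successor.
inline bool subsumedByRemLat(unsigned NumLeft, unsigned IssueWidth,
                             unsigned SULatency, unsigned SuccRemLat) {
  assert(IssueWidth > 0 && "issue width must be positive");
  // Decoder cycles needed for the remaining instructions
  // (integer division is an assumption of this sketch).
  unsigned DecodeCycles = NumLeft / IssueWidth;
  // True when those cycles plus SU's own latency are strictly less than
  // the latency that remains to be covered for the successor anyway.
  return DecodeCycles + SULatency < SuccRemLat;
}
```

With these assumed inputs, `subsumedByRemLat(12, 6, 4, 10)` holds (2 + 4 < 10), while `subsumedByRemLat(12, 6, 4, 6)` does not, since the comparison is strict.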
- Another idea was to skip HasDistToTop and rely only on HighSUs for the top margin. I tried various values of TopCycles (2 - 11) and found that values around 6 or 7 work fairly well, with similar perf results (within 0.1% on average). Conclusion: using a TopCycles of 6 as the default could work. This would eliminate the computation of HasDistToTop, and it also shows that the liveness reduction is in fact mostly useful when applied only in regions with at least a few dozen instructions.
```
main "No HasDistToTop, TopCycles=6" (similar performance)
Spill|Reload : 532477 531009 -1468
Copies : 886962 886822 -140
```
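To compare the spill/reload and copy deltas in the listings above on a common scale, each pair of counts can be turned into a relative change against main. The helper below is a minimal sketch for that arithmetic only; the name `relChangePercent` is hypothetical and nothing here comes from the PR itself.

```cpp
#include <cassert>

// Hypothetical helper: relative change in percent going from Base to New.
// A negative result means fewer instructions than on main.
inline double relChangePercent(long Base, long New) {
  assert(Base != 0 && "baseline count must be non-zero");
  return 100.0 * static_cast<double>(New - Base) / static_cast<double>(Base);
}
```

For example, `relChangePercent(532477, 528520)` gives the relative spill/reload change for the "With tiny regions limit" row, and `relChangePercent(532477, 531009)` the one for "No HasDistToTop, TopCycles=6".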
https://github.com/llvm/llvm-project/pull/135076