[llvm] [AMDGPU][Scheduler] Refactor ArchVGPR rematerialization during scheduling (PR #125885)

Tue Apr 8 07:17:24 PDT 2025

lucas-rami wrote:

Some pointers for the test churn:

- Our minimum allowable occupancy is now generally lower, so in some tests where the stage previously tried to reduce spilling, it now tries to increase occupancy. This leads to fewer remats in some cases (while still achieving the same occupancy as before). In a few instances, light code motion in tests is required to avoid rollbacks (which only happen when trying to increase occupancy) caused by the excess pressure model being too optimistic. Adding the ability to continue remating when the heuristic turns out to be too optimistic seems like an important follow-up change.
- I changed the `omitted_subrange` test significantly because the "interesting MI" we want to be able to remat in it was no longer rematted with the new implementation; other MIs were evaluated first and were enough to improve occupancy. The test is now simpler and needs to remat this MI to be able to achieve higher occupancy.
- While fixing `reduce_arch_and_acc_vgrp_spill` I noticed some subtle issues in excess pressure tracking, notably in how progress is tracked for unified RFs when saving ArchVGPRs doesn't allow us to actually reduce pressure because of the ArchVGPR allocation granule. This is now resolved.
- Light changes in `reduce_spill_archvgpr_above_addressable_limit`/`reduce_spill_agpr_above_addressable_limit` are due to `AMDGPU::getWavesPerEU` ignoring `amdgpu-waves-per-eu` in some scenarios, as I mention [here](https://github.com/llvm/llvm-project/pull/125885#discussion_r2033253965). The method considers the minimum allowable occupancy to be 4, whereas 1 should be acceptable per the user's request. I think this should be addressed separately, as it is really an issue with `AMDGPU::getWavesPerEU`.
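
To illustrate the allocation granule point from the `reduce_arch_and_acc_vgrp_spill` item, here is a minimal standalone sketch. It is not the code from this PR. It assumes a unified RF model in which the ArchVGPR count is rounded up to the granule before AGPRs are added, and it uses a granule of 4 purely for illustration. The names `alignToGranule`, `unifiedVGPRs`, `unifiedSavings` and `granuleExamples` are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: round NumRegs up to the next multiple of Granule.
constexpr unsigned alignToGranule(unsigned NumRegs, unsigned Granule) {
  return (NumRegs + Granule - 1) / Granule * Granule;
}

// Assumed unified RF model: ArchVGPRs are aligned to the allocation granule,
// then AGPRs are added on top.
constexpr unsigned unifiedVGPRs(unsigned NumArchVGPRs, unsigned NumAGPRs,
                                unsigned Granule) {
  return alignToGranule(NumArchVGPRs, Granule) + NumAGPRs;
}

// Unified registers actually freed by saving `Saved` ArchVGPRs.
// Precondition: Saved <= NumArchVGPRs.
constexpr unsigned unifiedSavings(unsigned NumArchVGPRs, unsigned NumAGPRs,
                                  unsigned Saved, unsigned Granule) {
  return unifiedVGPRs(NumArchVGPRs, NumAGPRs, Granule) -
         unifiedVGPRs(NumArchVGPRs - Saved, NumAGPRs, Granule);
}

// Illustrative checks with an assumed granule of 4.
inline void granuleExamples() {
  // 10 ArchVGPRs align to 12; saving one leaves 9, which still aligns to 12.
  assert(unifiedSavings(10, 8, 1, 4) == 0);
  // Saving two leaves 8, which aligns to 8: a full granule is freed.
  assert(unifiedSavings(10, 8, 2, 4) == 4);
}
```

Under this model, progress toward a unified-RF pressure target is best measured on the aligned total rather than on the raw number of ArchVGPRs saved, since a remat that does not cross a granule boundary frees nothing.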

https://github.com/llvm/llvm-project/pull/125885