[PATCH] D117562: [AMDGPU] Sink immediate VGPR defs if high RP

Wed Jan 26 10:06:14 PST 2022

rampitec added a comment.

Setting a hard occupancy limit after scheduling might actually be a good idea. At that point we know the approximate register pressure pretty well, so if there is any spilling it should be minimal and may be worth the increased occupancy. In fact, we may even want a small amount of spilling in return for occupancy; that often helps. All of that is subject to thresholds and heuristics, of course. I see 2 problems though:

1. The scheduler will calculate RP with the instructions hoisted. It will then need to identify rematerializable instructions that can be sunk and estimate the profitability of sinking them, so that it can artificially increase the occupancy. This is not necessarily an easy task.
2. For gfx90a there is a combined pressure of VGPRs and AGPRs, because they are essentially aliases of the same registers, and then there is ACCUM_OFFSET to set the split. To actually reserve registers we would need to account for AGPRs separately and quite precisely. But even then it will be a problem if we are scheduling a standalone function, because at the end of the day there is a single ACCUM_OFFSET for the whole kernel. Right now we give RA a generous register budget and hope for the best. If we start reserving registers, we will no longer be able to do that.
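
To make the second problem concrete, here is a minimal sketch of how a single kernel-wide split can inflate the combined budget. It is not LLVM code: `alignUp`, `RegUsage`, `combinedRegs` and `kernelRegs` are hypothetical names, and the sketch assumes that AGPRs are placed right after the VGPRs at an offset rounded up to a multiple of 4, and that the kernel has to cover the maximum VGPR and the maximum AGPR use over all functions it reaches.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not an LLVM API): round Value up to a multiple of Align.
constexpr unsigned alignUp(unsigned Value, unsigned Align) {
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical per-function register usage.
struct RegUsage {
  unsigned NumVGPR;
  unsigned NumAGPR;
};

// Assumption: AGPRs start at the VGPR count rounded up to a multiple of 4
// (the split that ACCUM_OFFSET encodes), so both counts share one budget.
constexpr unsigned combinedRegs(RegUsage U) {
  return U.NumAGPR == 0 ? U.NumVGPR : alignUp(U.NumVGPR, 4) + U.NumAGPR;
}

// Assumption: with a single split for the whole kernel, the kernel must cover
// the largest VGPR use and the largest AGPR use seen in any function.
unsigned kernelRegs(const std::vector<RegUsage> &Funcs) {
  RegUsage Max{0, 0};
  for (const RegUsage &F : Funcs) {
    Max.NumVGPR = std::max(Max.NumVGPR, F.NumVGPR);
    Max.NumAGPR = std::max(Max.NumAGPR, F.NumAGPR);
  }
  return combinedRegs(Max);
}
```

Under these assumptions, a function using 60 VGPRs and 4 AGPRs needs 64 combined registers, and one using 10 VGPRs and 50 AGPRs needs 62, but a kernel reaching both needs 110. That is why a budget computed for a standalone function says little about the final kernel.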

I am currently thinking about something like this:

- Collect trivially rematerializable instructions with a single use outside of the def block in the first scheduler pass.
- Collect blocks with a high RP (AFAIR we are already doing that).
- Add a new scheduler pass after the current passes.
- Filter the collected list of instructions, keeping only those whose uses are all in blocks that limit the RP and have high live-through pressure.
- Count how many instructions we have. If there are enough to cover the register deficit, proceed with sinking the defs right to their uses.
- Reschedule the region into which we have sunk the def. If that succeeds in increasing occupancy, keep the schedule. If not, revert it.

The caveat is that we have to keep a revert list for multiple blocks, because even a single block will limit the occupancy for the whole kernel. I'd suggest doing it only if we have a single block limiting occupancy, at least initially.
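
The decision logic of the steps above could look roughly like the sketch below. Everything in it is hypothetical and independent of the actual scheduler classes: `Candidate`, `filterCandidates` and `trySink` are illustration-only names, blocks are plain integers, each sunk def is assumed to free exactly one register, and the live-through pressure check is left out. The reschedule step is modeled as a callback that returns whether occupancy improved; a `false` result stands for reverting the schedule.

```cpp
#include <functional>
#include <set>
#include <vector>

// Hypothetical record for a trivially rematerializable def with a single use.
struct Candidate {
  int DefBlock;
  int UseBlock;
};

// Keep only candidates whose single use is outside the def block and inside
// one of the blocks that limit occupancy.
std::vector<Candidate> filterCandidates(const std::vector<Candidate> &All,
                                        const std::set<int> &LimitingBlocks) {
  std::vector<Candidate> Kept;
  for (const Candidate &C : All)
    if (C.DefBlock != C.UseBlock && LimitingBlocks.count(C.UseBlock))
      Kept.push_back(C);
  return Kept;
}

// Returns true if the defs were sunk and the new schedule was kept.
// Assumption: one sunk def frees one register of the deficit.
bool trySink(const std::vector<Candidate> &All,
             const std::set<int> &LimitingBlocks, unsigned Deficit,
             const std::function<bool(const std::vector<Candidate> &)>
                 &RescheduleImprovesOccupancy) {
  // Initial restriction: handle only a single limiting block, so that no
  // multi-block revert list is needed.
  if (LimitingBlocks.size() != 1)
    return false;
  std::vector<Candidate> Kept = filterCandidates(All, LimitingBlocks);
  // Not enough sinkable defs to cover the register deficit.
  if (Kept.size() < Deficit)
    return false;
  // Sink and reschedule; the callback reports whether occupancy improved.
  return RescheduleImprovesOccupancy(Kept);
}
```

Bailing out before any sinking happens keeps the common case cheap: the reschedule is attempted only when the candidate count alone says the deficit can be covered.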

Note also that this method does not need Loop Info.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117562/new/

https://reviews.llvm.org/D117562