[PATCH] D158368: [AMDGPU][MISCHED] GCNBalancedSchedStrategy.

Tue Aug 29 11:26:23 PDT 2023

alex-t added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.h:499
+    SchedStages.push_back(GCNSchedStageID::UnclusteredHighRPReschedule);
+    SchedStages.push_back(GCNSchedStageID::ClusteredLowOccupancyReschedule);
+    SchedStages.push_back(GCNSchedStageID::PreRARematerialize);
----------------
jrbyrnes wrote:
> alex-t wrote:
> > jrbyrnes wrote:
> > > Probably shouldn't allow occupancy drops in ClusteredLowOccupancyReschedule, otherwise we will need to rerun the phase. 
> > Okay, it may achieve better ILP, sacrificing the occupancy, and the resulting metric will be the same or even better. Did you mean we should have a separate check that reverts the  ClusteredLowOccupancyReschedule if the occupancy has been dropped?
> > Did you mean we should have a separate check that reverts the ClusteredLowOccupancyReschedule if the occupancy has been dropped?
> 
> Yes something like that -- if ClusteredLowOccupancy drops the occupancy then revert. Phase needs a stable occupancy in order for it to achieve its purpose.
> 
> 
> 
> 
I would prefer to change the heuristic in such a way that dropping occupancy leads to the low metric and the schedule is reverted automatically.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.h:498
+    SchedStages.push_back(GCNSchedStageID::OccInitialSchedule);
+    SchedStages.push_back(GCNSchedStageID::UnclusteredHighRPReschedule);
+    SchedStages.push_back(GCNSchedStageID::ClusteredLowOccupancyReschedule);
----------------
jrbyrnes wrote:
> alex-t wrote:
> > jrbyrnes wrote:
> > > I'm not sure I understand the purpose of UnclusteredHighRPReschedule stage in the context of a balanced scheduler.
> > As I understood, the UnclusteredHighRPReschedule attempts to achieve better occupancy by removing most mutations, allowing for loose instruction placement.
> > Since our metric aims for the balance between better occupancy and better ILP, we might accept the result of the UnclusteredHighRPReschedule phase if it was managed to achieve better occupancy w/o the loss in ILP.
> The phase is only run for a subset of regions -- DAG.RegionsWithHighRP are in that subset. If we have high RP such that we are in danger of dropping occupancy, we reschedule the region without mutations to attempt to reduce RP and save our occupancy.
> 
> The meta heuristic used to determine which regions need rescheduling is based on register pressure and occupancy -- this doesn't seem in accordance with the spirit of a "balanced" scheduler.
The problem is that we report we are running out of registers right away when the occupancy is going under 4. Just because the VGPR limit is 32. That is exactly why we initially started to look for the heuristic: we have got a regression because of bumping the occupancy from 3 to 4 but introducing the huge latency.
Once again, nobody proved that this "magic" occupancy 4 is the optimal one for all the cases.
The current "naive" objective function:  
```
new_occupancy/old_occupancy * old_metric/new_metric
```
is just a starting point.  It already takes into account the change in occupancy versus change in ILP. The goal is to improve it to reflect our needs.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158368/new/

https://reviews.llvm.org/D158368