[PATCH] D139710: [AMDGPU] MachineScheduler: schedule execution metric added for the UnclusteredHighRPStage

Tue Dec 13 12:14:01 PST 2022

alex-t marked an inline comment as done.
alex-t added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:854
+  if (DAG.RegionsWithMinOcc[RegionIdx] &&
+      StageID == GCNSchedStageID::OccInitialSchedule) {
+    DAG.MinOccupancyRegionsMetrics[RegionIdx] = getScheduleMetrics();
----------------
kerbowa wrote:
> It might be better in UnclusteredHighRPStage::initGCNRegion(), but I guess the problem is that you need the DAG to be built already? Maybe we even need a virtual getSchedulerMetrics(), since I can imagine this being used for more then the UnclusteredHighRPStage eventually, and now it only does something in the initOccStage. If it stays in the initOccStage, can you add an assert that the next stage is UnclusteredHighRP?
Your guess is quite correct. My first intention was exactly UnclusteredHighRPStage::initGCNRegion(). Then I found that unfortunately, I have no DAG yet :)

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:1033
+        std::min(S.getTargetOccupancy(), PressureBefore.getOccupancy(ST));
+    unsigned Profit = ((WavesAfter * ScheduleMetricsScaleFactor) / WavesBefore *
+                       (OldMetric * ScheduleMetricsScaleFactor) / NewMetric) /
----------------
kerbowa wrote:
> What happens if the original schedule has no stalls, i.e. a metric of 0? Does that mean that no amount of occupancy increase can result in a profitable tradeoff?
> 
> This seems heavily biased towards increases in ILP. A single-stall cycle can be weighted as more important than an increase in occupancy from 2 to 3. I don't think this was the intention, was it? I think we need some sort of Occupancy scaling factor in the Profit calculation to have some way of tuning the importance of occupancy.
> 
I agree with you. And a scaling factor was planned. The reason it doesn't exist yet is that we still don't have access to HW other than Navi and we can't run enough tests to determine a reasonable scaling factor.
The idea behind this initial heuristic was: “This fixes the regression. Since we don't have an HW to test with, let's fix it and let QA see what happens."
But your comment is a good catch. Indeed, I should have added a compiler option for the scaling factor with a default value of 1.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139710/new/

https://reviews.llvm.org/D139710