[PATCH] D139710: [AMDGPU] MachineScheduler: schedule execution metric added for the UnclusteredHighRPStage

Mon Dec 12 22:37:03 PST 2022

kerbowa added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:854
+  if (DAG.RegionsWithMinOcc[RegionIdx] &&
+      StageID == GCNSchedStageID::OccInitialSchedule) {
+    DAG.MinOccupancyRegionsMetrics[RegionIdx] = getScheduleMetrics();
----------------
It might be better in UnclusteredHighRPStage::initGCNRegion(), but I guess the problem is that you need the DAG to be built already? Maybe we even need a virtual getSchedulerMetrics(), since I can imagine this being used for more then the UnclusteredHighRPStage eventually, and now it only does something in the initOccStage. If it stays in the initOccStage, can you add an assert that the next stage is UnclusteredHighRP?

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:935
+ScheduleMetrics GCNSchedStage::getScheduleMetrics() {
+  struct comp {
+    bool operator()(std::pair<MachineInstr *, unsigned> A,
----------------
Is this debug only?

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:948
+  unsigned CurrCycle = 0;
+  for (auto &MI : DAG) {
+#ifndef NDEBUG
----------------
Can't this just iterate over SUnits?

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:960
+        MachineInstr *DefMI = D.getSUnit()->getInstr();
+        unsigned Latency =
+            SI->getInstrLatency(ST.getInstrItineraryData(), *DefMI);
----------------
What about unbuffered resources like MFMA? I guess maybe it should be considered in future patches.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:961
+        unsigned Latency =
+            SI->getInstrLatency(ST.getInstrItineraryData(), *DefMI);
+        unsigned DefReady = ReadyCycles[DefMI];
----------------
Should this be TargetSchedModel::computeInstrLatency? We don't use itineraries so I think it is just falling back here.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:1033
+        std::min(S.getTargetOccupancy(), PressureBefore.getOccupancy(ST));
+    unsigned Profit = ((WavesAfter * ScheduleMetricsScaleFactor) / WavesBefore *
+                       (OldMetric * ScheduleMetricsScaleFactor) / NewMetric) /
----------------
What happens if the original schedule has no stalls, i.e. a metric of 0? Does that mean that no amount of occupancy increase can result in a profitable tradeoff?

This seems heavily biased towards increases in ILP. A single-stall cycle can be weighted as more important than an increase in occupancy from 2 to 3. I don't think this was the intention, was it? I think we need some sort of Occupancy scaling factor in the Profit calculation to have some way of tuning the importance of occupancy.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.h:130

+const unsigned ScheduleMetricsScaleFactor = 100;
+
----------------
Could this be moved into the class below so it is not in the llvm namespace?

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.h:135
+  unsigned BubbleCycles;
+  public:
+  ScheduleMetrics() {}
----------------
NIT: LLVM formating.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139710/new/

https://reviews.llvm.org/D139710