[PATCH] D139710: [AMDGPU] MachineScheduler: schedule execution metric added for the UnclusteredHighRPStage
Alexander via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 12 07:36:45 PST 2022
alex-t added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:854
+  if (DAG.RegionsWithMinOcc[RegionIdx]) {
+    DAG.MinOccupancyRegionsMetrics[RegionIdx] = getScheduleMetrics();
+  }
----------------
rampitec wrote:
> You probably do not need to compute it always, just in UnclusteredHighRPStage?
I only need this at the stage preceding the UnclusteredHighRPStage because it is the "MetricBefore" computation. The UnclusteredHighRPStage only runs for regions that satisfy the condition:
```
bool UnclusteredHighRPStage::initGCNRegion() {
  // Only reschedule regions with the minimum occupancy or regions that may have
  // spilling (excess register pressure).
  if ((!DAG.RegionsWithMinOcc[RegionIdx] ||
       DAG.MinOccupancy <= InitialOccupancy) &&
      !DAG.RegionsWithExcessRP[RegionIdx])
    return false;
  return GCNSchedStage::initGCNRegion();
}
```
What I should have done here is avoid running this for the GCNSchedStageID::ClusteredLowOccupancyReschedule stage, for example with a guard like the sketch below.
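For illustration only, the guard could look roughly like this (a sketch; the exact placement and the use of `StageID` here are my assumption, not the patch code):
```
// Sketch: only record the "MetricBefore" value when the current stage can
// be followed by UnclusteredHighRPStage for this region.
if (StageID != GCNSchedStageID::ClusteredLowOccupancyReschedule &&
    DAG.RegionsWithMinOcc[RegionIdx])
  DAG.MinOccupancyRegionsMetrics[RegionIdx] = getScheduleMetrics();
```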
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:945
+ unsigned SumBubbles = 0;
+ DenseMap<MachineInstr *, unsigned> Model;
+ unsigned CurrCycle = 0;
----------------
vpykhtin wrote:
> Model -> ReadyCycles?
What do you mean? I don't understand the question.
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:952
+#endif
+ if (SUnit *SU = DAG.getSUnit(&MI)) {
+ unsigned ReadyCycle = CurrCycle;
----------------
vpykhtin wrote:
> if (!SU) continue;
>
> what is it BTW? Debug instruction?
Not only debug instructions. A COPY, for example.
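For reference, the early-continue form suggested above would look roughly like this (a sketch; the surrounding loop over the region's instructions is omitted):
```
// Sketch: MIs without an SUnit (debug instructions, and e.g. COPYs that are
// not part of the scheduling DAG) are simply skipped.
SUnit *SU = DAG.getSUnit(&MI);
if (!SU)
  continue;
unsigned ReadyCycle = CurrCycle;
```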
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:1032
+ std::min(S.getTargetOccupancy(), PressureBefore.getOccupancy(ST));
+ float Profit = (static_cast<float>(WavesAfter) / WavesBefore *
+ OldMetric / NewMetric);
----------------
rampitec wrote:
> Avoid using float. Use scaled integers.
What is the evil in float? I would agree if we were targeting an embedded platform with no or very expensive floating-point support. Could you explain where the overhead is for an x86-like target, for example?
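For comparison, a scaled-integer version of the profit computation could look roughly like this (a sketch under the assumption that two decimal digits of precision are enough; the `ScaleFactor` name is mine):
```
// Sketch: keep everything in integers by scaling both ratios by 100.
constexpr unsigned ScaleFactor = 100;
unsigned Profit = (WavesAfter * ScaleFactor / WavesBefore) *
                  (OldMetric * ScaleFactor / NewMetric) / ScaleFactor;
// "Profitable" then means Profit >= ScaleFactor rather than Profit >= 1.0f.
```
The comparison against 1.0 becomes a comparison against ScaleFactor, so no floating point is needed in the scheduler.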
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D139710/new/
https://reviews.llvm.org/D139710