[PATCH] D139710: [AMDGPU] MachineScheduler: schedule execution metric added for the UnclusteredHighRPStage

Mon Dec 12 10:58:21 PST 2022

rampitec added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:854
+  if (DAG.RegionsWithMinOcc[RegionIdx]) {
+    DAG.MinOccupancyRegionsMetrics[RegionIdx] = getScheduleMetrics();
+  }
----------------
alex-t wrote:
> rampitec wrote:
> > You probably do not need to compute it always, just in UnclusteredHighRPStage?
> I only need this at the stage preceding the UnclusteredHighRPStage because it is the "MetricBefore" computation. The UnclusteredHighRPStage only runs for the regions which conform to the condition: 
> 
> ```
> bool UnclusteredHighRPStage::initGCNRegion() {
>   // Only reschedule regions with the minimum occupancy or regions that may have
>   // spilling (excess register pressure).
>   if ((!DAG.RegionsWithMinOcc[RegionIdx] ||
>        DAG.MinOccupancy <= InitialOccupancy) &&
>       !DAG.RegionsWithExcessRP[RegionIdx])
>     return false;
> 
>   return GCNSchedStage::initGCNRegion();
> }
> ```
> What I should have done here, is to avoid this running for the GCNSchedStageID::ClusteredLowOccupancyReschedule stage.
> I only need this at the stage preceding the UnclusteredHighRPStage because it is the "MetricBefore" computation. The UnclusteredHighRPStage only runs for the regions which conform to the condition: 
> 
> ```
> bool UnclusteredHighRPStage::initGCNRegion() {
>   // Only reschedule regions with the minimum occupancy or regions that may have
>   // spilling (excess register pressure).
>   if ((!DAG.RegionsWithMinOcc[RegionIdx] ||
>        DAG.MinOccupancy <= InitialOccupancy) &&
>       !DAG.RegionsWithExcessRP[RegionIdx])
>     return false;
> 
>   return GCNSchedStage::initGCNRegion();
> }
> ```
> What I should have done here, is to avoid this running for the GCNSchedStageID::ClusteredLowOccupancyReschedule stage.

I believe it is better to place it into the `UnclusteredHighRPStage::initGCNSchedStage()`. This will allow to rearrange stages and also skip it if `UnclusteredHighRPStage` itself is skipped.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:1032
+      std::min(S.getTargetOccupancy(), PressureBefore.getOccupancy(ST));
+      float Profit = (static_cast<float>(WavesAfter) / WavesBefore *
+                      OldMetric / NewMetric);
----------------
alex-t wrote:
> rampitec wrote:
> > Avoid using float. Use scaled integers.
> What is the evil in float? I would agree if we're targeting the embedded platform with no or very expensive floating point support. Could you explain where is the overhead for x86like (for example)?
> What is the evil in float? I would agree if we're targeting the embedded platform with no or very expensive floating point support. Could you explain where is the overhead for x86like (for example)?

We are not only running on x86. For instance our lit tests are being run on a variety of platforms and llvm does support it. Besides you do not want to get different results depending on the host platform or even host compiler used due to different rounding. In addition it is always good to avoid it so that llvm itself does not need to link in soft float on other platforms simply because their compiler also supports codegen for amdgpu.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139710/new/

https://reviews.llvm.org/D139710