[PATCH] D139710: [AMDGPU] MachineScheduler: schedule execution metric added for the UnclusteredHighRPStage
Alexander via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 12 07:36:45 PST 2022
alex-t added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:854
+  if (DAG.RegionsWithMinOcc[RegionIdx]) {
+    DAG.MinOccupancyRegionsMetrics[RegionIdx] = getScheduleMetrics();
+  }
----------------
rampitec wrote:
> You probably do not need to compute it always, just in UnclusteredHighRPStage?
I only need this at the stage preceding the UnclusteredHighRPStage because it is the "MetricBefore" computation. The UnclusteredHighRPStage only runs for regions that satisfy the condition:
```
bool UnclusteredHighRPStage::initGCNRegion() {
  // Only reschedule regions with the minimum occupancy or regions that may have
  // spilling (excess register pressure).
  if ((!DAG.RegionsWithMinOcc[RegionIdx] ||
       DAG.MinOccupancy <= InitialOccupancy) &&
      !DAG.RegionsWithExcessRP[RegionIdx])
    return false;
  return GCNSchedStage::initGCNRegion();
}
```
What I should have done here is avoid running this for the GCNSchedStageID::ClusteredLowOccupancyReschedule stage, for example with a guard like the sketch below.
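For illustration only, the guard could look roughly like this (a sketch; the exact placement and the use of `StageID` here are my assumption, not the patch code):
```
// Sketch: only record the "MetricBefore" value when the current stage can
// be followed by UnclusteredHighRPStage for this region.
if (StageID != GCNSchedStageID::ClusteredLowOccupancyReschedule &&
    DAG.RegionsWithMinOcc[RegionIdx])
  DAG.MinOccupancyRegionsMetrics[RegionIdx] = getScheduleMetrics();
```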
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:945
+ unsigned SumBubbles = 0;
+ DenseMap<MachineInstr *, unsigned> Model;
+ unsigned CurrCycle = 0;
----------------
vpykhtin wrote:
> Model -> ReadyCycles?
What do you mean? I don't understand the question.
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:952
+#endif
+ if (SUnit *SU = DAG.getSUnit(&MI)) {
+ unsigned ReadyCycle = CurrCycle;
----------------
vpykhtin wrote:
> if (!SU) continue;
>
> what is it BTW? Debug instruction?
Not only debug instructions. A COPY, for example.
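For reference, the early-continue form suggested above would look roughly like this (a sketch; the surrounding loop over the region's instructions is omitted):
```
// Sketch: MIs without an SUnit (debug instructions, and e.g. COPYs that are
// not part of the scheduling DAG) are simply skipped.
SUnit *SU = DAG.getSUnit(&MI);
if (!SU)
  continue;
unsigned ReadyCycle = CurrCycle;
```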
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:1032
+ std::min(S.getTargetOccupancy(), PressureBefore.getOccupancy(ST));
+ float Profit = (static_cast<float>(WavesAfter) / WavesBefore *
+ OldMetric / NewMetric);
----------------
rampitec wrote:
> Avoid using float. Use scaled integers.
What is the evil in float? I would agree if we were targeting an embedded platform with no or very expensive floating-point support. Could you explain where the overhead is for an x86-like target, for example?
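For comparison, a scaled-integer version of the profit computation could look roughly like this (a sketch under the assumption that two decimal digits of precision are enough; the `ScaleFactor` name is mine):
```
// Sketch: keep everything in integers by scaling both ratios by 100.
constexpr unsigned ScaleFactor = 100;
unsigned Profit = (WavesAfter * ScaleFactor / WavesBefore) *
                  (OldMetric * ScaleFactor / NewMetric) / ScaleFactor;
// "Profitable" then means Profit >= ScaleFactor rather than Profit >= 1.0f.
```
The comparison against 1.0 becomes a comparison against ScaleFactor, so no floating point is needed in the scheduler.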
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D139710/new/
https://reviews.llvm.org/D139710