[PATCH] D130797: [AMDGPU] Implement pipeline solver for non-trivial pipelines

Fri Jul 29 14:35:33 PDT 2022

kerbowa added a comment.

Thanks! I like the idea behind the greedy solver. Not sure about SchedGroupSU. Maybe just a map between SUs and lists of schedgroups? I think trying to track sched_group_barriers by their order and assigning that an index is a bit confusing.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:71
+
+static cl::opt<bool> EnableGreedySolver(
+    "amdgpu-igrouplp-greedy-solver",
----------------
Should we have the greedy solver be the default?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:132
   // Collection of SUnits that are classified as members of this group.
-  SmallVector<SUnit *, 32> Collection;
+  //  SmallVector<SUnit *, 32> Collection;

----------------
Remove commented code.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:201
+
+  SchedGroup() {}
+
----------------
Why have a default constructor? 

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:286
+    CurrPipeline = BestPipeline;
+    while (static_cast<size_t>(BeginSyncGroupIdx) < ConflictedInstrs.size() &&
+           ConflictedInstrs[BeginSyncGroupIdx].size() == 0)
----------------
Can't SyncID be negative?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:299
+  // Preserve the order of barrier for subsequent SchedGroupBarrier mutations
+  for (auto &SyncPipeline : BestPipeline) {
+    for (auto &SG : SyncPipeline) {
----------------
The only requirement I had was that the SCHED_GROUP_BARRIER ends up at the end of the SG. No need to respect NodeNum if we can get a better pipeline otherwise.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:323
+int PipelineSolver::addEdges(
+    SmallVector<SchedGroup, 4> &SyncPipeline, SUnit *SU,
+    const int AssignedGroupNo,
----------------
Use SmallVectorImpl for parameter types.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:568
+    LLVM_DEBUG(dbgs() << "Starting GREEDY pipeline solver\n");
+    solveGreedy();
+  }
----------------
Do we need a cl option for the greedy solver if that is what we do by default?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:599
   // between then.
-  DenseMap<int, SmallVector<SchedGroup, 4>> SyncedSchedGroupsMap;
+  SmallVector<SmallVector<SchedGroup, 4>, 4> SyncedSchedGroups;

----------------
Why change this from a map. Can we really index by SyncID?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:705
+
+    bool ShouldTryAddEdge = A != B;
+
----------------
Can continue earlier if A == B.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:710
+    // increase the cost
+    if (ShouldTryAddEdge && DAG->IsReachable(B, A))
+      ShouldTryAddEdge = false;
----------------
if (A == B || DAG->IsReachable(B, A))
  continue;

Doesn't tryAddEdge already check for all of these conditions anyway?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:948
   int32_t Size = SGB.getOperand(1).getImm();
-  int32_t SyncID = SGB.getOperand(2).getImm();
+  int32_t BarrierID = SGB.getOperand(2).getImm();
+
----------------
What is BarrierID, the same as SyncID?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:978
+        SchedGroup PipelineGroup = SyncedSchedGroups[PipelineIdx][StageIdx];
+        std::vector<SUnit>::reverse_iterator RIter =
+            PipelineGroup.BarrierPosition;
----------------
I think we can just search the entire block. SCHED_GROUP_BARRIERS that appear later will get higher priority since we initialize them bottom up.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:986
+        auto Match =
+            std::find_if(TempIter, DAG->SUnits.rend(),
+                         [&SU](SUnit &IterSU) { return &SU == &IterSU; });
----------------
Could use a set/map.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:991
+          // Grow the SGSU matches to hold the new match
+          if (static_cast<size_t>(PipelineIdx) >= SGSU.Matches.size())
+            SGSU.Matches.resize(PipelineIdx + 1);
----------------
Is SGSU just caching whether an SU can be inserted into a SchedGroup?

================
Comment at: llvm/test/CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir:77
     ; CHECK-NEXT: SCHED_GROUP_BARRIER 32, 1, 0
+    ; CHECK-NEXT: SCHED_GROUP_BARRIER 2, 1, 0
     ; CHECK-NEXT: [[V_MUL_LO_U32_e64_1:%[0-9]+]]:vgpr_32 = nsw V_MUL_LO_U32_e64 [[GLOBAL_LOAD_DWORD_SADDR]], [[DEF1]], implicit $exec
----------------
This seems like a regression. SCHED_GROUP_BARRIER should be at the end of its group.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130797/new/

https://reviews.llvm.org/D130797