[PATCH] D130797: [AMDGPU] Implement pipeline solver for non-trivial pipelines
Jeffrey Byrnes via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 1 13:31:38 PDT 2022
jrbyrnes added a comment.
In D130797#3688340 <https://reviews.llvm.org/D130797#3688340>, @kerbowa wrote:
> Thanks! I like the idea behind the greedy solver. Not sure about SchedGroupSU. Maybe just a map between SUs and lists of schedgroups? I think trying to track sched_group_barriers by their order and assigning that an index is a bit confusing.
Austin, thanks for looking at this and for the comments.
I can certainly see how the sched_group_barrier indexing is confusing -- I needed a bidirectional iterator to support the backtracking nature of the exact solver, so my solution was to use an array, and I needed a way to keep supporting the user-specified SyncID -- thus the mapping between SyncIDs and array indices. However, transforming the data structure to support bidirectional iteration seems like a problem that should be resolved inside the PipelineSolver.
I think if I apply this design principle, I can redo the SchedGroupSUs and clean up the collection phase in general.
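To make the bidirectional-iteration point concrete, here is the access pattern the exact solver needs, sketched with std containers and illustrative names (not the patch's actual code): advancing and backtracking over the sync groups is just +/-1 on an integer index, which DenseMap iterators don't give you.

  #include <vector>

  struct SchedGroup { /* mask, size, collected SUnits, ... */ };

  // Hypothetical stand-in for the solver's per-group assignment step.
  static bool tryAssignNextSU(std::vector<SchedGroup> &) { return false; }

  static void solve(std::vector<std::vector<SchedGroup>> &SyncedSchedGroups) {
    int Idx = 0;
    while (Idx >= 0 && Idx < static_cast<int>(SyncedSchedGroups.size())) {
      if (tryAssignNextSU(SyncedSchedGroups[Idx]))
        ++Idx; // assignment found: advance to the next sync group
      else
        --Idx; // dead end: backtrack to the previous sync group
    }
  }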
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:71
+
+static cl::opt<bool> EnableGreedySolver(
+ "amdgpu-igrouplp-greedy-solver",
----------------
kerbowa wrote:
> Should we have the greedy solver be the default?
Sure -- makes sense
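For reference, I'd expect that to be just a default flip on the existing flag, something like the sketch below (the cl::desc wording is mine, not final):

  #include "llvm/Support/CommandLine.h"

  using namespace llvm;

  // Sketch: default the greedy solver on; passing
  // -amdgpu-igrouplp-greedy-solver=false would then select the exact solver.
  static cl::opt<bool> EnableGreedySolver(
      "amdgpu-igrouplp-greedy-solver",
      cl::desc("Use the greedy pipeline solver (default); disable to use the "
               "exact solver"),
      cl::init(true), cl::Hidden);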
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:286
+ CurrPipeline = BestPipeline;
+ while (static_cast<size_t>(BeginSyncGroupIdx) < ConflictedInstrs.size() &&
+ ConflictedInstrs[BeginSyncGroupIdx].size() == 0)
----------------
kerbowa wrote:
> Can't SyncID be negative?
SyncID (i.e. *SyncGroupIdx) here is the index into the pipeline array -- it should never be negative
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:599
// between then.
- DenseMap<int, SmallVector<SchedGroup, 4>> SyncedSchedGroupsMap;
+ SmallVector<SmallVector<SchedGroup, 4>, 4> SyncedSchedGroups;
----------------
kerbowa wrote:
> Why change this from a map. Can we really index by SyncID?
DenseMap doesn't provide a bidirectional iterator, which I needed for the exact solver, so I converted to an array.
To make sure we still synchronize the pipeline correctly, I created a map, BarrierIDToPipelineID, which maps the user-specified IDs to legal array indices.
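Roughly, the shape is as follows (a sketch -- getPipelineIdx is a hypothetical helper, while SyncedSchedGroups and BarrierIDToPipelineID are the structures described above):

  #include "llvm/ADT/DenseMap.h"
  #include "llvm/ADT/SmallVector.h"

  using namespace llvm;

  struct SchedGroup { /* mask, size, collected SUnits, ... */ };

  // Dense, index-addressable storage that the solver iterates over.
  static SmallVector<SmallVector<SchedGroup, 4>, 4> SyncedSchedGroups;

  // Maps the sparse, user-specified sched_group_barrier SyncID onto a legal
  // index into SyncedSchedGroups.
  static DenseMap<int, int> BarrierIDToPipelineID;

  static int getPipelineIdx(int BarrierID) {
    auto Found = BarrierIDToPipelineID.find(BarrierID);
    if (Found != BarrierIDToPipelineID.end())
      return Found->second;
    int NewIdx = static_cast<int>(SyncedSchedGroups.size());
    SyncedSchedGroups.emplace_back();
    BarrierIDToPipelineID[BarrierID] = NewIdx;
    return NewIdx;
  }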
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:710
+ // increase the cost
+ if (ShouldTryAddEdge && DAG->IsReachable(B, A))
+ ShouldTryAddEdge = false;
----------------
kerbowa wrote:
> if (A == B || DAG->IsReachable(B, A))
> continue;
>
> Doesn't tryAddEdge already check for all of these conditions anyway?
Yeah, I'll clean this up a bit. Thanks.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:948
int32_t Size = SGB.getOperand(1).getImm();
- int32_t SyncID = SGB.getOperand(2).getImm();
+ int32_t BarrierID = SGB.getOperand(2).getImm();
+
----------------
kerbowa wrote:
> What is BarrierID, the same as SyncID?
BarrierID is the SyncID from your version (the user-specified synchronization ID).
This is different from the PipelineSolver's *SyncGroupIdx, which is an array index. I'll rework the naming to make things more consistent and less confusing.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:978
+ SchedGroup PipelineGroup = SyncedSchedGroups[PipelineIdx][StageIdx];
+ std::vector<SUnit>::reverse_iterator RIter =
+ PipelineGroup.BarrierPosition;
----------------
kerbowa wrote:
> I think we can just search the entire block. SCHED_GROUP_BARRIERS that appear later will get higher priority since we initialize them bottom up.
I'll take another look at this collection phase to see if I can shave off any iterations.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:991
+ // Grow the SGSU matches to hold the new match
+ if (static_cast<size_t>(PipelineIdx) >= SGSU.Matches.size())
+ SGSU.Matches.resize(PipelineIdx + 1);
----------------
kerbowa wrote:
> Is SGSU just caching whether an SU can be inserted into a SchedGroup?
Yeah -- a map from SUs to candidate SchedGroups is a fundamental piece of the exact algorithm.
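For illustration, the cache has roughly this shape (a sketch -- buildCandidateMap and fitsGroup are hypothetical names, and the patch stores the result per-SGSU in Matches rather than in a standalone map):

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/ADT/DenseMap.h"
  #include "llvm/ADT/SmallVector.h"
  #include "llvm/CodeGen/ScheduleDAG.h"

  using namespace llvm;

  struct SchedGroup { /* mask, size, collected SUnits, ... */ };

  // Hypothetical stand-in for SchedGroup's "does this SU fit?" check.
  static bool fitsGroup(const SchedGroup &, const SUnit &) { return true; }

  // For each conflicted SUnit, cache the SchedGroups it could legally join,
  // so the exact solver can enumerate assignments while backtracking without
  // redoing the checks at every step.
  static DenseMap<SUnit *, SmallVector<SchedGroup *, 4>>
  buildCandidateMap(ArrayRef<SUnit *> ConflictedSUs,
                    MutableArrayRef<SchedGroup> Groups) {
    DenseMap<SUnit *, SmallVector<SchedGroup *, 4>> Candidates;
    for (SUnit *SU : ConflictedSUs)
      for (SchedGroup &SG : Groups)
        if (fitsGroup(SG, *SU))
          Candidates[SU].push_back(&SG);
    return Candidates;
  }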
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D130797/new/
https://reviews.llvm.org/D130797