[PATCH] D130797: [AMDGPU] Implement pipeline solver for non-trivial pipelines
Jeffrey Byrnes via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 1 13:31:38 PDT 2022
jrbyrnes added a comment.
In D130797#3688340 <https://reviews.llvm.org/D130797#3688340>, @kerbowa wrote:
> Thanks! I like the idea behind the greedy solver. Not sure about SchedGroupSU. Maybe just a map between SUs and lists of schedgroups? I think trying to track sched_group_barriers by their order and assigning that an index is a bit confusing.
Austin, thanks for looking at this and for the comments.
I can certainly see how the sched_group_barrier indexing is confusing -- I needed a bidirectional iterator to support the backtracking nature of the exact solver, so my solution was to use an array, and I needed a way to keep supporting the user-specified SyncID -- thus the mapping between SyncIDs and array indices. However, transforming the data structure to support bidirectional iteration seems like a problem that should be resolved inside the PipelineSolver.
I think if I apply this design principle, I can redo the SchedGroupSUs and clean up the collection phase in general.
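To make the bidirectional-iteration point concrete, here is the access pattern the exact solver needs, sketched with std containers and illustrative names (not the patch's actual code): advancing and backtracking over the sync groups is just +/-1 on an integer index, which DenseMap iterators don't give you.

  #include <vector>

  struct SchedGroup { /* mask, size, collected SUnits, ... */ };

  // Hypothetical stand-in for the solver's per-group assignment step.
  static bool tryAssignNextSU(std::vector<SchedGroup> &) { return false; }

  static void solve(std::vector<std::vector<SchedGroup>> &SyncedSchedGroups) {
    int Idx = 0;
    while (Idx >= 0 && Idx < static_cast<int>(SyncedSchedGroups.size())) {
      if (tryAssignNextSU(SyncedSchedGroups[Idx]))
        ++Idx; // assignment found: advance to the next sync group
      else
        --Idx; // dead end: backtrack to the previous sync group
    }
  }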
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:71
+
+static cl::opt<bool> EnableGreedySolver(
+ "amdgpu-igrouplp-greedy-solver",
----------------
kerbowa wrote:
> Should we have the greedy solver be the default?
Sure -- makes sense
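For reference, I'd expect that to be just a default flip on the existing flag, something like the sketch below (the cl::desc wording is mine, not final):

  #include "llvm/Support/CommandLine.h"

  using namespace llvm;

  // Sketch: default the greedy solver on; passing
  // -amdgpu-igrouplp-greedy-solver=false would then select the exact solver.
  static cl::opt<bool> EnableGreedySolver(
      "amdgpu-igrouplp-greedy-solver",
      cl::desc("Use the greedy pipeline solver (default); disable to use the "
               "exact solver"),
      cl::init(true), cl::Hidden);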
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:286
+ CurrPipeline = BestPipeline;
+ while (static_cast<size_t>(BeginSyncGroupIdx) < ConflictedInstrs.size() &&
+ ConflictedInstrs[BeginSyncGroupIdx].size() == 0)
----------------
kerbowa wrote:
> Can't SyncID be negative?
SyncID (i.e. *SyncGroupIdx) here is the index into the pipeline array -- it should never be negative
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:599
// between then.
- DenseMap<int, SmallVector<SchedGroup, 4>> SyncedSchedGroupsMap;
+ SmallVector<SmallVector<SchedGroup, 4>, 4> SyncedSchedGroups;
----------------
kerbowa wrote:
> Why change this from a map. Can we really index by SyncID?
DenseMap doesn't provide a bidirectional iterator, which I needed for the exact solver, so I converted to an array.
To make sure we still synchronize the pipeline correctly, I created a map, BarrierIDToPipelineID, which maps the user-specified IDs to legal array indices.
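Roughly, the shape is as follows (a sketch -- getPipelineIdx is a hypothetical helper, while SyncedSchedGroups and BarrierIDToPipelineID are the structures described above):

  #include "llvm/ADT/DenseMap.h"
  #include "llvm/ADT/SmallVector.h"

  using namespace llvm;

  struct SchedGroup { /* mask, size, collected SUnits, ... */ };

  // Dense, index-addressable storage that the solver iterates over.
  static SmallVector<SmallVector<SchedGroup, 4>, 4> SyncedSchedGroups;

  // Maps the sparse, user-specified sched_group_barrier SyncID onto a legal
  // index into SyncedSchedGroups.
  static DenseMap<int, int> BarrierIDToPipelineID;

  static int getPipelineIdx(int BarrierID) {
    auto Found = BarrierIDToPipelineID.find(BarrierID);
    if (Found != BarrierIDToPipelineID.end())
      return Found->second;
    int NewIdx = static_cast<int>(SyncedSchedGroups.size());
    SyncedSchedGroups.emplace_back();
    BarrierIDToPipelineID[BarrierID] = NewIdx;
    return NewIdx;
  }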
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:710
+ // increase the cost
+ if (ShouldTryAddEdge && DAG->IsReachable(B, A))
+ ShouldTryAddEdge = false;
----------------
kerbowa wrote:
> if (A == B || DAG->IsReachable(B, A))
> continue;
>
> Doesn't tryAddEdge already check for all of these conditions anyway?
Yeah, I'll clean this up a bit. Thanks.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:948
int32_t Size = SGB.getOperand(1).getImm();
- int32_t SyncID = SGB.getOperand(2).getImm();
+ int32_t BarrierID = SGB.getOperand(2).getImm();
+
----------------
kerbowa wrote:
> What is BarrierID, the same as SyncID?
BarrierID is the SyncID from your version (the user-specified synchronization ID).
This is different from the PipelineSolver's *SyncGroupIdx, which is an array index. I'll rework the naming to make things more consistent and less confusing.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:978
+ SchedGroup PipelineGroup = SyncedSchedGroups[PipelineIdx][StageIdx];
+ std::vector<SUnit>::reverse_iterator RIter =
+ PipelineGroup.BarrierPosition;
----------------
kerbowa wrote:
> I think we can just search the entire block. SCHED_GROUP_BARRIERS that appear later will get higher priority since we initialize them bottom up.
I'll take another look at this collection phase to see if I can shave off any iterations.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:991
+ // Grow the SGSU matches to hold the new match
+ if (static_cast<size_t>(PipelineIdx) >= SGSU.Matches.size())
+ SGSU.Matches.resize(PipelineIdx + 1);
----------------
kerbowa wrote:
> Is SGSU just caching whether an SU can be inserted into a SchedGroup?
Yeah -- a map from SUs to candidate SchedGroups is a fundamental piece of the exact algorithm.
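For illustration, the cache has roughly this shape (a sketch -- buildCandidateMap and fitsGroup are hypothetical names, and the patch stores the result per-SGSU in Matches rather than in a standalone map):

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/ADT/DenseMap.h"
  #include "llvm/ADT/SmallVector.h"
  #include "llvm/CodeGen/ScheduleDAG.h"

  using namespace llvm;

  struct SchedGroup { /* mask, size, collected SUnits, ... */ };

  // Hypothetical stand-in for SchedGroup's "does this SU fit?" check.
  static bool fitsGroup(const SchedGroup &, const SUnit &) { return true; }

  // For each conflicted SUnit, cache the SchedGroups it could legally join,
  // so the exact solver can enumerate assignments while backtracking without
  // redoing the checks at every step.
  static DenseMap<SUnit *, SmallVector<SchedGroup *, 4>>
  buildCandidateMap(ArrayRef<SUnit *> ConflictedSUs,
                    MutableArrayRef<SchedGroup> Groups) {
    DenseMap<SUnit *, SmallVector<SchedGroup *, 4>> Candidates;
    for (SUnit *SU : ConflictedSUs)
      for (SchedGroup &SG : Groups)
        if (fitsGroup(SG, *SU))
          Candidates[SU].push_back(&SG);
    return Candidates;
  }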
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D130797/new/
https://reviews.llvm.org/D130797