[PATCH] D130797: [AMDGPU] Implement pipeline solver for non-trivial pipelines
Austin Kerbow via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jul 29 14:35:33 PDT 2022
kerbowa added a comment.
Thanks! I like the idea behind the greedy solver. Not sure about SchedGroupSU. Maybe just a map between SUs and lists of schedgroups? I think trying to track sched_group_barriers by their order and assigning that an index is a bit confusing.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:71
+
+static cl::opt<bool> EnableGreedySolver(
+ "amdgpu-igrouplp-greedy-solver",
----------------
Should we have the greedy solver be the default?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:132
// Collection of SUnits that are classified as members of this group.
- SmallVector<SUnit *, 32> Collection;
+ // SmallVector<SUnit *, 32> Collection;
----------------
Remove commented code.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:201
+
+ SchedGroup() {}
+
----------------
Why have a default constructor?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:286
+ CurrPipeline = BestPipeline;
+ while (static_cast<size_t>(BeginSyncGroupIdx) < ConflictedInstrs.size() &&
+ ConflictedInstrs[BeginSyncGroupIdx].size() == 0)
----------------
Can't SyncID be negative?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:299
+ // Preserve the order of barrier for subsequent SchedGroupBarrier mutations
+ for (auto &SyncPipeline : BestPipeline) {
+ for (auto &SG : SyncPipeline) {
----------------
The only requirement I had was that the SCHED_GROUP_BARRIER ends up at the end of the SG. No need to respect NodeNum if we can get a better pipeline otherwise.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:323
+int PipelineSolver::addEdges(
+ SmallVector<SchedGroup, 4> &SyncPipeline, SUnit *SU,
+ const int AssignedGroupNo,
----------------
Use SmallVectorImpl for parameter types.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:568
+ LLVM_DEBUG(dbgs() << "Starting GREEDY pipeline solver\n");
+ solveGreedy();
+ }
----------------
Do we need a cl option for the greedy solver if that is what we do by default?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:599
// between then.
- DenseMap<int, SmallVector<SchedGroup, 4>> SyncedSchedGroupsMap;
+ SmallVector<SmallVector<SchedGroup, 4>, 4> SyncedSchedGroups;
----------------
Why change this from a map. Can we really index by SyncID?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:705
+
+ bool ShouldTryAddEdge = A != B;
+
----------------
Can continue earlier if A == B.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:710
+ // increase the cost
+ if (ShouldTryAddEdge && DAG->IsReachable(B, A))
+ ShouldTryAddEdge = false;
----------------
if (A == B || DAG->IsReachable(B, A))
continue;
Doesn't tryAddEdge already check for all of these conditions anyway?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:948
int32_t Size = SGB.getOperand(1).getImm();
- int32_t SyncID = SGB.getOperand(2).getImm();
+ int32_t BarrierID = SGB.getOperand(2).getImm();
+
----------------
What is BarrierID, the same as SyncID?
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:978
+ SchedGroup PipelineGroup = SyncedSchedGroups[PipelineIdx][StageIdx];
+ std::vector<SUnit>::reverse_iterator RIter =
+ PipelineGroup.BarrierPosition;
----------------
I think we can just search the entire block. SCHED_GROUP_BARRIERS that appear later will get higher priority since we initialize them bottom up.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:986
+ auto Match =
+ std::find_if(TempIter, DAG->SUnits.rend(),
+ [&SU](SUnit &IterSU) { return &SU == &IterSU; });
----------------
Could use a set/map.
================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUIGroupLP.cpp:991
+ // Grow the SGSU matches to hold the new match
+ if (static_cast<size_t>(PipelineIdx) >= SGSU.Matches.size())
+ SGSU.Matches.resize(PipelineIdx + 1);
----------------
Is SGSU just caching whether an SU can be inserted into a SchedGroup?
================
Comment at: llvm/test/CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir:77
; CHECK-NEXT: SCHED_GROUP_BARRIER 32, 1, 0
+ ; CHECK-NEXT: SCHED_GROUP_BARRIER 2, 1, 0
; CHECK-NEXT: [[V_MUL_LO_U32_e64_1:%[0-9]+]]:vgpr_32 = nsw V_MUL_LO_U32_e64 [[GLOBAL_LOAD_DWORD_SADDR]], [[DEF1]], implicit $exec
----------------
This seems like a regression. SCHED_GROUP_BARRIER should be at the end of its group.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D130797/new/
https://reviews.llvm.org/D130797
More information about the llvm-commits
mailing list