[llvm] 2f8e6b5 - [ScheduleDAGRRList] Limit number of candidates to explore.
Florian Hahn via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 23 03:44:11 PDT 2020
Author: Florian Hahn
Date: 2020-07-23T11:35:33+01:00
New Revision: 2f8e6b5f3c86a75f6a75c6955e3b4bf0d26c3a91
URL: https://github.com/llvm/llvm-project/commit/2f8e6b5f3c86a75f6a75c6955e3b4bf0d26c3a91
DIFF: https://github.com/llvm/llvm-project/commit/2f8e6b5f3c86a75f6a75c6955e3b4bf0d26c3a91.diff
LOG: [ScheduleDAGRRList] Limit number of candidates to explore.
Currently popFromQueueImpl iterates over all candidates to find the best
one. While the candidate queue is small, this is not a problem. But it
becomes a problem once the queue gets larger. For example, the snippet
below takes 330s to compile with llc -O0, but completes in 3s with this
patch.
define void @test(i4000000* %ptr) {
entry:
store i4000000 0, i4000000* %ptr, align 4
ret void
}
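To illustrate why the full scan hurts on large queues, here is a rough cost model, not taken from the patch. It assumes, as a simplification, that all N candidates sit in the queue at once and are popped one by one; the real queue grows and shrinks as nodes become ready. The function name and the Cap parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Counts Picker invocations needed to drain a queue that starts with N
// candidates, when each pop scans at most Cap entries (Cap must be >= 1).
// Passing a Cap of at least N models the unbounded scan.
static std::uint64_t comparisonsToDrain(std::uint64_t N, std::uint64_t Cap) {
  assert(Cap > 0 && "a pop must look at one candidate at least");
  std::uint64_t Total = 0;
  for (std::uint64_t Size = N; Size > 0; --Size)
    Total += std::min(Size, Cap) - 1; // one comparison per extra candidate
  return Total;
}
```

With an unbounded scan the total is N * (N - 1) / 2, so the work grows quadratically with the queue size; with a fixed cap it grows roughly linearly.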
This patch limits the number of candidates to check to 1000. The limit
is high enough that it never triggers for test-suite/SPEC2000/SPEC2006 on
X86 and AArch64 with -O3, while still drastically reducing compile time
for very large queues.
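The bounded scan can be sketched in isolation as follows. This is a standalone illustration rather than the code from the patch: popBestLimited, MaxCandidates and the Limit parameter are hypothetical names, and the element type is generic instead of SUnit *. The sketch spells the bound as std::size_t instead of a 1000ul literal, because std::min needs both arguments to have the same type and std::size_t is not unsigned long on every platform.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for the limit of 1000 described above.
constexpr std::size_t MaxCandidates = 1000;

// Picker(Best, Candidate) returns true when Candidate should replace Best.
// Only the first Limit entries of Q are compared; later entries are ignored
// for this pop but stay in the queue.
template <class T, class SF>
T popBestLimited(std::vector<T> &Q, SF Picker,
                 std::size_t Limit = MaxCandidates) {
  assert(!Q.empty() && "cannot pop from an empty queue");
  std::size_t BestIdx = 0;
  const std::size_t E = std::min<std::size_t>(Q.size(), Limit);
  for (std::size_t I = 1; I < E; ++I)
    if (Picker(Q[BestIdx], Q[I]))
      BestIdx = I;
  T V = Q[BestIdx];
  // Move the winner to the back so that removal is O(1).
  if (BestIdx + 1 != Q.size())
    std::swap(Q[BestIdx], Q.back());
  Q.pop_back();
  return V;
}
```

Because the winner is swapped with the last element, entries beyond the limit migrate into the scanned prefix over time, so every candidate is eventually considered.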
It would be even better to use a binary heap to manage the queue
(D83335), but some heuristics change the score of a node in the queue
after another node has been scheduled. I plan to address this for
backends that use the MachineScheduler in the future, but that requires
a more careful evaluation. In the meantime, the limit should help users
impacted by this issue.
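The difficulty with a heap can be shown with a small standalone sketch. It is not LLVM code: Node, lowerScore, bestByScan and demoStaleHeap are hypothetical names, and the score update simply simulates a heuristic re-scoring a queued node after another node was scheduled.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Node {
  int Id;
  int Score;
};

static bool lowerScore(const Node *L, const Node *R) {
  return L->Score < R->Score;
}

// Linear scan: O(n) per query, but always sees the current scores.
static Node *bestByScan(const std::vector<Node *> &Q) {
  return *std::max_element(Q.begin(), Q.end(), lowerScore);
}

struct StaleHeapResult {
  int HeapFrontId; // node the heap still reports as best
  int ScanBestId;  // node a fresh linear scan reports as best
  bool StillHeap;  // whether the heap invariant survived the update
};

static StaleHeapResult demoStaleHeap() {
  Node A{0, 5}, B{1, 3}, C{2, 1};
  std::vector<Node *> Q{&A, &B, &C};
  std::make_heap(Q.begin(), Q.end(), lowerScore);
  assert(std::is_heap(Q.begin(), Q.end(), lowerScore));
  C.Score = 10; // simulated re-scoring of a node that is already queued
  return {Q.front()->Id, bestByScan(Q)->Id,
          std::is_heap(Q.begin(), Q.end(), lowerScore)};
}
```

After the update the heap front is stale and the heap invariant no longer holds, so a heap-based queue would have to be repaired whenever a score changes.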
The patch includes a slightly smaller version of the motivating example
as a test case, to guard against the issue.
Reviewers: efriedma, paquette, niravd
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84328
Added:
llvm/test/CodeGen/X86/stress-scheduledagrrlist.ll
Modified:
llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
Removed:
################################################################################
diff --git a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
index 72e68a5045c6..ad6a6cdd8250 100644
--- a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
@@ -1838,13 +1838,15 @@ class RegReductionPQBase : public SchedulingPriorityQueue {
template<class SF>
static SUnit *popFromQueueImpl(std::vector<SUnit *> &Q, SF &Picker) {
- std::vector<SUnit *>::iterator Best = Q.begin();
- for (auto I = std::next(Q.begin()), E = Q.end(); I != E; ++I)
- if (Picker(*Best, *I))
- Best = I;
- SUnit *V = *Best;
- if (Best != std::prev(Q.end()))
- std::swap(*Best, Q.back());
+ unsigned BestIdx = 0;
+ // Only compute the cost for the first 1000 items in the queue, to avoid
+ // excessive compile-times for very large queues.
+ for (unsigned I = 1, E = std::min(Q.size(), 1000ul); I != E; I++)
+ if (Picker(Q[BestIdx], Q[I]))
+ BestIdx = I;
+ SUnit *V = Q[BestIdx];
+ if (BestIdx + 1 != Q.size())
+ std::swap(Q[BestIdx], Q.back());
Q.pop_back();
return V;
}
diff --git a/llvm/test/CodeGen/X86/stress-scheduledagrrlist.ll b/llvm/test/CodeGen/X86/stress-scheduledagrrlist.ll
new file mode 100644
index 000000000000..a699134a8c7c
--- /dev/null
+++ b/llvm/test/CodeGen/X86/stress-scheduledagrrlist.ll
@@ -0,0 +1,12 @@
+; RUN: llc -O0 -mtriple=x86_64-apple-macosx %s -o %t.s
+
+; Stress test for the list scheduler. The store will be expanded to a very
+; large number of stores during isel, stressing ScheduleDAGRRList. It should
+; compile in a reasonable amount of time. Run with -O0, to disable most other
+; optimizations.
+
+define void @test(i1000000* %ptr) {
+entry:
+ store i1000000 0, i1000000* %ptr, align 4
+ ret void
+}