[llvm] 2f8e6b5 - [ScheduleDAGRRList] Limit number of candidates to explore.

Thu Jul 23 03:44:11 PDT 2020

Author: Florian Hahn
Date: 2020-07-23T11:35:33+01:00
New Revision: 2f8e6b5f3c86a75f6a75c6955e3b4bf0d26c3a91

URL: https://github.com/llvm/llvm-project/commit/2f8e6b5f3c86a75f6a75c6955e3b4bf0d26c3a91
DIFF: https://github.com/llvm/llvm-project/commit/2f8e6b5f3c86a75f6a75c6955e3b4bf0d26c3a91.diff

LOG: [ScheduleDAGRRList] Limit number of candidates to explore.

Currently popFromQueueImpl iterates over all candidates to find the best
one. While the candidate queue is small, this is not a problem. But it
becomes a problem once the queue gets larger. For example, the snippet
below takes 330s to compile with llc -O0, but completes in 3s with this
patch.

define void @test(i4000000* %ptr) {
entry:
  store i4000000 0, i4000000* %ptr, align 4
  ret void
}

This patch limits the number of candidates to check to 1000. The limit
is high enough that it never triggers for test-suite/SPEC2000/SPEC2006
on X86 and AArch64 with -O3, while still drastically limiting the
compile time in case of very large queues.
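
As a standalone illustration of the bounded scan described above, here is a
minimal sketch outside of LLVM. The name popBestBounded, the generic element
type, and the Limit parameter are assumptions made for this example and are
not part of the patch. Picker(A, B) is assumed to return true when B should
be preferred over A.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for popFromQueueImpl: removes and returns the best
// element among the first Limit entries of Q. Entries past Limit are not
// examined, so the result is only the best of the scanned prefix.
template <class T, class SF>
T popBestBounded(std::vector<T> &Q, SF Picker, std::size_t Limit = 1000) {
  assert(!Q.empty() && "queue must not be empty");
  std::size_t BestIdx = 0;
  // Using std::size_t for Limit keeps both std::min arguments the same type.
  const std::size_t E = std::min(Q.size(), Limit);
  for (std::size_t I = 1; I < E; ++I)
    if (Picker(Q[BestIdx], Q[I]))
      BestIdx = I;
  T V = Q[BestIdx];
  // Move the chosen element to the back so it can be removed in O(1).
  if (BestIdx + 1 != Q.size())
    std::swap(Q[BestIdx], Q.back());
  Q.pop_back();
  return V;
}
```

Each pop costs at most Limit comparisons, so draining a queue of N entries
costs on the order of N * Limit comparisons instead of N * N.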

It would be even better to use a binary heap to manage the queue
(D83335), but some heuristics change the score of a node in the queue
after another node has been scheduled. I plan to address this for
backends that use the MachineScheduler in the future, but that requires
a more careful evaluation. In the meantime, the limit should help users
impacted by this issue.
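
To illustrate why changing scores are a problem for a heap, here is a small
sketch using the standard library heap algorithms. The Node struct and its
Score field are hypothetical stand-ins for a scheduling unit and its
heuristic priority. They are not taken from ScheduleDAGRRList.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical node: Score stands in for whatever the heuristic computes.
struct Node {
  int Id;
  int Score;
};

// Builds a max-heap on Score, then changes the score of the top node in
// place, the way a heuristic might after another node has been scheduled.
// Returns whether the container still satisfies the heap property.
bool heapStaysValidAfterRescore() {
  std::vector<Node> Nodes = {{0, 5}, {1, 9}, {2, 7}};
  auto Less = [](const Node &A, const Node &B) { return A.Score < B.Score; };
  std::make_heap(Nodes.begin(), Nodes.end(), Less);
  assert(Nodes.front().Id == 1 && "node with the highest score is on top");

  // The score changes while the node is already in the heap.
  Nodes.front().Score = 1;

  return std::is_heap(Nodes.begin(), Nodes.end(), Less);
}
```

In this sketch the function returns false: once a score changes behind the
heap's back, the top element is no longer guaranteed to be the best
candidate, so the heap would have to be repaired or rebuilt after such
updates.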

The patch includes a slightly smaller version of the motivating example
as a test case, to guard against the issue.

Reviewers: efriedma, paquette, niravd

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84328

Added: 
    llvm/test/CodeGen/X86/stress-scheduledagrrlist.ll

Modified: 
    llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
index 72e68a5045c6..ad6a6cdd8250 100644

--- a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGRRList.cpp
@@ -1838,13 +1838,15 @@ class RegReductionPQBase : public SchedulingPriorityQueue {
 
 template<class SF>
 static SUnit *popFromQueueImpl(std::vector<SUnit *> &Q, SF &Picker) {
-  std::vector<SUnit *>::iterator Best = Q.begin();
-  for (auto I = std::next(Q.begin()), E = Q.end(); I != E; ++I)
-    if (Picker(*Best, *I))
-      Best = I;
-  SUnit *V = *Best;
-  if (Best != std::prev(Q.end()))
-    std::swap(*Best, Q.back());
+  unsigned BestIdx = 0;
+  // Only compute the cost for the first 1000 items in the queue, to avoid
+  // excessive compile-times for very large queues.
+  for (unsigned I = 1, E = std::min(Q.size(), 1000ul); I != E; I++)
+    if (Picker(Q[BestIdx], Q[I]))
+      BestIdx = I;
+  SUnit *V = Q[BestIdx];
+  if (BestIdx + 1 != Q.size())
+    std::swap(Q[BestIdx], Q.back());
   Q.pop_back();
   return V;
 }

diff  --git a/llvm/test/CodeGen/X86/stress-scheduledagrrlist.ll b/llvm/test/CodeGen/X86/stress-scheduledagrrlist.ll
new file mode 100644
index 000000000000..a699134a8c7c
--- /dev/null
+++ b/llvm/test/CodeGen/X86/stress-scheduledagrrlist.ll
@@ -0,0 +1,12 @@
+; RUN: llc -O0 -mtriple=x86_64-apple-macosx %s -o %t.s
+
+; Stress test for the list scheduler. The store will be expanded to a very
+; large number of stores during isel, stressing ScheduleDAGRRList. It should
+; compile in a reasonable amount of time. Run with -O0, to disable most other
+; optimizations.
+
+define void @test(i1000000* %ptr) {
+entry:
+  store i1000000 0, i1000000* %ptr, align 4
+  ret void
+}