[llvm] b559535 - [NewPM] Avoid redundant CGSCC run for updated SCC

Sun Jun 14 20:06:39 PDT 2020

Author: Wenlei He
Date: 2020-06-14T19:54:52-07:00
New Revision: b559535a3a21fa1f3ee20fbcd4bad4bc840d0a49

URL: https://github.com/llvm/llvm-project/commit/b559535a3a21fa1f3ee20fbcd4bad4bc840d0a49
DIFF: https://github.com/llvm/llvm-project/commit/b559535a3a21fa1f3ee20fbcd4bad4bc840d0a49.diff

LOG: [NewPM] Avoid redundant CGSCC run for updated SCC

Summary:
When an SCC got split due to inlining, we have two mechanisms for reprocessing the updated SCC, first is UR.UpdatedC
that repeatedly rerun the new, current SCC; second is a worklist for all newly split SCCs. We can avoid rerun of
the same SCC when the SCC is set to be processed by both mechanisms *back to back*. In pathological cases, such redundant
rerun could cause exponential size growth due to inlining along cycles, even when there's no SCC mutation and hence
convergence is not a problem.

Note that it's ok to have SCC updated and rerun immediately, and also in the work list if we have actually moved an SCC
to be topologically "below" the current one due to merging. In that case, we will need to revisit the current SCC after
those moved SCCs. For that reason, the redundant avoidance here only targets back to back rerun of the same SCC - the
case described by the now removed FIXME comment.

Reviewers: chandlerc, wmi

Subscribers: llvm-commits, hoy

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80589

Added: 
    llvm/test/Transforms/Inline/cgscc-cycle-debug.ll

Modified: 
    llvm/include/llvm/Analysis/CGSCCPassManager.h
    llvm/unittests/Analysis/CGSCCPassManagerTest.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/include/llvm/Analysis/CGSCCPassManager.h b/llvm/include/llvm/Analysis/CGSCCPassManager.h
index b5b58686d958..eb0d3ae8fedf 100644

--- a/llvm/include/llvm/Analysis/CGSCCPassManager.h
+++ b/llvm/include/llvm/Analysis/CGSCCPassManager.h
@@ -820,6 +820,12 @@ ModuleToPostOrderCGSCCPassAdaptor<CGSCCPassT>::run(Module &M,
       LLVM_DEBUG(dbgs() << "Running an SCC pass across the RefSCC: " << *RC
                         << "\n");
 
+      // The top of the worklist may *also* be the same SCC we just ran over
+      // (and invalidated for). Keep track of that last SCC we processed due
+      // to SCC update to avoid redundant processing when an SCC is both just
+      // updated itself and at the top of the worklist.
+      LazyCallGraph::SCC *LastUpdatedC = nullptr;
+
       // Push the initial SCCs in reverse post-order as we'll pop off the
       // back and so see this in post-order.
       for (LazyCallGraph::SCC &C : llvm::reverse(*RC))
@@ -835,6 +841,10 @@ ModuleToPostOrderCGSCCPassAdaptor<CGSCCPassT>::run(Module &M,
           LLVM_DEBUG(dbgs() << "Skipping an invalid SCC...\n");
           continue;
         }
+        if (LastUpdatedC == C) {
+          LLVM_DEBUG(dbgs() << "Skipping redundant run on SCC: " << *C << "\n");
+          continue;
+        }
         if (&C->getOuterRefSCC() != RC) {
           LLVM_DEBUG(dbgs() << "Skipping an SCC that is now part of some other "
                                "RefSCC...\n");
@@ -863,11 +873,6 @@ ModuleToPostOrderCGSCCPassAdaptor<CGSCCPassT>::run(Module &M,
         // invalidate the analyses for any SCCs other than themselves which
         // are mutated. However, that seems to lose the robustness of the
         // pass-manager driven invalidation scheme.
-        //
-        // FIXME: This is redundant in one case -- the top of the worklist may
-        // *also* be the same SCC we just ran over (and invalidated for). In
-        // that case, we'll end up doing a redundant invalidation here as
-        // a consequence.
         CGAM.invalidate(*C, UR.CrossSCCPA);
 
         do {
@@ -877,6 +882,7 @@ ModuleToPostOrderCGSCCPassAdaptor<CGSCCPassT>::run(Module &M,
           assert(&C->getOuterRefSCC() == RC &&
                  "Processing an SCC in a 
diff erent RefSCC!");
 
+          LastUpdatedC = UR.UpdatedC;
           UR.UpdatedRC = nullptr;
           UR.UpdatedC = nullptr;
 

diff  --git a/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll b/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll
new file mode 100644
index 000000000000..435d540208bc
--- /dev/null
+++ b/llvm/test/Transforms/Inline/cgscc-cycle-debug.ll
@@ -0,0 +1,45 @@
+; When an SCC got split due to inlining, we have two mechanisms for reprocessing the updated SCC, first is UR.UpdatedC
+; that repeatedly rerun the new, current SCC; second is a worklist for all newly split SCCs. We need to avoid rerun of
+; the same SCC when the SCC is set to be processed by both mechanisms back to back. In pathological cases, such extra,
+; redundant rerun could cause exponential size growth due to inlining along cycles.
+;
+; The test cases here illustrates potential redundant rerun and how it's prevented, however there's no extra inlining
+; even if we allow the redundant rerun. In real code, when inliner makes 
diff erent decisions for 
diff erent call sites
+; of the same caller-callee edge, we could end up getting more recursive inlining without SCC mutation.
+;
+; REQUIRES: asserts
+; RUN: opt < %s -passes='cgscc(inline)' -inline-threshold=500 -debug-only=cgscc -S 2>&1 | FileCheck %s
+
+; CHECK: Running an SCC pass across the RefSCC: [(test1_c, test1_a, test1_b)]
+; CHECK: Enqueuing the existing SCC in the worklist:(test1_b)
+; CHECK: Enqueuing a newly formed SCC:(test1_c)
+; CHECK: Enqueuing a new RefSCC in the update worklist: [(test1_b)]
+; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_c'
+; CHECK: Switch an internal ref edge to a call edge from 'test1_a' to 'test1_a'
+; CHECK: Re-running SCC passes after a refinement of the current SCC: (test1_c, test1_a)
+; CHECK: Skipping redundant run on SCC: (test1_c, test1_a)
+; CHECK: Skipping an SCC that is now part of some other RefSCC...
+
+declare void @external(i32 %seed)
+
+define void @test1_a(i32 %num) {
+entry:
+  call void @test1_b(i32 %num)
+  call void @external(i32 %num)
+  ret void
+}
+
+define void @test1_b(i32 %num) {
+entry:
+  call void @test1_c(i32 %num)
+  call void @test1_a(i32 %num)
+  call void @external(i32 %num)
+  ret void
+}
+
+define void @test1_c(i32 %num) #0 {
+  call void @test1_a(i32 %num)
+  ret void
+}
+
+attributes #0 = { noinline nounwind optnone }
\ No newline at end of file

diff  --git a/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp b/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp
index 7a9e588a50c9..2dad605395c3 100644
--- a/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp
+++ b/llvm/unittests/Analysis/CGSCCPassManagerTest.cpp
@@ -1289,29 +1289,29 @@ TEST_F(CGSCCPassManagerTest, TestAnalysisInvalidationCGSCCUpdate) {
   // We run over four SCCs the first time. But then we split an SCC into three.
   // And then we merge those three back into one. However, this also
   // invalidates all three SCCs further down in the PO walk.
-  EXPECT_EQ(4 + 3 + 1 + 3, SCCAnalysisRuns);
+  EXPECT_EQ(4 + 3 + 3, SCCAnalysisRuns);
   // The module analysis pass should be run three times.
   EXPECT_EQ(3, ModuleAnalysisRuns);
   // We run over four SCCs the first time. Then over the two new ones. Then the
   // entire module is invalidated causing a full run over all seven. Then we
   // fold three SCCs back to one, re-compute for it and the two SCCs above it
   // in the graph, and then run over the whole module again.
-  EXPECT_EQ(4 + 2 + 7 + 1 + 3 + 4, IndirectSCCAnalysisRuns);
-  EXPECT_EQ(4 + 2 + 7 + 1 + 3 + 4, DoublyIndirectSCCAnalysisRuns);
+  EXPECT_EQ(4 + 2 + 7 + 3 + 4, IndirectSCCAnalysisRuns);
+  EXPECT_EQ(4 + 2 + 7 + 3 + 4, DoublyIndirectSCCAnalysisRuns);
 
   // First we run over all six functions. Then we re-run it over three when we
   // split their SCCs. Then we re-run over the whole module. Then we re-run
   // over three functions merged back into a single SCC, then those three
   // functions again, the two functions in SCCs above it in the graph, and then
   // over the whole module again.
-  EXPECT_EQ(6 + 3 + 6 + 3 + 3 + 2 + 6, FunctionAnalysisRuns);
+  EXPECT_EQ(6 + 3 + 6 + 3 + 2 + 6, FunctionAnalysisRuns);
 
   // Re run the function analysis over the entire module, and then re-run it
   // over the `(h3, h1, h2)` SCC due to invalidation. Then we re-run it over
   // the entire module, then the three functions merged back into a single SCC,
   // those three functions again, then the two functions in SCCs above it in
   // the graph, and then over the whole module.
-  EXPECT_EQ(6 + 3 + 6 + 3 + 3 + 2 + 6, IndirectFunctionAnalysisRuns);
+  EXPECT_EQ(6 + 3 + 6 + 3 + 2 + 6, IndirectFunctionAnalysisRuns);
 }
 
 // The (negative) tests below check for assertions so we only run them if NDEBUG