[llvm] 88da019 - Fix a bug in the inliner that causes subsequent double inlining

Thu Apr 2 21:09:44 PDT 2020

Author: Hongtao Yu
Date: 2020-04-02T21:08:05-07:00
New Revision: 88da01997725f4ff1587d8944540986c42d47bf6

URL: https://github.com/llvm/llvm-project/commit/88da01997725f4ff1587d8944540986c42d47bf6
DIFF: https://github.com/llvm/llvm-project/commit/88da01997725f4ff1587d8944540986c42d47bf6.diff

LOG: Fix a bug in the inliner that causes subsequent double inlining

Summary:
A recent change in the instruction simplifier enables a call to a function that just returns one of its parameter to be simplified as simply loading the parameter. This exposes a bug in the inliner where double inlining may be involved which in turn may cause compiler ICE when an already-inlined callsite is reused for further inlining.
To put it simply, in the following-like C program, when the function call second(t) is inlined, its code t = third(t) will be reduced to just loading the return value of the callsite first(). This causes the inliner internal data structure to register the first() callsite for the call edge representing the third() call, therefore incurs a double inlining when both call edges are considered an inline candidate. I'm making a fix to break the inliner from reusing a callsite for new call edges.

```
void top()
{
    int t = first();
    second(t);
}

void second(int t)
{
   t = third(t);
   fourth(t);
}

void third(int t)
{
   return t;
}
```
The actual failing case is much trickier than the example here and is only reproducible with the legacy inliner. The way the legacy inliner works is to process each SCC in a bottom-up order. That means in reality function first may be already inlined into top, or function third is either inlined to second or is folded into nothing. To repro the failure seen from building a large application, we need to figure out a way to confuse the inliner so that the bottom-up inlining is not fulfilled. I'm doing this by making the second call indirect so that the alias analyzer fails to figure out the right call graph edge from top to second and top can be processed before second during the bottom-up.  We also need to tweak the test code so that when the inlining of top happens, the function body of second is not that optimized, by delaying the pass of function attribute deducer (i.e, which tells function third has no side effect and just returns its parameter). Since the CGSCC pass is iterative, additional calls are added to top to postpone the inlining of second to the second round right after the first function attribute deducing pass is done. I haven't been able to repro the failure with the new pass manager since the processing order of ininlined callsites is a bit different, but in theory the issue could happen there too.

Note that this fix could introduce a side effect that blocks the simplification of inlined code, specifically for a call site that can be folded to another call site. I hope this can probably be complemented by subsequent inlining or folding, as shown in the attached unit test. The ideal fix should be to separate the use of VMap. However, in reality this failing pattern shouldn't happen often. And even if it happens, there should be a good chance that the non-folded call site will be refolded by iterative inlining or subsequent simplification.

Reviewers: wenlei, davidxl, tejohnson

Reviewed By: wenlei, davidxl

Subscribers: eraman, nikic, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76248

Added: 
    llvm/test/Transforms/Inline/inline_call.ll

Modified: 
    llvm/lib/Transforms/IPO/Inliner.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/lib/Transforms/IPO/Inliner.cpp b/llvm/lib/Transforms/IPO/Inliner.cpp
index 5ce3da340ade..1b10c5433b46 100644

--- a/llvm/lib/Transforms/IPO/Inliner.cpp
+++ b/llvm/lib/Transforms/IPO/Inliner.cpp
@@ -712,8 +712,22 @@ inlineCallsImpl(CallGraphSCC &SCC, CallGraph &CG,
           int NewHistoryID = InlineHistory.size();
           InlineHistory.push_back(std::make_pair(Callee, InlineHistoryID));
 
-          for (Value *Ptr : InlineInfo.InlinedCalls)
+#ifndef NDEBUG
+          // Make sure no dupplicates in the inline candidates. This could
+          // happen when a callsite is simpilfied to reusing the return value
+          // of another callsite during function cloning, thus the other
+          // callsite will be reconsidered here.
+          DenseSet<CallSite> DbgCallSites;
+          for (auto &II : CallSites)
+            DbgCallSites.insert(II.first);
+#endif
+
+          for (Value *Ptr : InlineInfo.InlinedCalls) {
+#ifndef NDEBUG
+            assert(DbgCallSites.count(CallSite(Ptr)) == 0);
+#endif
             CallSites.push_back(std::make_pair(CallSite(Ptr), NewHistoryID));
+          }
         }
       }
 

diff  --git a/llvm/test/Transforms/Inline/inline_call.ll b/llvm/test/Transforms/Inline/inline_call.ll
new file mode 100644
index 000000000000..fb000f0c805f
--- /dev/null
+++ b/llvm/test/Transforms/Inline/inline_call.ll
@@ -0,0 +1,71 @@
+; Check the optimizer doesn't crash at inlining the function top and all of its callees are inlined.
+; RUN: opt < %s -O3 -S | FileCheck %s
+
+define dso_local void (...)* @second(i8** %p) {
+entry:
+  %p.addr = alloca i8**, align 8
+  store i8** %p, i8*** %p.addr, align 8
+  %tmp = load i8**, i8*** %p.addr, align 8
+  %tmp1 = load i8*, i8** %tmp, align 8
+  %tmp2 = bitcast i8* %tmp1 to void (...)*
+  ret void (...)* %tmp2
+}
+
+define dso_local void @top()  {
+entry:
+  ; CHECK: {{.*}} = {{.*}} call {{.*}} @ext
+  ; CHECK-NOT: {{.*}} = {{.*}} call {{.*}} @third
+  ; CHECK-NOT: {{.*}} = {{.*}} call {{.*}} @second
+  ; CHECK-NOT: {{.*}} = {{.*}} call {{.*}} @wrapper
+  %q = alloca i8*, align 8
+  store i8* bitcast (void ()* @third to i8*), i8** %q, align 8
+  %tmp = call void (...)* @second(i8** %q)
+  ; The call to 'wrapper' here is to ensure that its function attributes
+  ; i.e., returning its parameter and having no side effect, will be decuded
+  ; before the next round of inlining happens to 'top' to expose the bug.
+  %call =  call void (...)* @wrapper(void (...)* %tmp) 
+  ; The indirect call here is to confuse the alias analyzer so that
+  ; an incomplete graph will be built during the first round of inlining.
+  ; This allows the current function to be processed before the actual 
+  ; callee, i.e., the function 'run', is processed. Once it's simplified to 
+  ; a direct call, it also enables an additional round of inlining with all
+  ; function attributes deduced. 
+  call void (...) %call()
+  ret void
+}
+
+define dso_local void (...)* @gen() {
+entry:
+  %call = call void (...)* (...) @ext()
+  ret void (...)* %call
+}
+
+declare dso_local void (...)* @ext(...) 
+
+define dso_local void (...)* @wrapper(void (...)* %fn) {
+entry:
+  ret void (...)* %fn
+}
+
+define dso_local void @run(void (...)* %fn) {
+entry:
+  %fn.addr = alloca void (...)*, align 8
+  %f = alloca void (...)*, align 8
+  store void (...)* %fn, void (...)** %fn.addr, align 8
+  %tmp = load void (...)*, void (...)** %fn.addr, align 8
+  %call = call void (...)* @wrapper(void (...)* %tmp)
+  store void (...)* %call, void (...)** %f, align 8
+  %tmp1 = load void (...)*, void (...)** %f, align 8
+  call void (...) %tmp1()
+  ret void
+}
+
+define dso_local void @third() {
+entry:
+  %f = alloca void (...)*, align 8
+  %call = call void (...)* @gen()
+  store void (...)* %call, void (...)** %f, align 8
+  %tmp = load void (...)*, void (...)** %f, align 8
+  call void @run(void (...)* %tmp)
+  ret void
+}
\ No newline at end of file