[PATCH] D120584: [NewPM][Inliner] Mark inlined calls to functions in same SCC as callee as noinline

Sat Mar 5 11:53:33 PST 2022

wenlei added inline comments.

================
Comment at: llvm/lib/Transforms/IPO/Inliner.cpp:952
+              if (CalleeSCC != C &&
+                  CalleeSCC == CG.lookupSCC(CG.get(*NewCallee))) {
+                ICB->addFnAttr(Attribute::NoInline);
----------------
davidxl wrote:
> nikic wrote:
> > davidxl wrote:
> > > nikic wrote:
> > > > davidxl wrote:
> > > > > This feels like very strict. Is it possible to allow some level (under an option) of iterative inlining into the same SCC?     This is because not all cases of inlining into a non-trivial SCC will result in compile time issue due to existing threshold based stopping mechanism (i.e. the new callee in SCC is too large).
> > > > > 
> > > > > 
> > > > This does not restrict inlining into the same SCC. What it prevents is continuously peeling off more and more calls from a *different* non-trivial SCC.
> > > > 
> > > > From what I can tell, this is a rare case, only two programs in llvm-test-suite show any codegen changes with this patch.
> > > To clarify, my understanding of the scenario is like the following
> > > 
> > > 
> > > foo -->   (a, b, c)    where foo calls 'a'.
> > > 
> > > where a, b, and c forms anon trivial SCC.
> > > 
> > > Without the fix, we can end up with  inlining of foo->a, foo->b, foo->c, foo->a, foo->b, .... until foo blows up (we don't have a caller limit set in the inliner).
> > > 
> > > With this patch, once foo-> a is inlined, the new callsite foo->b is marked as no inline.
> > > 
> > > Is that the right?
> > Yeah, that sounds about right. I believe local inlining history will actually make sure that we only inline foo->a, foo->b, foo->c, which is what prevents this from simply inlining infinitely. But it still ends up flattening the SCC, which is particularly problematic if there are multi-edges in the SCC, in which case flattening is exponential.
> > 
> > D98481 forbids the foo->a inline in the first place, while this patch allows inlining foo->a, but forbids the foo->b inline. There are some cases where this kind of single-level inline is probably useful, e.g. if inlining at this call-site can prune away the recursive parts entirely.
> What I am thinking is a light weight mechanism to fix this with more flexibility. Assuming the number of callsites (from outside of SCC) into non-trivial SCC is small, a map from the <caller, non-trivialSCC> pairs to inlined count can be used to track the number of iterative inlining happening between the caller and SCC. If it exceeds the limit, mark the new callsite noInline.
This change is indeed better than D98481, but I'm still not sure if it is good enough from performance perspective.

I hear the argument about build speed and the earlier comments on strict bottom-up inlining, but different people have different balance between build speed and performance. Great if we can achieve both; otherwise, trying to minimize perf impact and giving user flexibility is probably better. 

Recursive inlining is weak for llvm even without this change, and this area is somewhat perf sensitive. We have observed perf gap between gcc and llvm due to less recursive inlining. We also have made changes to allow more recursive inlining with PGO outside of cgscc inliner where there's less constraints for this kind of issues. D109104 is example of more aggressive recursive inlining with PGO leading to visible perf movement even on spec.

We can try to measure this change with our internal benchmarks. But overall I agree with David that a solution that gives people flexibility would be better (either an optional inliner size/growth cap, or cross scc inline history counter like David suggested).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120584/new/

https://reviews.llvm.org/D120584