[PATCH] D115497: [Inline] Disable deferred inlining

David Li via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Dec 10 09:49:58 PST 2021


davidxl added a comment.

First some high level comments:

1. There is a conflict between maximizing runtime performance and compile time, so this should not be disabled without extensive benchmarking from downstream users.
2. It is OK to introduce an option to turn this on (not changing the default behavior for now).
3. Some form of (limited) top-down inlining is beneficial, and there should be a general way to deal with runaway inlining.

More inline comments below:

> After the switch to the new pass manager, we have observed multiple instances of catastrophic inlining, where the inliner produces huge functions with many hundreds of thousands of instructions from small input IR. We were forced to back out the switch to the new pass manager for this reason. This patch fixes at least one of the root cause issues.

Why does it only happen with the new PM?

> There are instances where we necessarily have to deviate from bottom-up inlining: When inlining in an SCC there is no natural "bottom", so inlining effectively happens top-down. This requires special care, and the inliner avoids exponential blowup by ensuring that functions in the SCC grow in a balanced way and will eventually hit the threshold.

This is not a precise description: in the current bottom-up inlining, after a call site gets inlined into the caller, iterative inlining of the newly exposed call sites happens in a top-down fashion. This has triggered compile-time issues with recursive calls before.
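
As a rough sketch of that iterative step (hypothetical C++ functions, not code from the patch):

  // In SCC order the inliner visits c(), then b(), then a(). Suppose
  // the call to c() inside b() was *not* inlined when b() was visited.
  static long c(long x) { return x * x + 1; }

  static long b(long x) { return c(x) + c(x + 1); }

  long a(long x) {
    // Inlining b() here materializes the two calls to c() as new call
    // sites inside a(). The inliner keeps iterating on those new call
    // sites -- effectively a top-down step inside the caller -- which
    // is where recursive callees can blow up compile time.
    return b(x) - 7;
  }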

AutoFDO's sample loader also has a top-down inlining pass.

> However, there is one instance where the inlining advisor explicitly violates the bottom-up principle: Deferred inlining tries to "defer" inlining a call if it determines that inlining the caller into all its call-sites would be more profitable. Something very important to understand about deferred inlining is that it doesn't make one inlining choice in place of another -- it effectively chooses to do both. If we have a call chain A -> B -> C and cost modelling tells us that inlining B -> C is profitable, but we defer this and instead inline A -> B first, then we'll now have a call A -> C, and the cost model will (a few special cases notwithstanding) still tell us that this is profitable. So the end result is that we inlined *both* B and C, even though under the usual cost model function B would have been too large to further inline after C has been integrated into it.

I think flattening A -> B -> C is, for most cases, working as designed.
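
Roughly, the situation described above looks like this (hypothetical functions; the bodies only stand in for whatever the cost model sees):

  static int C(int x) { return x + 1; }   // small, profitable to inline

  static int B(int x) {
    // Inlining C here is profitable on its own, but the advisor defers
    // it because inlining B into all of B's callers looks even better.
    return C(x) * 2;
  }

  int A(int x) {
    // Once B is inlined here, A contains a direct call to C, and the
    // cost model (a few special cases aside) still says inlining C is
    // profitable -- so both edges end up inlined rather than one or
    // the other.
    return B(x) + 3;
  }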

> Because deferred inlining violates the bottom-up invariant of the inliner, it can result in exponential inlining. The exponential-deferred-inlining.ll test case illustrates this on a simple example (see https://gist.github.com/nikic/1262b5f7d27278e1b34a190ae10947f5 for a much more catastrophic case with about 5000x size blowup). If the call chain A -> B -> C is not a chain but a tree of calls, then we end up deferring inlining across the tree and end up flattening everything into the root node.

This should be dealt with, but disabling the mechanism may be too big a hammer without extensive testing.
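
For reference, the shape that triggers the blowup is a call tree along these lines (a hypothetical reduction, not the linked gist or the exponential-deferred-inlining.ll test itself):

  // Each level doubles the number of call sites. If inlining keeps
  // being deferred up the tree, every leaf is eventually flattened
  // into root(), so root() grows exponentially with the tree depth.
  // Bottom-up thresholds would normally stop this once intermediate
  // levels get large; deferral keeps postponing that check upward.
  static int leaf(int x) { return x + 1; }

  static int l2(int x) { return leaf(x) + leaf(x + 1); }
  static int l1(int x) { return l2(x) + l2(x + 1); }

  int root(int x) { return l1(x) + l1(x + 1); }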

> This patch proposes to address this by disabling deferred inlining entirely (currently still behind an option). Beyond the issue of exponential inlining, I don't think that the whole concept makes sense, at least as long as deferred inlining still ends up inlining both call edges.

I think introducing an option without changing the default behavior is the right first step.

> Now, unlike the normal inlining cost model, the deferred inlining cost model does look at all callers, and will extend the "last call to local" bonus if it determines that we could inline all of them as long as we defer the current inlining decision. This makes very little sense. The "last call to local" bonus doesn't really cost model anything. It's basically an "infinite" bonus that ensures we always inline the last call to a local. The fact that it's not literally infinite just prevents inlining of huge functions, which can easily result in scalability issues. I very much doubt that it was an intentional cost-modelling choice to say that getting rid of a small local function is worth adding 15000 instructions elsewhere, yet this is exactly how this value is getting used here.

The last-call bonus of 15000 is certainly tunable. It models the dead-function-elimination (DFE) opportunity, which should depend on the callee's size.
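
For context, what that bonus (LastCallToStaticBonus in the inline cost code, if I recall correctly) stands in for looks roughly like this hypothetical C++ case:

  // helper() has internal linkage and exactly one call site. If that
  // call is inlined, the function becomes dead and DFE removes it, so
  // the net size cost of inlining is much smaller than the callee's
  // size alone suggests. A fixed 15000 bonus approximates that; a
  // callee-size-dependent bonus would model it more faithfully.
  static int helper(int x) {
    int s = 0;
    for (int i = 0; i < x; ++i)
      s += i * i;
    return s;
  }

  int caller(int x) { return helper(x) + 1; }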

> The main alternative I see to complete removal is to change deferred inlining to an actual either/or decision. That is, to mark deferred calls as noinline so we're actually trading off one inlining decision against another, and not just adding a side-channel to the cost model to do both.

No, that is not the intention of deferred inlining.

David


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115497/new/

https://reviews.llvm.org/D115497


