[LLVMdev] LLVM Inliner
Nuno Lopes
nunoplopes at sapo.pt
Mon Nov 29 15:05:44 PST 2010
BTW, why not learn the heuristic automatically? It's actually not very
difficult. Last semester, for a class project, I did some experiments with
using statistical classifier boosting to automatically derive new heuristic
functions for LLVM's inliner (for -O2 and -Os).
The results are a bit biased since I've used a very small training program
set (just SPEC 2006). I just made the report available on-line in case
someone is interested:
http://web.ist.utl.pt/nuno.lopes/pubs/inline-stats10.pdf
Nuno
----- Original Message -----
> On Nov 23, 2010, at 5:07 PM, Xinliang David Li wrote:
>> Hi, I browsed the LLVM inliner implementation, and it seems there is room
>> for improvement. (I have not read it too carefully, so correct me if
>> what I observed is wrong).
>>
>> First the good side of the inliner -- the function-level summary and
>> inline cost estimation are more elaborate and complete than gcc's. For
>> instance, it considers callsite arguments and the effects of optimizations
>> enabled by inlining.
>
> Yep, as others pointed out, this is intended to interact closely with the
> per-function optimizations that get mixed in due to the inliner being a
> callgraphscc pass. This is actually a really important property of the
> inliner. If you have a function foo that calls a leaf function bar, the
> sequence of optimization is:
>
> 1. Run the inliner on bar (noop, since it has no call sites)
> 2. Run the per-function passes on bar. This generally shrinks it, and
> prevents "abstraction penalty" from making bar look too big to inline.
> 3. Run the inliner on foo. Since foo calls bar, we consider inlining bar
> into foo and do so if profitable.
> 4. Run the per-function passes on foo. If bar got inlined, this means
> that we're running the per-function passes over the inlined contents of
> bar again.
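To make the sequence concrete, here is a hypothetical C++ example (not taken from LLVM's sources): bar is written against iterator abstraction that makes it look big, and step 2 strips that abstraction before step 3 decides whether to inline it.

```cpp
#include <numeric>
#include <vector>

// Hypothetical leaf function: written with iterator "abstraction" that
// makes it look large before the per-function passes run on it (step 2).
static int bar(const std::vector<int> &v) {
  return std::accumulate(v.begin(), v.end(), 0);
}

// After step 2 shrinks bar to a tight loop, step 3 sees a small body when
// it considers the call site below, and step 4 then re-optimizes the
// inlined contents inside foo.
int foo() {
  std::vector<int> v{1, 2, 3, 4};
  return bar(v) * 2;  // candidate call site for inlining
}
```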
>
> In a traditional optimizer like GCC's, you end up with problems where you
> have to set a high inline threshold due to inlining-before-optimizing
> causing "abstraction penalty problems". An early inliner is a hack that
> tries to address this. Another problem with this approach from the
> compile time perspective is that you end up repeating work multiple times.
> For example, if there is a common subexpression in a small function, you
> end up inlining it into many places, then having to eliminate the common
> subexpression in each copy.
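A small made-up example of that repeated work: in an inline-before-optimize pipeline, the common subexpression below gets copied into every caller and must be eliminated separately in each one, whereas the bottom-up scheme runs CSE on small() once, before any inlining.

```cpp
// Hypothetical function containing a common subexpression (x * y).
// An early inliner would duplicate it into caller1 and caller2 and then
// have to re-run CSE in each copy; optimizing small() first does it once.
static int small(int x, int y) {
  int a = x * y + 1;
  int b = x * y + 2;   // x * y is common with the line above
  return a + b;
}

int caller1(int x, int y) { return small(x, y); }
int caller2(int x, int y) { return small(x, y) + 1; }
```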
>
> The LLVM inliner avoids these problems, but (as you point out) this really
> does force it to be a bottom-up inliner. This means that the bottom-up
> inliner needs to make decisions in strange ways in some cases: for example
> if qux called foo, and foo were static, then (when processing foo) we may
> decide not to inline bar into foo because it would be more profitable to
> inline foo into qux.
>
>> Now more to the weakness of the inliner:
>>
>> 1) It is bottom up. The inlining is not done in an order based on the
>> priority of the callsites. It may leave important callsites (at the top
>> of the call graph) uninlined due to a higher cost after the inline cost
>> update. It also eliminates the possibility of inline specialization. To
>> change this, the inliner pass may not be able to use the pass manager
>> infrastructure. (I noticed a hack in the inliner to work around the
>> problem -- for static functions, avoid inlining their callees if that
>> causes them to become too big.)
>
> This is true, but I don't think it's a really large problem in practice.
> We don't have a "global inline threshold limit" (which I've never
> understood, except as a hack to prevent run-away inlining) so not visiting
> in priority order shouldn't prevent high-priority-but-processed-late
> candidates from being inlined.
>
> The only potential issue I'm aware of is if we have A->B->C and we decide
> to inline C into B when it would be more profitable to inline B into A and
> leave C out of line. This can be handled with a heuristic like the one
> above.
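A sketch of what such a heuristic could look like (this is illustrative code, not LLVM's actual implementation; the struct, function, and threshold are all made up): when visiting a call bottom-up, be more reluctant to grow a caller that is itself a strong candidate for being inlined away.

```cpp
// Hypothetical per-function summary used by the sketch below.
struct FuncInfo {
  int size;        // current estimated size of the function body
  bool isStatic;   // internal linkage: every caller is visible
  int numCallers;  // number of known call sites of this function
};

// Deflate the incentive to inline callee into caller if caller is a small
// static function: growing it past the threshold would forfeit the chance
// to inline caller itself into *its* callers and delete it entirely.
bool shouldInlineCalleeInto(const FuncInfo &caller, const FuncInfo &callee,
                            int threshold) {
  if (caller.isStatic && caller.numCallers > 0 &&
      caller.size + callee.size > threshold)
    return false;
  return callee.size < threshold;
}
```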
>
>> 2) There seems to be only one inliner pass. For calls to small
>> functions, it is better to perform early inlining as one of the local
>> (per-function) optimizations, followed by scalar-optimization cleanup.
>> This will sharpen the summary information. (Note the inline summary
>> update does not consider the possible cleanup.)
>
> Yep. This is a feature :)
>
>> 3) Recursive inlining is not supported.
>
> This is a policy decision. It's not clear whether it is really a good
> idea, though I have seen some bugzilla or something about it. I agree
> that it should be revisited.
>
>> 4) Functions with indirect branches are not inlined. What source
>> construct does the indirect branch instruction correspond to? A variable
>> jump?
>
> See:
> http://blog.llvm.org/2010/01/address-of-label-and-indirect-branches.html
>
> for more details.
>
>> 6) One heuristic used in the inline-cost computation seems wrong:
>>
>> // Calls usually take a long time, so they make the inlining gain
>> // smaller.
>> InlineCost += CalleeFI->Metrics.NumCalls *
>> InlineConstants::CallPenalty;
>>
>> Does it try to block inlining of callees with lots of calls? Note that
>> inlining such a function only increases the static call count.
>
> I think that this is a heuristic that Jakob came up with, but I think it's
> a good one, also discussed elsewhere on the thread.
>
> When talking about inlining and tuning thresholds and heuristics, it is a
> good idea to quantify what the expected or possible wins of inlining a
> function are. Some of the ones I'm aware of:
>
> 1. In some cases, inlining shrinks code.
>
> 2. Inlining a function exposes optimization opportunities on the inlined
> code, because constant propagation and other simplifications can take
> place.
>
> 3. Inlining a function exposes optimizations in the caller because
> address-taken values can be promoted to registers.
>
> 4. Inlining a function can improve optimization in a caller because
> interprocedural side-effect analysis isn't needed. For example, load/call
> dependence may not be precise. This is something we should continue to
> improve in the optimizer though.
>
> 5. Inlining code with indirect call sites and switches can improve branch
> prediction if some callers of the function are biased differently than
> other callers. This is pretty hard to predict without profile info
> though.
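As a concrete illustration of win #2 (a hypothetical example, not from any benchmark): once the callee is inlined with a constant argument, the branch folds away and the multiply can be strength-reduced.

```cpp
// Hypothetical callee: written defensively against factor == 0.
static int scale(int x, int factor) {
  if (factor == 0)
    return 0;
  return x * factor;
}

// After inlining, factor is known to be 8: the zero check is dead code,
// and x * 8 can be strength-reduced to x << 3.
int caller(int x) {
  return scale(x, 8);
}
```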
>
>
> The "punish functions containing lots of calls" heuristic is based on the
> assumption that functions which are mostly calls (again, this decision
> happens after the callee has been inlined and simplified) aren't
> themselves doing much work.
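The effect of that penalty can be sketched as follows (a simplified stand-in for the real computation in LLVM's inline-cost logic; the constant and metric values here are made up):

```cpp
// Hypothetical value; the real one lives in InlineConstants.
namespace InlineConstants {
const int CallPenalty = 25;
}

// Simplified stand-in for the callee's code metrics.
struct CodeMetrics {
  int NumInsts;  // non-call instructions in the callee
  int NumCalls;  // calls the callee makes
};

// A callee that is "mostly calls" gets a higher cost: the calls it makes
// dominate its runtime, so inlining its small amount of straight-line code
// buys little while still growing the caller.
int inlineCost(const CodeMetrics &Metrics) {
  int InlineCost = Metrics.NumInsts;
  InlineCost += Metrics.NumCalls * InlineConstants::CallPenalty;
  return InlineCost;
}
```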
>
> -Chris