[LLVMdev] LLVM Inliner
Nuno Lopes
nunoplopes at sapo.pt
Mon Nov 29 15:05:44 PST 2010
BTW, why not learn the heuristic automatically? It's actually not very
difficult. Last semester, for a class project, I did some experiments with
using statistical classifier boosting to automatically derive new heuristic
functions for LLVM's inliner (for -O2 and -Os).
The results are a bit biased since I've used a very small training program
set (just SPEC 2006). I just made the report available on-line in case
someone is interested:
http://web.ist.utl.pt/nuno.lopes/pubs/inline-stats10.pdf
Nuno
----- Original Message -----
> On Nov 23, 2010, at 5:07 PM, Xinliang David Li wrote:
>> Hi, I browsed the LLVM inliner implementation, and it seems there is room
>> for improvement. (I have not read it too carefully, so correct me if
>> what I observed is wrong).
>>
>> First the good side of the inliner -- the function-level summary and
>> inline cost estimation are more elaborate and complete than gcc's. For
>> instance, it considers callsite arguments and the effects of optimizations
>> enabled by inlining.
>
> Yep, as others pointed out, this is intended to interact closely with the
> per-function optimizations that get mixed in due to the inliner being a
> callgraphscc pass. This is actually a really important property of the
> inliner. If you have a function foo that calls a leaf function bar, the
> sequence of optimization is:
>
> 1. Run the inliner on bar (noop, since it has no call sites)
> 2. Run the per-function passes on bar. This generally shrinks it, and
> prevents "abstraction penalty" from making bar look too big to inline.
> 3. Run the inliner on foo. Since foo calls bar, we consider inlining bar
> into foo and do so if profitable.
> 4. Run the per-function passes on foo. If bar got inlined, this means
> that we're running the per-function passes over the inlined contents of
> bar again.
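To make the sequence concrete, here is a hypothetical C++ example (not taken from LLVM's sources): bar is written against iterator abstraction that makes it look big, and step 2 strips that abstraction before step 3 decides whether to inline it.

```cpp
#include <numeric>
#include <vector>

// Hypothetical leaf function: written with iterator "abstraction" that
// makes it look large before the per-function passes run on it (step 2).
static int bar(const std::vector<int> &v) {
  return std::accumulate(v.begin(), v.end(), 0);
}

// After step 2 shrinks bar to a tight loop, step 3 sees a small body when
// it considers the call site below, and step 4 then re-optimizes the
// inlined contents inside foo.
int foo() {
  std::vector<int> v{1, 2, 3, 4};
  return bar(v) * 2;  // candidate call site for inlining
}
```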
>
> In a traditional optimizer like GCC's, you end up with problems where you
> have to set a high inline threshold due to inlining-before-optimizing
> causing "abstraction penalty problems". An early inliner is a hack that
> tries to address this. Another problem with this approach from the
> compile time perspective is that you end up repeating work multiple times.
> For example, if there is a common subexpression in a small function, you
> end up inlining it into many places, then having to eliminate the common
> subexpression in each copy.
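A small made-up example of that repeated work: in an inline-before-optimize pipeline, the common subexpression below gets copied into every caller and must be eliminated separately in each one, whereas the bottom-up scheme runs CSE on small() once, before any inlining.

```cpp
// Hypothetical function containing a common subexpression (x * y).
// An early inliner would duplicate it into caller1 and caller2 and then
// have to re-run CSE in each copy; optimizing small() first does it once.
static int small(int x, int y) {
  int a = x * y + 1;
  int b = x * y + 2;   // x * y is common with the line above
  return a + b;
}

int caller1(int x, int y) { return small(x, y); }
int caller2(int x, int y) { return small(x, y) + 1; }
```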
>
> The LLVM inliner avoids these problems, but (as you point out) this really
> does force it to be a bottom-up inliner. This means that the bottom-up
> inliner needs to make decisions in strange ways in some cases: for example
> if qux called foo, and foo were static, then (when processing foo) we may
> decide not to inline bar into foo because it would be more profitable to
> inline foo into qux.
>
>> Now more to the weakness of the inliner:
>>
>> 1) It is bottom up. The inlining is not done in an order based on the
>> priority of the callsites. It may leave important callsites (at the top
>> of the call graph) uninlined due to a higher cost after the inline cost
>> update. It also eliminates the possibility of inline specialization. To
>> change this, the inliner pass may not be able to use the pass manager
>> infrastructure. (I noticed a hack in the inliner to work around the
>> problem -- for static functions, avoid inlining their callees if that
>> causes them to become too big.)
>
> This is true, but I don't think it's a really large problem in practice.
> We don't have a "global inline threshold limit" (which I've never
> understood, except as a hack to prevent run-away inlining) so not visiting
> in priority order shouldn't prevent high-priority-but-processed-late
> candidates from being inlined.
>
> The only potential issue I'm aware of is if we have A->B->C and we decide
> to inline C into B when it would be more profitable to inline B into A and
> leave C out of line. This can be handled with a heuristic like the one
> above.
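A sketch of what such a heuristic could look like (this is illustrative code, not LLVM's actual implementation; the struct, function, and threshold are all made up): when visiting a call bottom-up, be more reluctant to grow a caller that is itself a strong candidate for being inlined away.

```cpp
// Hypothetical per-function summary used by the sketch below.
struct FuncInfo {
  int size;        // current estimated size of the function body
  bool isStatic;   // internal linkage: every caller is visible
  int numCallers;  // number of known call sites of this function
};

// Deflate the incentive to inline callee into caller if caller is a small
// static function: growing it past the threshold would forfeit the chance
// to inline caller itself into *its* callers and delete it entirely.
bool shouldInlineCalleeInto(const FuncInfo &caller, const FuncInfo &callee,
                            int threshold) {
  if (caller.isStatic && caller.numCallers > 0 &&
      caller.size + callee.size > threshold)
    return false;
  return callee.size < threshold;
}
```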
>
>> 2) There seems to be only one inliner pass. For calls to small
>> functions, it is better to perform early inlining as one of the local
>> (per-function) optimizations, followed by scalar-optimization cleanup.
>> This will sharpen the summary information. (Note the inline summary
>> update does not consider the possible cleanup.)
>
> Yep. This is a feature :)
>
>> 3) Recursive inlining is not supported.
>
> This is a policy decision. It's not clear whether it is really a good
> idea, though I have seen some bugzilla or something about it. I agree
> that it should be revisited.
>
>> 4) Functions with indirect branches are not inlined. What source
>> construct does the indirect branch instruction correspond to? A variable
>> jump?
>
> See:
> http://blog.llvm.org/2010/01/address-of-label-and-indirect-branches.html
>
> for more details.
>
>> 6) One heuristic used in the inline-cost computation seems wrong:
>>
>> // Calls usually take a long time, so they make the inlining gain
>> // smaller.
>> InlineCost += CalleeFI->Metrics.NumCalls *
>> InlineConstants::CallPenalty;
>>
>> Does it try to block inlining of callees with lots of calls? Note that
>> inlining such a function only increases the static call count.
>
> I think that this is a heuristic that Jakob came up with, but I think it's
> a good one, also discussed elsewhere on the thread.
>
> When talking about inlining and tuning thresholds and heuristics, it is a
> good idea to quantify what the expected or possible wins of inlining a
> function are. Some of the ones I'm aware of:
>
> 1. In some cases, inlining shrinks code.
>
> 2. Inlining a function exposes optimization opportunities on the inlined
> code, because constant propagation and other simplifications can take
> place.
>
> 3. Inlining a function exposes optimizations in the caller because
> address-taken values can be promoted to registers.
>
> 4. Inlining a function can improve optimization in a caller because
> interprocedural side-effect analysis isn't needed. For example, load/call
> dependence may not be precise. This is something we should continue to
> improve in the optimizer though.
>
> 5. Inlining code with indirect call sites and switches can improve branch
> prediction if some callers of the function are biased differently than
> other callers. This is pretty hard to predict without profile info
> though.
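As a concrete illustration of win #2 (a hypothetical example, not from any benchmark): once the callee is inlined with a constant argument, the branch folds away and the multiply can be strength-reduced.

```cpp
// Hypothetical callee: written defensively against factor == 0.
static int scale(int x, int factor) {
  if (factor == 0)
    return 0;
  return x * factor;
}

// After inlining, factor is known to be 8: the zero check is dead code,
// and x * 8 can be strength-reduced to x << 3.
int caller(int x) {
  return scale(x, 8);
}
```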
>
>
> The "punish functions containing lots of calls" heuristic is based on the
> assumption that functions which are mostly calls (again, this decision
> happens after the callee has been inlined and simplified) aren't
> themselves doing much work.
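The effect of that penalty can be sketched as follows (a simplified stand-in for the real computation in LLVM's inline-cost logic; the constant and metric values here are made up):

```cpp
// Hypothetical value; the real one lives in InlineConstants.
namespace InlineConstants {
const int CallPenalty = 25;
}

// Simplified stand-in for the callee's code metrics.
struct CodeMetrics {
  int NumInsts;  // non-call instructions in the callee
  int NumCalls;  // calls the callee makes
};

// A callee that is "mostly calls" gets a higher cost: the calls it makes
// dominate its runtime, so inlining its small amount of straight-line code
// buys little while still growing the caller.
int inlineCost(const CodeMetrics &Metrics) {
  int InlineCost = Metrics.NumInsts;
  InlineCost += Metrics.NumCalls * InlineConstants::CallPenalty;
  return InlineCost;
}
```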
>
> -Chris