[llvm-commits] [llvm] r153506 - /llvm/trunk/lib/Transforms/IPO/Inliner.cpp

Matt Beaumont-Gay matthewbg at google.com
Tue Mar 27 09:55:38 PDT 2012


And the award for greatest ratio of change message to text changed goes to...

On Tue, Mar 27, 2012 at 03:48, Chandler Carruth <chandlerc at gmail.com> wrote:
> Author: chandlerc
> Date: Tue Mar 27 05:48:28 2012
> New Revision: 153506
>
> URL: http://llvm.org/viewvc/llvm-project?rev=153506&view=rev
> Log:
> Make a seemingly tiny change to the inliner and fix the generated code
> size bloat. Unfortunately, I expect this to disable the majority of the
> benefit from r152737. I'm hopeful at least that it will fix PR12345. To
> explain this requires... quite a bit of backstory I'm afraid.
>
> TL;DR: The change in r152737 actually did The Wrong Thing for
> linkonce-odr functions. This change makes it do the right thing. The
> benefits we saw were simple luck, not any actual strategy. Benchmark
> numbers after a mini-blog-post so that I've written down my thoughts on
> why all of this works and doesn't work...
>
> To understand what's going on here, you have to understand how the
> "bottom-up" inliner actually works. There are two fundamental modes to
> the inliner:
>
> 1) Standard fixed-cost bottom-up inlining. This is the mode we usually
>   think about. It walks from the bottom of the CFG up to the top,
>   looking at callsites, taking information about the callsite and the
>   called function and computing th expected cost of inlining into that
>   callsite. If the cost is under a fixed threshold, it inlines. It's
>   a touch more complicated than that due to all the bonuses, weights,
>   etc. Inlining the last callsite to an internal function gets higher
>   weighth, etc. But essentially, this is the mode of operation.
>
> 2) Deferred bottom-up inlining (a term I just made up). This is the
>   interesting mode for this patch an r152737. Initially, this works
>   just like mode #1, but once we have the cost of inlining into the
>   callsite, we don't just compare it with a fixed threshold. First, we
>   check something else. Let's give some names to the entities at this
>   point, or we'll end up hopelessly confused. We're considering
>   inlining a function 'A' into its callsite within a function 'B'. We
>   want to check whether 'B' has any callers, and whether it might be
>   inlined into those callers. If so, we also check whether inlining 'A'
>   into 'B' would block any of the opportunities for inlining 'B' into
>   its callers. We take the sum of the costs of inlining 'B' into its
>   callers where that inlining would be blocked by inlining 'A' into
>   'B', and if that cost is less than the cost of inlining 'A' into 'B',
>   then we skip inlining 'A' into 'B'.
>
> Now, in order for #2 to make sense, we have to have some confidence that
> we will actually have the opportunity to inline 'B' into its callers
> when cheaper, *and* that we'll be able to revisit the decision and
> inline 'A' into 'B' if that ever becomes the correct tradeoff. This
> often isn't true for external functions -- we can see very few of their
> callers, and we won't be able to re-consider inlining 'A' into 'B' if
> 'B' is external when we finally see more callers of 'B'. There are two
> cases where we believe this to be true for C/C++ code: functions local
> to a translation unit, and functions with an inline definition in every
> translation unit which uses them. These are represented as internal
> linkage and linkonce-odr (resp.) in LLVM. I enabled this logic for
> linkonce-odr in r152737.
>
> Unfortunately, when I did that, I also introduced a subtle bug. There
> was an implicit assumption that the last caller of the function within
> the TU was the last caller of the function in the program. We want to
> bonus the last caller of the function in the program by a huge amount
> for inlining because inlining that callsite has very little cost.
> Unfortunately, the last caller in the TU of a linkonce-odr function is
> *not* the last caller in the program, and so we don't want to apply this
> bonus. If we do, we can apply it to one callsite *per-TU*. Because of
> the way deferred inlining works, when it sees this bonus applied to one
> callsite in the TU for 'B', it decides that inlining 'B' is of the
> *utmost* importance just so we can get that final bonus. It then
> proceeds to essentially force deferred inlining regardless of the actual
> cost tradeoff.
>
> The result? PR12345: code bloat, code bloat, code bloat. Another result
> is getting *damn* lucky on a few benchmarks, and the over-inlining
> exposing critically important optimizations. I would very much like
> a list of benchmarks that regress after this change goes in, with
> bitcode before and after. This will help me greatly understand what
> opportunities the current cost analysis is missing.
>
> Initial benchmark numbers look very good. WebKit files that exhibited
> the worst of PR12345 went from growing to shrinking compared to Clang
> with r152737 reverted.
>
> - Bootstrapped Clang is 3% smaller with this change.
> - Bootstrapped Clang -O0 over a single-source-file of lib/Lex is 4%
>  faster with this change.
>
> Please let me know about any other performance impact you see. Thanks to
> Nico for reporting and urging me to actually fix, Richard Smith, Duncan
> Sands, Manuel Klimek, and Benjamin Kramer for talking through the issues
> today.
>
> Modified:
>    llvm/trunk/lib/Transforms/IPO/Inliner.cpp
>
> Modified: llvm/trunk/lib/Transforms/IPO/Inliner.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/IPO/Inliner.cpp?rev=153506&r1=153505&r2=153506&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Transforms/IPO/Inliner.cpp (original)
> +++ llvm/trunk/lib/Transforms/IPO/Inliner.cpp Tue Mar 27 05:48:28 2012
> @@ -260,7 +260,7 @@
>     int TotalSecondaryCost = 0;
>     bool outerCallsFound = false;
>     // This bool tracks what happens if we do NOT inline C into B.
> -    bool callerWillBeRemoved = true;
> +    bool callerWillBeRemoved = Caller->hasLocalLinkage();
>     // This bool tracks what happens if we DO inline C into B.
>     bool inliningPreventsSomeOuterInline = false;
>     for (Value::use_iterator I = Caller->use_begin(), E =Caller->use_end();
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits




More information about the llvm-commits mailing list