[llvm-commits] [llvm] r153812 - in /llvm/trunk: include/llvm/Analysis/ include/llvm/Transforms/IPO/ lib/Analysis/ lib/Transforms/IPO/ test/Transforms/Inline/

Chandler Carruth chandlerc at gmail.com
Wed Apr 11 16:11:19 PDT 2012


On Wed, Apr 11, 2012 at 7:33 PM, Eli Friedman <eli.friedman at gmail.com> wrote:

> On Wed, Apr 11, 2012 at 9:09 AM, Chandler Carruth <chandlerc at gmail.com>
> wrote:
> > On Wed, Apr 11, 2012 at 1:56 AM, Chandler Carruth <chandlerc at gmail.com>
> > wrote:
> >>
> >> On Wed, Apr 11, 2012 at 12:31 AM, Chandler Carruth <chandlerc at gmail.com>
> >> wrote:
> >>>
> >>> I'll start looking for smoking guns right away though.
> >>
> >>
> >> This looks very much like the previous cases where inliner changes
> caused
> >> compile-time regressions.
> >>
> >> Looking at x86-64 of sqlite3, the profile with the trunk clang shows
> only
> >> 3.5% of all the time in the inline cost analysis. That's a bit higher
> than I
> >> would like (I've got some ideas to shrink it on two fronts that I will
> >> implement right away), but it's not likely responsible for the near 10%
> >> regression you're seeing; this function wasn't even free before.
> >>
> >> However, I'm seeing time spread pretty well between: JumpThreading, the
> >> RA, CorrelatedValueProp, GVN, and InstCombine. This looks like increased
> >> kicking in of the host of scalar optimizations giving us a broad slight
> >> slowdown.
> >>
> >> I'm still working on doing before/after profile comparisons and other
> >> things to see if I can tease out the culprit here.
> >>
> >> I also see several places where we can recoup a few percent in all
> >> likelihood; I'll try to tackle those if I can.
> >
> >
> > Ok, thanks to Chad for helping me get set up to look at this. I've
> > implemented the type of optimization that *should* help matters if the
> > inline cost computation were the problem. I've attached the patch. It
> > reduces the number of inline cost computations by 25% for sqlite3. I
> have a
> > plan for how to make even more invasive changes to the inliner that could
> > potentially save another 10% or so, but the alarming thing is that this
> > patch has *zero* impact on the -O3 compile time of the sqlite3 bitcode.
> =/
> > However, if I tweak the inline cost computation to simply return higher
> > costs, or to reject a larger percentage of the functions, I can
> immediately
> > recoup all 9% regressions and a lot more.
>
> If your patch consistently leads to lower computed inlining costs,
> perhaps we should lower the inlining threshold?


I don't think this is what is happening, but I would be interested if you
have test cases that exhibit it...

I should have given more context to the "tweak" I mentioned ... it wasn't
just lowering the threshold; it was turning off certain computations and
reusing old, higher results. Not a good measure.

When I first made these changes, I did a cursory check to see if the
thresholds were wildly off, and they did not seem to be wildly off.
Increasing the threshold gave very minor performance gains at a significant
code size cost, while decreasing the threshold gave very significant code
size savings but cost significant performance. There was no "clear
winner", so I think it will require the evaluation Jakob described to
re-calibrate these. I don't expect the changes in either code size or
efficiency to be dramatic except for -Os and -Oz.
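To make the tradeoff above concrete, the threshold tuning being discussed
reduces to a single comparison per call site. This is a hypothetical sketch,
not LLVM's actual InlineCost API; the struct and field names are invented
for illustration:

```cpp
// Hypothetical sketch (not LLVM's real interface): the inliner's decision
// is essentially "is the estimated cost under the budget?". Raising the
// threshold inlines more call sites (bigger code, possibly faster);
// lowering it inlines fewer (smaller code, possibly slower).
struct CallSiteInfo {
  int EstimatedCost; // estimated size/complexity increase from inlining
  int Threshold;     // per-call-site budget, adjusted by -Os/-Oz etc.
};

bool shouldInline(const CallSiteInfo &CS) {
  return CS.EstimatedCost < CS.Threshold;
}
```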

What I think is happening here is quite different from "lower computed
inlining costs". I actually looked very carefully at costs before and after
the change, and they weren't consistently different in either direction. If
anything, they were slightly higher. What has changed is that when
significant simplifications can be made by inlining, we almost always catch
it. This leads to lower costs, but justifiably so. It also has the obvious
tradeoff that we spend more time actually doing the simplifications*. The
cases where we see compile time dips but no runtime boosts are, IMO,
unlucky -- there were significant missed opportunities in that code, but
they happened to not be on the hot path.
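The "catch simplifications" behavior described above can be sketched as a
cost walk that declines to charge for instructions that would disappear once
the callee is specialized on the caller's arguments. Again, this is an
invented illustration of the idea, not the actual LLVM analysis:

```cpp
#include <vector>

// Hypothetical sketch: each instruction contributes some cost, but an
// instruction that would constant-fold away after inlining (because the
// caller passes constants) is charged nothing. The total therefore drops
// exactly when inlining would enable real simplification.
struct Inst {
  int Size;                // cost contribution of this instruction
  bool FoldsWithConstArgs; // would simplify away given constant arguments
};

int estimateInlineCost(const std::vector<Inst> &Body, bool ArgsAreConstant) {
  int Cost = 0;
  for (const Inst &I : Body) {
    if (ArgsAreConstant && I.FoldsWithConstArgs)
      continue; // instruction vanishes after inlining; charge nothing
    Cost += I.Size;
  }
  return Cost;
}
```

The justifiable lower cost then falls out naturally: the same body is
cheaper to inline at a call site with constant arguments than at one
without.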

-Chandler

[*]: Just to be clear, I looked quite a bit to see if we could share the
cost of the analysis for inline cost and the act of simplification after
inlining. I don't see very much at all to save there. The cost analysis
works hard to be clever and cheap by not doing full simplifications. I
think most of the "expensive" simplifications fall out of SROA, GVN,
JumpThreading, InstCombine. =/
