[llvm-commits] [llvm] r153812 - in /llvm/trunk: include/llvm/Analysis/ include/llvm/Transforms/IPO/ lib/Analysis/ lib/Transforms/IPO/ test/Transforms/Inline/

Chandler Carruth chandlerc at gmail.com
Wed Apr 11 09:09:05 PDT 2012


On Wed, Apr 11, 2012 at 1:56 AM, Chandler Carruth <chandlerc at gmail.com> wrote:

> On Wed, Apr 11, 2012 at 12:31 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
>
>> I'll start looking for smoking guns right away though.
>
>
> This looks very much like the previous cases where inliner changes caused
> compile-time regressions.
>
> Looking at an x86-64 build of sqlite3, the profile with trunk Clang shows
> only 3.5% of the total time in the inline cost analysis. That's a bit
> higher than I would like (I've got some ideas to shrink it on two fronts
> that I will implement right away), but it's not likely responsible for the
> near 10% regression you're seeing; this function wasn't even free before.
>
> However, I'm seeing time spread fairly evenly across JumpThreading, the
> RA, CorrelatedValueProp, GVN, and InstCombine. This looks like the host of
> scalar optimizations kicking in more often, giving us a broad but slight
> slowdown.
>
> I'm still working on doing before/after profile comparisons and other
> things to see if I can tease out the culprit here.
>
> I also see several places where we can recoup a few percent in all
> likelihood; I'll try to tackle those if I can.
>

Ok, thanks to Chad for helping me get set up to look at this. I've
implemented the type of optimization that *should* help matters if the
inline cost computation were the problem. I've attached the patch. It
reduces the number of inline cost computations by 25% for sqlite3. I have a
plan for how to make even more invasive changes to the inliner that could
potentially save another 10% or so, but the alarming thing is that this
patch has *zero* impact on the -O3 compile time of the sqlite3 bitcode. =/
However, if I tweak the inline cost computation to simply return higher
costs, or to reject a larger percentage of the functions, I can immediately
recoup the entire 9% regression and a lot more.

As far as I can tell, this is a symptom of the new inline cost metric
exposing more (good) inlining opportunities, with the scalar optimizations
in LLVM then taking advantage of them and chewing on the code more. The
unfortunate thing is that we're not getting any significant runtime
improvements out of this (or are we?).

I think the only real solution is to work on making the various scalar
optimizations less expensive. I see a few opportunities for this already
after staring at the profile for a while. I'll try to look into those as I
have time. =/ I wish I had a better answer here. Other ideas? Thoughts?

I've attached the patch, which caches some inline cost queries. As I said,
it eliminates about 25% of them, at least on this test case. Even so, I'm
not sure we should do it, because it adds complexity and ugliness to the code.
Let me know.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cache-callercallers.diff
Type: application/octet-stream
Size: 5162 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120411/ef41390b/attachment.obj>

