<div class="gmail_quote">On Wed, Apr 11, 2012 at 7:33 PM, Eli Friedman <span dir="ltr"><<a href="mailto:eli.friedman@gmail.com">eli.friedman@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="HOEnZb"><div class="h5">On Wed, Apr 11, 2012 at 9:09 AM, Chandler Carruth <<a href="mailto:chandlerc@gmail.com">chandlerc@gmail.com</a>> wrote:<br>

> On Wed, Apr 11, 2012 at 1:56 AM, Chandler Carruth <<a href="mailto:chandlerc@gmail.com">chandlerc@gmail.com</a>><br>

> wrote:<br>

>><br>

>> On Wed, Apr 11, 2012 at 12:31 AM, Chandler Carruth <<a href="mailto:chandlerc@gmail.com">chandlerc@gmail.com</a>><br>

>> wrote:<br>

>>><br>

>>> I'll start looking for smoking guns right away though.<br>

>><br>

>><br>

>> This looks very much like the previous cases where inliner changes caused<br>

>> compile-time regressions.<br>

>><br>

>> Looking at x86-64 of sqlite3, the profile with the trunk clang shows only<br>

>> 3.5% of all the time in the inline cost analysis. That's a bit higher than I<br>

>> would like (I've got some ideas to shrink it on two fronts that I will<br>

>> implement right away), but it's not likely responsible for the near 10%<br>

>> regression you're seeing; this function wasn't even free before.<br>

>><br>

>> However, I'm seeing time spread pretty well between: JumpThreading, the<br>

>> RA, CorrelatedValueProp, GVN, and InstCombine. This looks like increased<br>

>> kicking in of the host of scalar optimizations giving us a broad slight<br>

>> slowdown.<br>

>><br>

>> I'm still working on doing before/after profile comparisons and other<br>

>> things to see if I can tease out the culprit here.<br>

>><br>

>> I also see several places where we can recoup a few percent in all<br>

>> likelihood; I'll try to tackle those if I can.<br>

><br>

><br>

> Ok, thanks to Chad for helping me get set up to look at this. I've<br>

> implemented the type of optimization that *should* help matters if the<br>

> inline cost computation were the problem. I've attached the patch. It<br>

> reduces the number of inline cost computations by 25% for sqlite3. I have a<br>

> plan for how to make even more invasive changes to the inliner that could<br>

> potentially save another 10% or so, but the alarming thing is that this<br>

> patch has *zero* impact on the -O3 compile time of the sqlite3 bitcode. =/<br>

> However, if I tweak the inline cost computation to simply return higher<br>

> costs, or to reject a larger percentage of the functions, I can immediately<br>

> recoup all 9% regressions and a lot more.<br>

<br>

</div></div>If your patch consistently leads to lower computed inlining costs,<br>

perhaps we should lower the inlining threshold?</blockquote><div><br></div><div>I don't think this is what is happening, but I would be interested if you have test cases that exhibit it...</div><div><br></div><div>I should have given more context to the "tweak" I mentioned ... it wasn't just lowering the threshold, it was turning off certain computations and reusing old ones, which were higher.  Not a good measure.</div>

<div><br></div><div>When I first made these changes, I did a cursory check to see if the thresholds were wildly off, and they did not seem to be. Increasing the threshold gave very minor performance gains for a significant increase in code size. Decreasing the threshold gave very significant code size benefits but at a significant performance cost. There was no "clear winner", so I think it will require the evaluation Jakob described to re-calibrate these. I don't expect the changes in either code size or efficiency to be dramatic except for -Os and -Oz.</div>

<div><br></div><div>What I think is happening here is quite different from "lower computed inlining costs". I actually looked very carefully at costs before and after the change, and they weren't consistently different in either direction. If anything, they were slightly higher. What has changed is that when significant simplifications can be made by inlining, we almost always catch it. This leads to lower costs, but justifiably. It also has the obvious tradeoff that we spend more time actually doing the simplifications*. The cases where we see compile-time regressions but no runtime boosts are, IMO, unlucky -- there were significant missed opportunities in that code, but they happened to not be on the hot path.</div>
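<div><br></div><div>To make that tradeoff concrete, here is a toy sketch of a threshold-based inlining decision. It is not the actual inline cost analysis: the names (CallSite, inline_cost, should_inline), the per-instruction cost, and the threshold value are all assumptions chosen for illustration. It only shows how crediting simplifications that inlining would expose can lower a call site's computed cost enough to flip the decision.</div>

```python
from dataclasses import dataclass

# Hypothetical constants for illustration only; not taken from the inliner.
PER_INSTRUCTION_COST = 5
THRESHOLD = 225


@dataclass
class CallSite:
    callee_instructions: int   # size of the callee body
    simplifiable: int          # instructions predicted to fold away once inlined here


def inline_cost(site: CallSite, credit_simplifications: bool) -> int:
    """Cost of the code expected to remain after inlining at this call site."""
    remaining = site.callee_instructions
    if credit_simplifications:
        # Never credit more instructions than the callee actually has.
        remaining -= min(site.simplifiable, site.callee_instructions)
    return remaining * PER_INSTRUCTION_COST


def should_inline(site: CallSite, credit_simplifications: bool) -> bool:
    # Inline only when the computed cost is strictly below the threshold.
    return THRESHOLD > inline_cost(site, credit_simplifications)


site = CallSite(callee_instructions=60, simplifiable=45)
naive_cost = inline_cost(site, credit_simplifications=False)
credited_cost = inline_cost(site, credit_simplifications=True)
print(naive_cost, should_inline(site, False))
print(credited_cost, should_inline(site, True))
```

<div><br></div><div>In this sketch the call site is rejected when simplifications are ignored and accepted once they are credited. The accepted call site still leaves the later scalar passes with the work of actually folding those instructions away, which is where the extra compile time would go.</div>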

<div><br></div><div>-Chandler</div><div><br></div><div>[*]: Just to be clear, I looked quite a bit to see if we could share the cost of the analysis for inline cost and the act of simplification after inlining. I don't see very much at all to save there. The cost analysis works hard to be clever and cheap by not doing full simplifications. I think most of the "expensive" simplifications fall out of SROA, GVN, JumpThreading, and InstCombine. =/</div>

</div>