[llvm] r200886 - Inliner uses a smaller inline threshold for callees with cold attribute.

Chandler Carruth chandlerc at google.com
Wed Feb 5 17:02:04 PST 2014


On Wed, Feb 5, 2014 at 4:47 PM, Manman Ren <manman.ren at gmail.com> wrote:

>
>
>
> On Wed, Feb 5, 2014 at 4:34 PM, Chandler Carruth <chandlerc at google.com>wrote:
>
>>
>> On Wed, Feb 5, 2014 at 3:58 PM, Manman Ren <manman.ren at gmail.com> wrote:
>>
>>> I thought it was straightforward to use a lower threshold for cold
>>> functions and didn't think this patch would have a wide & dramatic
>>> impact.
>>>
>>> I am not aware of any benchmark in lnt that uses the cold attribute. If
>>> there is one, please let me know so I can collect some performance numbers.
>>>
>>
>> Surely there are benchmarks with more representative code than SPEC?
>> Maybe not. There is also no cold attribute in the test suite.
>>
>>
>>>
>>> When Diego first introduced the cold attribute and used it to adjust
>>> block weights, there was no request for benchmark numbers. So I assumed
>>> the performance impact of this patch would be negligible.
>>>
>>
>> Sure, but there wasn't any use of cold in LLVM before. Also, changing
>> inliner heuristics has an *incredibly* broad impact. It literally has the
>> largest impact in the entire compiler.
>>
>
> Agree.
>
>>
>> Notably, his change didn't introduce any particularly interesting
>> thresholds. It reused existing thresholds for __builtin_expect IIRC.
>>
>> There at least needs to be *some* science behind the use of 75 for the
>> magical threshold here. When setting the other thresholds both Jakob and I
>> would run large benchmark suites across a wide range of values and try to
>> dig out the shape of the graph.
>>
>
> When setting the threshold for inlinehint, do you recall which benchmark
> suites we collected data on?
>

The threshold for inlinehint was set a very, *very* long time ago.

Both I and (my impression is, though this is second hand) Jakob validated
it after the inline cost analysis rewrite to ensure it was reasonable. My
conclusion was that it was essentially irrelevant. I have some hope of
removing it altogether. It makes very little sense for the inliner to
inline more aggressively due to hotness because we have no actual
measurement of the performance impact of inlining more or less at this
level.

On the flip side, we *do* have some measurement of the code size impact of
inlining more or less, and so I think there is a good justification for
reducing our inlining of cold function call sites in the same way we reduce
it to minimize code size growth.


One thing that would have been nice to discuss with the community before
committing this patch is whether this is even the right strategy. I don't
think that it is. I think that this is a poor bandaid for the fact that we
can't actually use the designated analysis passes that understand profile
information in the inliner. I'm OK with a temporary hack around this, but
a) I really would have liked to see this documented as a temporary hack in
the comments and in the commit log, and b) I worry that this hack by its
nature will do the Wrong Thing. Here is a concrete example.

If we have a cold function for logging errors, but we call it inside of the
hottest loop of the program, I think this change has at least some chance
of doing exactly the wrong thing. It may be much cheaper to inline the call
*but mark the code as cold* even if the function is quite large. Then the
code layout engine can sink the cold code out of the loop body. After
inlining, there may be much more freedom to isolate the impact of this cold
path from the body of the loop. Maybe a function call is the right
code-size tradeoff, but it isn't clear to me. At a fundamental level, we
*can't* get this right by looking at function attributes. We need to look
at the BPI and BFI analyses that were designed to give information about
this type of thing.
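The scenario above can be made concrete with a small C sketch (the function
and variable names here are hypothetical, purely for illustration): a cold
error-logging helper called from the hottest loop of a program. The open
question is whether the inliner should refuse to inline `log_error` because
of its cold attribute, or inline it and let code layout sink the cold block
out of the loop.

```c
#include <stdio.h>

/* Hypothetical example: a rarely-executed logging helper.  The GCC/Clang
 * `cold` attribute marks the *function* cold; it says nothing about how
 * hot the surrounding call site is.  `noinline` is added only so the call
 * survives for the purpose of the illustration. */
__attribute__((cold, noinline))
static void log_error(int i) {
    fprintf(stderr, "unexpected value at iteration %d\n", i);
}

/* The hottest loop of the program: the call to log_error() sits on a
 * cold branch inside a hot loop. */
int hot_loop(const int *data, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        if (data[i] < 0)      /* cold branch */
            log_error(i);     /* cold callee, hot surrounding loop */
        else                  /* hot path */
            sum += data[i];
    }
    return sum;
}
```

A heuristic that looks only at the callee's cold attribute sees just
`log_error` here; BPI/BFI would additionally see that the branch guarding
the call is cold while the loop is hot, which is the information the
inliner would actually need to weigh the two options.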

Still, I'm OK with a bandaid if it's reasonably isolated and doesn't happen
to cause any regressions for today's users. But this change wasn't clearly
marked as such, and it isn't clear that the community has had a real chance
to validate that it doesn't cause regressions (or that any regressions it
does cause are acceptable given the benefits).

> 75 is the threshold we use when optimizing for size (OptSizeThreshold). I
> used the same number for cold functions.
>

That wasn't commented or explained anywhere. If you're actively trying to
treat cold functions as if they were being optimized for size, it would
seem to make more sense to use a boolean flag to enable that behavior and
actually use the OptSizeThreshold?
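For illustration, the boolean-flag idea could look roughly like the
following C sketch. The names (`getInlineThreshold`, `opt_for_size`,
`callee_is_cold`) and structure are hypothetical, not the actual LLVM
inline cost code; only the two numbers (225 and 75) come from this thread.

```c
/* Hypothetical sketch of the suggestion above: rather than introducing an
 * independently tuned magic number for cold callees, treat a cold callee
 * exactly as if the call site were being compiled for size, reusing the
 * existing OptSizeThreshold.  Values mirror the thread: 225 default, 75
 * when optimizing for size. */
enum { DefaultThreshold = 225, OptSizeThreshold = 75 };

static int getInlineThreshold(int opt_for_size, int callee_is_cold) {
    /* One shared "optimize this call site for size" condition, instead of
     * a separate threshold that must be tuned and validated on its own. */
    if (opt_for_size || callee_is_cold)
        return OptSizeThreshold;
    return DefaultThreshold;
}
```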


> Sure. I will set the default inlinecold-threshold to be the same as
> inline-threshold (225). As for conducting experiments: without PGO setting
> the cold attribute, we will need benchmarks that actually use the "cold"
> attribute.
>

I don't know about other users of LLVM, but I have a reasonably large set
of internal benchmarks that definitely contain uses of the cold attribute.
It doesn't matter so much how many functions use the cold attribute as how
many call sites do. =]