[llvm] r200886 - Inliner uses a smaller inline threshold for callees with cold attribute.

Wed Feb 5 18:15:14 PST 2014

On Wed, Feb 5, 2014 at 5:02 PM, Chandler Carruth <chandlerc at google.com>wrote:

> On Wed, Feb 5, 2014 at 4:47 PM, Manman Ren <manman.ren at gmail.com> wrote:
>
>>
>>
>>
>> On Wed, Feb 5, 2014 at 4:34 PM, Chandler Carruth <chandlerc at google.com>wrote:
>>
>>>
>>> On Wed, Feb 5, 2014 at 3:58 PM, Manman Ren <manman.ren at gmail.com> wrote:
>>>
>>>> I thought it is straight-forward to use a lower threshold for cold
>>>> functions and didn't think this patch is going to have a wide & dramatic
>>>> impact.
>>>>
>>>> I am not aware of any benchmark in lnt that uses cold attribute. If
>>>> there is one, please let me know so I can collect some performance number.
>>>>
>>>
>>> Surely there are benchmarks with more representative code than SPEC?
>>> Maybe not. There is also no cold attribute in the test suite.
>>>
>>>
>>>>
>>>> When Diego first introduced the Cold attribute and used it to adjust
>>>> block weights, there was no request on benchmark numbers. So I assumed
>>>> negligible performance impact for this patch.
>>>>
>>>
>>> Sure, but there wasn't any use of cold in LLVM before. Also, changing
>>> inliner heuristics has an *incredibly* broader impact. It literally has the
>>> largest impact in the entire compiler.
>>>
>>
>> Agree.
>>
>>>
>>> Notably, his change didn't introduce any particularly interesting
>>> thresholds. It reused existing thresholds for __builtin_expect IIRC.
>>>
>>> There at least needs to be *some* science behind the use of 75 for the
>>> magical threshold here. When setting the other thresholds both Jakob and I
>>> would run large benchmark suites across a wide range of values and try to
>>> dig out the shape of the graph.
>>>
>>
>> When setting threshold for inlinehint, do you recall what benchmark
>> suites we collected data on?
>>
>
> The threshold for inlinehint was set a very, *very* long time ago.
>
> Both I (and my impression is Jakob, although this is second hand)
> validated it after the inline cost analysis rewrite to ensure it was
> reasonable. My conclusion was that it was essentially irrelevant. I have
> some hope of removing it altogether. It makes very little sense for the
> inliner to inline more aggressively due to hotness because we have no
> actually measurement of the performance impact of inlining more or less at
> this level.
>
> On the flip side, we do *do* have some measurement of code size impact of
> inlining more or less, and so there is I think a good justification for
> reducing our inlining of cold function call sites in the same way we reduce
> it to minimize code size growth.
>
>
> One thing that would have been nice to discuss with the community before
> committing this patch is whether this is even the right strategy. I don't
> think that it is. I think that this is a poor bandaid for the fact that we
> can't actually use the designated analysis passes that understand profile
> information in the inliner. I'm OK with a temporary hack around this, but
> a) I really would have liked to see this documented as a temporary hack in
> the comments and in the commit log,
>
I committed r200898 to have the same default behavior as prior-to 200886
and explained the purpose of this new threshold in comments and commit log.

> and b) I worry that this hack by its nature will do the Wrong Thing. Here
> is a concrete example.
>

> If we have a cold function for logging errors, but we call it inside of
> the hottest loop of the program, I think this change has at least some
> chance of doing exactly the wrong thing. It may be much cheaper to inline
> the call *but mark the code as cold* even if the function is quite large.
> Then the code layout engine can sink the cold code out of the loop body.
> After inlining, there may be much more freedom to isolate the impact of
> this cold path from the body of the loop. Maybe a function call is the
> right code-size-tradeoff, but it isn't clear to me.
>
Agree. It is not a clear win in performance, but may reduce code size.

> At a fundamental level, we *can't* get this right by looking at function
> attributes. We need to look at the BPI and BFI analyses that were designed
> to give information about this type of thing.
>
Yes, since the inliner is not hooked up to BPI & BFI yet, I introduced this
new threshold to help performance of PGO. But it may help applications with
cold attribute (that depends on our later-on validation).

>
> Still, I'm OK with a bandaid if its reasonably isolated and doesn't happen
> to cause any regressions for todays users.
>
With r200898, we should not cause any regressions for today's users.

> But this change wasn't clearly marked as this, and it isn't clear that the
> community has had a real chance to validate that it doesn't cause
> regressions (or that those regressions are acceptable given the benefits).
>
> 75 is the threshold we use when optimizing for size (OptSizeThreshold). I
>> used the same number for cold functions.
>>
>
> That wasn't commented or explained anywhere. If you're actively trying to
> treat cold functions as-if they were optimizing for size, it would seem to
> make more sense to use a boolean flag to enable that behavior and actually
> use the OptSizeThreshold?
>
A threshold gives more freedom in tuning performance =]

Thanks,
Manman

>
>
>> Sure. I will set the default inlinecold-threshold to be the same as
>> inline-threshold (225). About conducting experiments, without PGO setting
>> cold attribute, we will need benchmarks that actually use "cold" attribute.
>>
>
> I don't know about other users of LLVM, but I have a reasonably large set
> of internal benchmarks that definitely contain uses of the cold attribute.
> It doesn't matter much how many functions use the cold attribute as how
> many call sites. =]
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140205/fc72747c/attachment.html>