[llvm-dev] (RFC) Adjusting default loop fully unroll threshold

Wed Feb 1 16:33:47 PST 2017

With the new data points, any comments on whether this can justify setting
the full unroll threshold to 300 (or any other number) at O2? I can collect
more data points if that would be helpful.

Thanks,
Dehao

On Tue, Jan 31, 2017 at 3:20 PM, Dehao Chen <dehao at google.com> wrote:

> Recollected the data from trunk head with stddev data and more threshold
> data points attached:
>
> Performance:
>
>      stddev/mean      300      450      600      750
> 403        0.37%    0.11%    0.11%    0.09%    0.79%
> 433        0.14%    0.51%    0.25%   -0.63%   -0.29%
> 445        0.08%    0.48%    0.89%    0.12%    0.83%
> 447        0.16%    3.50%    2.69%    3.66%    3.59%
> 453        0.11%    1.49%    0.45%   -0.07%    0.78%
> 464        0.17%    0.75%    1.80%    1.86%    1.54%
>
> Code size:
>
>          300      450      600      750
> 403    0.56%    2.41%    2.74%    3.75%
> 433    0.96%    2.84%    4.19%    4.87%
> 445    2.16%    3.62%    4.48%    5.88%
> 447    2.96%    5.09%    6.74%    8.89%
> 453    0.94%    1.67%    2.73%    2.96%
> 464    8.02%   13.50%   20.51%   26.59%
>
> Compile time is proportional in the experiments and more noisy, so I did
> not include it.
>
> We see a >2% speedup on some Google-internal benchmarks when switching the
> threshold from 150 to 300.
>
> Dehao
>
> On Mon, Jan 30, 2017 at 5:06 PM, Chandler Carruth <chandlerc at google.com>
> wrote:
>
>> On Mon, Jan 30, 2017 at 4:59 PM Mehdi Amini <mehdi.amini at apple.com>
>> wrote:
>>
>>>
>>>
>>> Another question is about PGO integration: is it already hooked there?
>>> Should we have a more aggressive threshold in a hot function? (Assuming
>>> we’re willing to spend some binary size there but not on the cold path).
>>>
>>>
>>> I would even wire the *unrolling* the other way: just suppress unrolling
>>> in cold paths to save binary size. Rolled loops seem like a generally good
>>> thing in cold code unless they are having some larger impact (i.e., the loop
>>> itself is more expensive than the unrolled form).
>>>
>>>
>>>
>>> Agreed that we could suppress unrolling in cold paths to save code size.
>>> But that's orthogonal to the proposal here. This proposal focuses on O2
>>> performance: should we have a different (higher) full unroll threshold
>>> than the one used for dynamic/partial unrolling?
>>>
>>>
>>> I agree that this is (to some extent) orthogonal, and it makes sense to
>>> me to differentiate the threshold for full unroll and the dynamic/partial
>>> case.
>>>
>>
>> There is one issue that makes these not orthogonal.
>>
>> If even *static* profile hints will reduce some of the code size increase
>> caused by higher unrolling thresholds for non-cold code, we should factor
>> that into the tradeoff in picking where the threshold goes.
>>
>> However, getting PGO into the full unroller is currently challenging
>> outside of the new pass manager. We already have some unfortunate hacks
>> around this in LoopUnswitch that are making the port of it to the new PM
>> more annoying.
>>
>>>
>
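
One quick way to weigh the performance numbers against the code size numbers
quoted above is to average each threshold column across the six benchmarks.
The sketch below does only that. It assumes that positive performance values
mean a speedup over the baseline and that an unweighted arithmetic mean is an
acceptable summary; neither is stated in the thread. The names THRESHOLDS,
PERF, SIZE, column_means and summarize are made up for this illustration.

```python
# Per-benchmark deltas in percent, copied from the two tables in this thread.
# Assumption: positive performance values mean a speedup over the baseline.
THRESHOLDS = [300, 450, 600, 750]

PERF = {
    403: [0.11, 0.11, 0.09, 0.79],
    433: [0.51, 0.25, -0.63, -0.29],
    445: [0.48, 0.89, 0.12, 0.83],
    447: [3.50, 2.69, 3.66, 3.59],
    453: [1.49, 0.45, -0.07, 0.78],
    464: [0.75, 1.80, 1.86, 1.54],
}

SIZE = {
    403: [0.56, 2.41, 2.74, 3.75],
    433: [0.96, 2.84, 4.19, 4.87],
    445: [2.16, 3.62, 4.48, 5.88],
    447: [2.96, 5.09, 6.74, 8.89],
    453: [0.94, 1.67, 2.73, 2.96],
    464: [8.02, 13.50, 20.51, 26.59],
}


def column_means(table):
    """Return the unweighted mean of each threshold column across benchmarks."""
    rows = list(table.values())
    return [sum(col) / len(rows) for col in zip(*rows)]


def summarize():
    """Map each threshold to (mean performance delta, mean code size delta)."""
    perf = column_means(PERF)
    size = column_means(SIZE)
    return {t: (p, s) for t, p, s in zip(THRESHOLDS, perf, size)}


if __name__ == "__main__":
    for threshold, (perf, size) in summarize().items():
        print(f"threshold {threshold}: "
              f"mean perf {perf:+.2f}%, mean size {size:+.2f}%")
```

The means hide the per-benchmark spread (464 dominates the code size column,
447 the performance column), and several of the performance deltas are of the
same order as the reported stddev/mean, so this is a rough summary only.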