[llvm-dev] (RFC) Adjusting default loop fully unroll threshold

Tue Jan 31 15:20:52 PST 2017

I re-collected the data from trunk head, this time with stddev data and
more threshold data points:

Performance:

Benchmark  stddev/mean     300     450     600     750
403              0.37%   0.11%   0.11%   0.09%   0.79%
433              0.14%   0.51%   0.25%  -0.63%  -0.29%
445              0.08%   0.48%   0.89%   0.12%   0.83%
447              0.16%   3.50%   2.69%   3.66%   3.59%
453              0.11%   1.49%   0.45%  -0.07%   0.78%
464              0.17%   0.75%   1.80%   1.86%   1.54%

Code size:

Benchmark       300     450     600     750
403           0.56%   2.41%   2.74%   3.75%
433           0.96%   2.84%   4.19%   4.87%
445           2.16%   3.62%   4.48%   5.88%
447           2.96%   5.09%   6.74%   8.89%
453           0.94%   1.67%   2.73%   2.96%
464           8.02%  13.50%  20.51%  26.59%

Compile time was proportional in the experiments and noisier, so I did
not include it.
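
To make the tradeoff in the two tables easier to read, here is a small
sketch that averages the speedup and the code size growth per threshold and
lists which benchmarks move by more than the measurement noise. The numbers
are copied from the tables above. The rest is assumption: that positive
performance numbers are speedups, that all values are relative to the
current threshold of 150, the helper name summarize, and the choice of
2x stddev/mean as the noise cutoff. None of this is part of the measurement
setup itself.

```python
# Thresholds from the column headers of both tables.
THRESHOLDS = [300, 450, 600, 750]

# benchmark -> (stddev/mean, [performance change per threshold]), in percent.
# Assumption: positive values are speedups over the baseline.
PERF = {
    "403": (0.37, [0.11, 0.11, 0.09, 0.79]),
    "433": (0.14, [0.51, 0.25, -0.63, -0.29]),
    "445": (0.08, [0.48, 0.89, 0.12, 0.83]),
    "447": (0.16, [3.50, 2.69, 3.66, 3.59]),
    "453": (0.11, [1.49, 0.45, -0.07, 0.78]),
    "464": (0.17, [0.75, 1.80, 1.86, 1.54]),
}

# benchmark -> [code size growth per threshold], in percent.
SIZE = {
    "403": [0.56, 2.41, 2.74, 3.75],
    "433": [0.96, 2.84, 4.19, 4.87],
    "445": [2.16, 3.62, 4.48, 5.88],
    "447": [2.96, 5.09, 6.74, 8.89],
    "453": [0.94, 1.67, 2.73, 2.96],
    "464": [8.02, 13.50, 20.51, 26.59],
}


def summarize(noise_factor=2.0):
    """Return (threshold, mean speedup, mean size growth, above-noise benchmarks).

    noise_factor is a hypothetical cutoff: a change counts as above noise
    when its magnitude exceeds noise_factor * stddev/mean for that benchmark.
    """
    rows = []
    for i, threshold in enumerate(THRESHOLDS):
        speedups = [changes[i] for _, changes in PERF.values()]
        growth = [sizes[i] for sizes in SIZE.values()]
        above_noise = [
            name
            for name, (noise, changes) in PERF.items()
            if abs(changes[i]) > noise_factor * noise
        ]
        rows.append(
            (
                threshold,
                sum(speedups) / len(speedups),
                sum(growth) / len(growth),
                above_noise,
            )
        )
    return rows


if __name__ == "__main__":
    for threshold, speedup, growth, above_noise in summarize():
        print(f"{threshold}: perf {speedup:+.2f}%  size {growth:+.2f}%  "
              f"above noise: {', '.join(above_noise)}")
```

The plain arithmetic mean weights every benchmark equally, so the large
code size growth on 464 dominates the size column. A geometric mean or a
per-benchmark view may be more appropriate depending on what you care about.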

We see a >2% speedup on some Google-internal benchmarks when switching the
threshold from 150 to 300.

Dehao

On Mon, Jan 30, 2017 at 5:06 PM, Chandler Carruth <chandlerc at google.com>
wrote:

> On Mon, Jan 30, 2017 at 4:59 PM Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>>
>>
>> Another question is about PGO integration: is it already hooked there?
>> Should we have a more aggressive threshold in a hot function? (Assuming
>> we’re willing to spend some binary size there but not on the cold path).
>>
>>
>> I would even wire the *unrolling* the other way: just suppress unrolling
>> in cold paths to save binary size. Rolled loops seem like a generally good
>> thing in cold code unless they are having some larger impact (i.e., the
>> loop itself is more expensive than the unrolled form).
>>
>>
>>
>> Agreed that we could suppress unrolling in cold paths to save code size.
>> But that's orthogonal to the proposal here. This proposal focuses on O2
>> performance: should we have a different (higher) threshold for full
>> unrolling than for dynamic/partial unrolling?
>>
>>
>> I agree that this is (to some extent) orthogonal, and it makes sense to
>> me to differentiate the threshold for full unroll and the dynamic/partial
>> case.
>>
>
> There is one issue that makes these not orthogonal.
>
> If even *static* profile hints will reduce some of the code size increase
> caused by higher unrolling thresholds for non-cold code, we should factor
> that into the tradeoff in picking where the threshold goes.
>
> However, getting PGO into the full unroller is currently challenging
> outside of the new pass manager. We already have some unfortunate hacks
> around this in LoopUnswitch that are making the port of it to the new PM
> more annoying.
>