[llvm-dev] (RFC) Adjusting default loop fully unroll threshold

Dehao Chen via llvm-dev llvm-dev at lists.llvm.org
Thu Feb 2 14:07:40 PST 2017


Here is the code size impact for clang, chrome, and 24 Google internal
benchmarks (names omitted; 14, 15, and 16 are encoding/decoding benchmarks
similar to h264). The two columns are for thresholds 300 and 450, respectively.

I also tested the llvm test suite. Changing the threshold to 300/450 does
not affect code gen for any binary in the test suite.



benchmark   300      450
clang       0.30%    0.63%
chrome      0.00%    0.00%
1           0.27%    0.67%
2           0.44%    0.93%
3           0.44%    0.93%
4           0.26%    0.53%
5           0.74%    2.21%
6           0.74%    2.21%
7           0.74%    2.21%
8           0.46%    1.05%
9           0.35%    0.86%
10          0.35%    0.86%
11          0.40%    0.83%
12          0.32%    0.65%
13          0.31%    0.64%
14          4.52%    8.23%
15          9.90%    19.38%
16          9.90%    19.38%
17          0.68%    1.97%
18          0.21%    0.48%
19          0.99%    3.44%
20          0.19%    0.46%
21          0.57%    1.62%
22          0.37%    1.05%
23          0.78%    1.30%
24          0.51%    1.54%
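
To read the table above at a glance, the per-threshold mean, median, and
maximum can be computed with a few lines of Python. This is only a sketch:
the numbers are copied from the table, while the names CODE_SIZE and
summarize are made up for illustration.

```python
from statistics import mean, median

# (name, % size increase at threshold 300, % size increase at threshold 450),
# copied row by row from the table above.
CODE_SIZE = [
    ("clang", 0.30, 0.63), ("chrome", 0.00, 0.00),
    ("1", 0.27, 0.67), ("2", 0.44, 0.93), ("3", 0.44, 0.93),
    ("4", 0.26, 0.53), ("5", 0.74, 2.21), ("6", 0.74, 2.21),
    ("7", 0.74, 2.21), ("8", 0.46, 1.05), ("9", 0.35, 0.86),
    ("10", 0.35, 0.86), ("11", 0.40, 0.83), ("12", 0.32, 0.65),
    ("13", 0.31, 0.64), ("14", 4.52, 8.23), ("15", 9.90, 19.38),
    ("16", 9.90, 19.38), ("17", 0.68, 1.97), ("18", 0.21, 0.48),
    ("19", 0.99, 3.44), ("20", 0.19, 0.46), ("21", 0.57, 1.62),
    ("22", 0.37, 1.05), ("23", 0.78, 1.30), ("24", 0.51, 1.54),
]


def summarize(column):
    """Return mean, median and max of one threshold column (1 = 300, 2 = 450)."""
    values = [row[column] for row in CODE_SIZE]
    return {"mean": mean(values), "median": median(values), "max": max(values)}


if __name__ == "__main__":
    for label, column in (("300", 1), ("450", 2)):
        s = summarize(column)
        print(f"threshold {label}: mean {s['mean']:.2f}%, "
              f"median {s['median']:.2f}%, max {s['max']:.2f}%")
```

The median is probably the more useful number here, since the three
encoding/decoding benchmarks (14, 15, 16) pull the mean up considerably.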

On Wed, Feb 1, 2017 at 6:08 PM, Mikhail Zolotukhin via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On Feb 1, 2017, at 4:57 PM, Xinliang David Li via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> clang, chrome, and some internal large apps are good candidates for size
> metrics.
>
> I'd also add the standard LLVM testsuite just because it's the suite
> everyone in the community can use.
>
> Michael
>
>
> David
>
> On Wed, Feb 1, 2017 at 4:47 PM, Chandler Carruth via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> I had suggested having size metrics from somewhat larger applications
>> such as Chrome, Webkit, or Firefox; clang itself; and maybe some of our
>> internal binaries with rough size brackets?
>>
>> On Wed, Feb 1, 2017 at 4:33 PM Dehao Chen <dehao at google.com> wrote:
>>
>>> With the new data points, any comments on whether this can justify
>>> setting the full unroll threshold to 300 (or any other number) in O2? I
>>> can collect more data points if that's helpful.
>>>
>>> Thanks,
>>> Dehao
>>>
>>> On Tue, Jan 31, 2017 at 3:20 PM, Dehao Chen <dehao at google.com> wrote:
>>>
>>> Recollected the data from trunk head with stddev data and more threshold
>>> data points attached:
>>>
>>> Performance:
>>>
>>> benchmark  stddev/mean  300     450     600     750
>>> 403        0.37%        0.11%   0.11%   0.09%   0.79%
>>> 433        0.14%        0.51%   0.25%   -0.63%  -0.29%
>>> 445        0.08%        0.48%   0.89%   0.12%   0.83%
>>> 447        0.16%        3.50%   2.69%   3.66%   3.59%
>>> 453        0.11%        1.49%   0.45%   -0.07%  0.78%
>>> 464        0.17%        0.75%   1.80%   1.86%   1.54%
>>>
>>> Code size:
>>>
>>> benchmark  300     450     600     750
>>> 403        0.56%   2.41%   2.74%   3.75%
>>> 433        0.96%   2.84%   4.19%   4.87%
>>> 445        2.16%   3.62%   4.48%   5.88%
>>> 447        2.96%   5.09%   6.74%   8.89%
>>> 453        0.94%   1.67%   2.73%   2.96%
>>> 464        8.02%   13.50%  20.51%  26.59%
>>>
>>> Compile time scaled proportionally in the experiments but was noisier,
>>> so I did not include it.
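
One way to read the stddev column in the performance table above is to check
which speedups stand clear of run-to-run noise. The sketch below does that
with a simple rule: a speedup counts only if it exceeds twice the stddev/mean.
That factor of two is an assumption chosen for illustration, not something
taken from the measurements, and the names PERF and above_noise are
hypothetical.

```python
# Percentages copied from the performance table above.
# benchmark -> (stddev/mean, {threshold: speedup})
PERF = {
    "403": (0.37, {300: 0.11, 450: 0.11, 600: 0.09, 750: 0.79}),
    "433": (0.14, {300: 0.51, 450: 0.25, 600: -0.63, 750: -0.29}),
    "445": (0.08, {300: 0.48, 450: 0.89, 600: 0.12, 750: 0.83}),
    "447": (0.16, {300: 3.50, 450: 2.69, 600: 3.66, 750: 3.59}),
    "453": (0.11, {300: 1.49, 450: 0.45, 600: -0.07, 750: 0.78}),
    "464": (0.17, {300: 0.75, 450: 1.80, 600: 1.86, 750: 1.54}),
}

# Assumption for illustration only: a speedup is "above noise" when it
# exceeds NOISE_FACTOR times the benchmark's stddev/mean.
NOISE_FACTOR = 2.0


def above_noise(threshold, factor=NOISE_FACTOR):
    """List the benchmarks whose speedup at `threshold` exceeds factor * noise."""
    return [name for name, (noise, speedups) in PERF.items()
            if speedups[threshold] > factor * noise]


if __name__ == "__main__":
    for threshold in (300, 450, 600, 750):
        print(threshold, above_noise(threshold))
```

Under this rule, 403 at threshold 300 (0.11% against a 0.37% stddev) would be
treated as no measurable change rather than as a small win.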
>>>
>>> We have >2% speedup on some google internal benchmarks when switching
>>> the threshold from 150 to 300.
>>>
>>> Dehao
>>>
>>> On Mon, Jan 30, 2017 at 5:06 PM, Chandler Carruth <chandlerc at google.com>
>>> wrote:
>>>
>>> On Mon, Jan 30, 2017 at 4:59 PM Mehdi Amini <mehdi.amini at apple.com>
>>> wrote:
>>>
>>>
>>>
>>> Another question is about PGO integration: is it already hooked there?
>>> Should we have a more aggressive threshold in a hot function? (Assuming
>>> we’re willing to spend some binary size there but not on the cold path).
>>>
>>>
>>> I would even wire the *unrolling* the other way: just suppress unrolling
>>> in cold paths to save binary size. Rolled loops seem like a generally good
>>> thing in cold code unless they are having some larger impact (i.e., the
>>> loop itself is more expensive than the unrolled form).
>>>
>>>
>>>
>>> Agree that we could suppress unrolling in cold paths to save code size.
>>> But that's orthogonal to the proposal here. This proposal focuses on O2
>>> performance: should we have a different (higher) full unroll threshold
>>> than the dynamic/partial unroll threshold?
>>>
>>>
>>> I agree that this is (to some extent) orthogonal, and it makes sense to
>>> me to differentiate the threshold for full unroll and the dynamic/partial
>>> case.
>>>
>>>
>>> There is one issue that makes these not orthogonal.
>>>
>>> If even *static* profile hints will reduce some of the code size
>>> increase caused by higher unrolling thresholds for non-cold code, we should
>>> factor that into the tradeoff in picking where the threshold goes.
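
The interaction discussed above (a higher threshold for full unrolling, plus
suppressing unrolling in cold code) can be sketched as a small decision
function. This is not how LLVM's unroller is written; every name below is
hypothetical. The value 150 is the threshold the thread describes switching
away from, 300 is the value proposed for full unrolling, and the strict
comparison against an estimated unrolled cost is an assumption.

```python
# Hypothetical constants for illustration; see the lead-in for where the
# numbers come from.
PARTIAL_UNROLL_THRESHOLD = 150
FULL_UNROLL_THRESHOLD = 300


def pick_unroll_threshold(full_unroll, is_cold):
    """Pick a size threshold for one loop.

    full_unroll: True when the loop would be unrolled completely.
    is_cold:     True when profile hints (static or PGO) mark the loop cold.
    """
    if is_cold:
        # Suppress unrolling in cold code to save binary size.
        return 0
    return FULL_UNROLL_THRESHOLD if full_unroll else PARTIAL_UNROLL_THRESHOLD


def should_unroll(unrolled_cost, full_unroll, is_cold):
    """Unroll only when the estimated unrolled cost is below the threshold."""
    return unrolled_cost < pick_unroll_threshold(full_unroll, is_cold)


if __name__ == "__main__":
    # A loop with estimated unrolled cost 200, under three conditions.
    print(should_unroll(200, full_unroll=True, is_cold=False))
    print(should_unroll(200, full_unroll=False, is_cold=False))
    print(should_unroll(200, full_unroll=True, is_cold=True))
```

If the cold-path suppression removes part of the size increase, the threshold
for non-cold code could be set higher for the same overall size budget, which
is why the two questions are not fully independent.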
>>>
>>> However, getting PGO into the full unroller is currently challenging
>>> outside of the new pass manager. We already have some unfortunate hacks
>>> around this in LoopUnswitch that are making the port of it to the new PM
>>> more annoying.
>>>
>>>
>>>
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>