[llvm-dev] (RFC) Adjusting default loop fully unroll threshold
Xinliang David Li via llvm-dev
llvm-dev at lists.llvm.org
Wed Feb 1 16:57:45 PST 2017
clang, chrome, and some internal large apps are good candidates for size
On Wed, Feb 1, 2017 at 4:47 PM, Chandler Carruth via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> I had suggested having size metrics from somewhat larger applications such
> as Chrome, Webkit, or Firefox; clang itself; and maybe some of our internal
> binaries with rough size brackets?
> On Wed, Feb 1, 2017 at 4:33 PM Dehao Chen <dehao at google.com> wrote:
>> With the new data points, any comments on whether this can justify
>> setting the full unroll threshold to 300 (or any other number) in O2? I can
>> collect more data points if that would help.
>> On Tue, Jan 31, 2017 at 3:20 PM, Dehao Chen <dehao at google.com> wrote:
>> I re-collected the data from trunk head, with stddev data and more
>> threshold data points:
>> Performance:
>>        stddev/mean   300      450      600      750
>> 403    0.37%         0.11%    0.11%    0.09%    0.79%
>> 433    0.14%         0.51%    0.25%   -0.63%   -0.29%
>> 445    0.08%         0.48%    0.89%    0.12%    0.83%
>> 447    0.16%         3.50%    2.69%    3.66%    3.59%
>> 453    0.11%         1.49%    0.45%   -0.07%    0.78%
>> 464    0.17%         0.75%    1.80%    1.86%    1.54%
>> Code size:
>>        300      450      600      750
>> 403    0.56%    2.41%    2.74%    3.75%
>> 433    0.96%    2.84%    4.19%    4.87%
>> 445    2.16%    3.62%    4.48%    5.88%
>> 447    2.96%    5.09%    6.74%    8.89%
>> 453    0.94%    1.67%    2.73%    2.96%
>> 464    8.02%   13.50%   20.51%   26.59%
>> Compile time was proportional in the experiments and noisier, so I did
>> not include it.
>> We see a >2% speedup on some Google-internal benchmarks when switching the
>> threshold from 150 to 300.
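To make the tradeoff in the two tables easier to compare, here is a minimal Python sketch that averages each threshold column and flags which speedups clear the measurement noise. It assumes the performance columns are percentage speedups and the code-size columns are percentage increases over the same baseline. The arithmetic mean and the 2x stddev cutoff (`k`) are illustrative choices made for this sketch, not something proposed in the thread.

```python
# Data transcribed from the two tables above (all values in percent).
THRESHOLDS = [300, 450, 600, 750]

# benchmark id -> (stddev/mean, speedup at each threshold)
PERF = {
    403: (0.37, [0.11, 0.11, 0.09, 0.79]),
    433: (0.14, [0.51, 0.25, -0.63, -0.29]),
    445: (0.08, [0.48, 0.89, 0.12, 0.83]),
    447: (0.16, [3.50, 2.69, 3.66, 3.59]),
    453: (0.11, [1.49, 0.45, -0.07, 0.78]),
    464: (0.17, [0.75, 1.80, 1.86, 1.54]),
}

# benchmark id -> code size increase at each threshold
SIZE = {
    403: [0.56, 2.41, 2.74, 3.75],
    433: [0.96, 2.84, 4.19, 4.87],
    445: [2.16, 3.62, 4.48, 5.88],
    447: [2.96, 5.09, 6.74, 8.89],
    453: [0.94, 1.67, 2.73, 2.96],
    464: [8.02, 13.50, 20.51, 26.59],
}


def column_means(rows):
    """Arithmetic mean of each threshold column, keyed by threshold."""
    columns = list(zip(*rows))
    return {t: sum(c) / len(c) for t, c in zip(THRESHOLDS, columns)}


def above_noise(threshold, k=2.0):
    """Benchmarks whose speedup magnitude exceeds k * stddev/mean.

    k is an assumed cutoff chosen for illustration only.
    """
    i = THRESHOLDS.index(threshold)
    return [b for b, (noise, s) in PERF.items() if abs(s[i]) > k * noise]


mean_speedup = column_means([speedups for _, speedups in PERF.values()])
mean_size = column_means(SIZE.values())

if __name__ == "__main__":
    for t in THRESHOLDS:
        print(f"{t}: mean speedup {mean_speedup[t]:.2f}%, "
              f"mean size increase {mean_size[t]:.2f}%, "
              f"above noise: {above_noise(t)}")
```

An unweighted mean over six benchmarks is a crude summary: a single outlier such as 464's code size dominates the size column, so the per-benchmark rows remain the more informative view.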
>> On Mon, Jan 30, 2017 at 5:06 PM, Chandler Carruth <chandlerc at google.com> wrote:
>> On Mon, Jan 30, 2017 at 4:59 PM Mehdi Amini <mehdi.amini at apple.com> wrote:
>> Another question is about PGO integration: is it already hooked there?
>> Should we have a more aggressive threshold in a hot function? (Assuming
>> we’re willing to spend some binary size there but not on the cold path).
>> I would even wire the *unrolling* the other way: just suppress unrolling
>> in cold paths to save binary size. Rolled loops seem like a generally good
>> thing in cold code unless they are having some larger impact (i.e., the loop
>> itself is more expensive than the unrolled form).
>> Agree that we could suppress unrolling in cold paths to save code size.
>> But that's orthogonal to the proposal here. This proposal focuses on O2
>> performance: shall we have a different (higher) full unroll threshold than
>> the one for dynamic/partial unroll?
>> I agree that this is (to some extent) orthogonal, and it makes sense to
>> me to differentiate the threshold for full unroll and the dynamic/partial
>> There is one issue that makes these not orthogonal.
>> If even *static* profile hints will reduce some of the code size increase
>> caused by higher unrolling thresholds for non-cold code, we should factor
>> that into the tradeoff in picking where the threshold goes.
>> However, getting PGO into the full unroller is currently challenging
>> outside of the new pass manager. We already have some unfortunate hacks
>> around this in LoopUnswitch that are making the port of it to the new PM
>> more annoying.
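The two ideas discussed in this thread, a higher threshold for full unrolling than for dynamic/partial unrolling and suppressing unrolling on cold paths, can be sketched as a toy decision function. This is a Python model for illustration only, not LLVM code: `UnrollThresholds`, `allow_unroll`, the `kind` strings, and the `is_cold` flag are all hypothetical names, and keeping the partial threshold at 150 is an assumption based on the current value quoted in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnrollThresholds:
    """Hypothetical knobs for this toy model (not LLVM's actual options)."""
    full: int = 300     # value proposed in the thread for O2
    partial: int = 150  # assumed to stay at the current value cited in the thread


def allow_unroll(kind, unrolled_size, is_cold, thresholds=UnrollThresholds()):
    """Decide whether a loop may be unrolled in this toy model.

    kind          -- "full" or "partial" (dynamic/partial unrolling)
    unrolled_size -- estimated size of the loop body after unrolling
    is_cold       -- True if static or profile hints mark the loop as cold
    """
    if kind not in ("full", "partial"):
        raise ValueError(f"unknown unroll kind: {kind!r}")
    # Cold code keeps rolled loops to save binary size.
    if is_cold:
        return False
    limit = thresholds.full if kind == "full" else thresholds.partial
    return unrolled_size <= limit


if __name__ == "__main__":
    print(allow_unroll("full", 250, is_cold=False))
    print(allow_unroll("partial", 250, is_cold=False))
    print(allow_unroll("full", 100, is_cold=True))
```

In this model the cold-path check runs before any threshold comparison. That ordering reflects the point that the two questions are not fully independent: whatever size the cold-path check saves changes how much headroom there is for raising the full threshold on non-cold code.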