[llvm-dev] (RFC) Adjusting default loop fully unroll threshold

Tue Feb 7 22:24:35 PST 2017

On 02/07/2017 05:29 PM, Sanjay Patel via llvm-dev wrote:
> Sorry if I missed it, but what machine/CPU are you using to collect 
> the perf numbers?
>
> I am concerned that what may be a win on a CPU that keeps a couple of 
> hundred instructions in-flight and has many MB of caches will not hold 
> for a small core.

In my experience, unrolling tends to help weaker cores even more than 
stronger ones because it allows the instruction scheduler more 
opportunities to hide latency. Obviously, instruction-cache pressure is 
an important consideration, but the code size changes here seem small.
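
To illustrate the scheduling point, here is a minimal sketch of what full
unrolling does to a loop with a small constant trip count. The function
names and the 4-element dot product are made up for this example and do not
come from the benchmarks discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: a dot product with a compile-time trip count.
constexpr std::size_t N = 4;

// Rolled form: every iteration pays for the induction-variable update and
// the loop branch, and the work is split across loop iterations.
int dotRolled(const int *a, const int *b) {
  assert(a && b);
  int sum = 0;
  for (std::size_t i = 0; i < N; ++i)
    sum += a[i] * b[i];
  return sum;
}

// What full unrolling conceptually produces: straight-line code in a single
// basic block. The four loads and multiplies do not depend on each other,
// so a scheduler is free to interleave them to hide load/multiply latency.
int dotUnrolled(const int *a, const int *b) {
  assert(a && b);
  int p0 = a[0] * b[0];
  int p1 = a[1] * b[1];
  int p2 = a[2] * b[2];
  int p3 = a[3] * b[3];
  return ((p0 + p1) + p2) + p3;
}
```

Both functions compute the same value; the unrolled one trades a few more
instructions for the removal of the loop overhead, which is the code size
versus performance tradeoff the threshold controls.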

>
> Is the proposed change universal? Is there a way to undo it?

All of the unrolling thresholds should be target-adjustable using the 
TTI::getUnrollingPreferences hook.
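
Roughly, a target that wants to stay conservative could override the hook
along the following lines. This is a simplified, self-contained sketch, not
the actual LLVM declarations: the types below are stand-ins, the real hook
takes additional arguments (such as the loop being considered), and
SmallCoreTTI with its cap of 150 (the old default mentioned in this thread)
is a hypothetical target.

```cpp
#include <algorithm>
#include <cassert>

// Stand-in for the preferences structure a target is allowed to adjust.
struct UnrollingPreferences {
  unsigned Threshold;        // cost budget for full unrolling
  unsigned PartialThreshold; // cost budget for partial/runtime unrolling
};

// Stand-in for the generic target hook: by default it changes nothing.
struct BaseTTI {
  virtual ~BaseTTI() = default;
  virtual void getUnrollingPreferences(UnrollingPreferences &) const {}
};

// Hypothetical small-core target that caps both budgets at 150,
// regardless of what the target-independent default is raised to.
struct SmallCoreTTI : BaseTTI {
  void getUnrollingPreferences(UnrollingPreferences &UP) const override {
    UP.Threshold = std::min(UP.Threshold, 150u);
    UP.PartialThreshold = std::min(UP.PartialThreshold, 150u);
  }
};

// Models how an unroller would start from the defaults and then let the
// target have the final say.
UnrollingPreferences gatherPreferences(const BaseTTI &TTI,
                                       unsigned FullDefault,
                                       unsigned PartialDefault) {
  assert(FullDefault > 0 && PartialDefault > 0);
  UnrollingPreferences UP{FullDefault, PartialDefault};
  TTI.getUnrollingPreferences(UP);
  return UP;
}
```

With a default full-unroll threshold of 300, a target using the base hook
would see 300, while the hypothetical small-core target would still see 150.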

  -Hal

>
> On Tue, Feb 7, 2017 at 3:26 PM, Dehao Chen via llvm-dev 
> <llvm-dev at lists.llvm.org> wrote:
>
>     Ping... with the updated code size impact data, any more comments?
>     Any more data that would be interesting to collect?
>
>     Thanks,
>     Dehao
>
>     On Thu, Feb 2, 2017 at 2:07 PM, Dehao Chen <dehao at google.com> wrote:
>
>         Here is the code size impact for clang, chrome and 24 Google
>         internal benchmarks (names omitted; 14, 15 and 16 are
>         encoding/decoding benchmarks similar to h264). There are 2
>         columns, for thresholds 300 and 450 respectively.
>
>         I also tested the llvm test suite. Changing the threshold to
>         300/450 does not affect code gen for any binary in the test suite.
>
>
>
>         benchmark   300      450
>         clang       0.30%    0.63%
>         chrome      0.00%    0.00%
>         1           0.27%    0.67%
>         2           0.44%    0.93%
>         3           0.44%    0.93%
>         4           0.26%    0.53%
>         5           0.74%    2.21%
>         6           0.74%    2.21%
>         7           0.74%    2.21%
>         8           0.46%    1.05%
>         9           0.35%    0.86%
>         10          0.35%    0.86%
>         11          0.40%    0.83%
>         12          0.32%    0.65%
>         13          0.31%    0.64%
>         14          4.52%    8.23%
>         15          9.90%    19.38%
>         16          9.90%    19.38%
>         17          0.68%    1.97%
>         18          0.21%    0.48%
>         19          0.99%    3.44%
>         20          0.19%    0.46%
>         21          0.57%    1.62%
>         22          0.37%    1.05%
>         23          0.78%    1.30%
>         24          0.51%    1.54%
>
>
>         On Wed, Feb 1, 2017 at 6:08 PM, Mikhail Zolotukhin via
>         llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>>             On Feb 1, 2017, at 4:57 PM, Xinliang David Li via
>>             llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>             clang, chrome, and some internal large apps are good
>>             candidates for size metrics.
>             I'd also add the standard LLVM testsuite just because it's
>             the suite everyone in the community can use.
>
>             Michael
>>
>>             David
>>
>>             On Wed, Feb 1, 2017 at 4:47 PM, Chandler Carruth via
>>             llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>>                 I had suggested having size metrics from somewhat
>>                 larger applications such as Chrome, Webkit, or
>>                 Firefox; clang itself; and maybe some of our internal
>>                 binaries with rough size brackets?
>>
>>                 On Wed, Feb 1, 2017 at 4:33 PM Dehao Chen
>>                 <dehao at google.com> wrote:
>>
>>                     With the new data points, any comments on whether
>>                     this can justify setting the full unroll threshold
>>                     to 300 (or any other number) in O2? I can collect
>>                     more data points if it's helpful.
>>
>>                     Thanks,
>>                     Dehao
>>
>>                     On Tue, Jan 31, 2017 at 3:20 PM, Dehao Chen
>>                     <dehao at google.com> wrote:
>>
>>                         Recollected the data from trunk head with
>>                         stddev data and more threshold data points
>>                         attached:
>>
>>                         Performance:
>>
>>                         benchmark  stddev/mean  300     450     600     750
>>                         403        0.37%        0.11%   0.11%   0.09%   0.79%
>>                         433        0.14%        0.51%   0.25%   -0.63%  -0.29%
>>                         445        0.08%        0.48%   0.89%   0.12%   0.83%
>>                         447        0.16%        3.50%   2.69%   3.66%   3.59%
>>                         453        0.11%        1.49%   0.45%   -0.07%  0.78%
>>                         464        0.17%        0.75%   1.80%   1.86%   1.54%
>>
>>
>>                         Code size:
>>
>>                         benchmark  300     450     600     750
>>                         403        0.56%   2.41%   2.74%   3.75%
>>                         433        0.96%   2.84%   4.19%   4.87%
>>                         445        2.16%   3.62%   4.48%   5.88%
>>                         447        2.96%   5.09%   6.74%   8.89%
>>                         453        0.94%   1.67%   2.73%   2.96%
>>                         464        8.02%   13.50%  20.51%  26.59%
>>
>>
>>                         Compile time is proportional in the
>>                         experiments and noisier, so I did not
>>                         include it.
>>
>>                         We see a >2% speedup on some Google internal
>>                         benchmarks when switching the threshold from
>>                         150 to 300.
>>
>>                         Dehao
>>
>>                         On Mon, Jan 30, 2017 at 5:06 PM, Chandler
>>                         Carruth <chandlerc at google.com> wrote:
>>
>>                             On Mon, Jan 30, 2017 at 4:59 PM Mehdi
>>                             Amini <mehdi.amini at apple.com> wrote:
>>
>>>
>>>
>>>                                         Another question is about
>>>                                         PGO integration: is it
>>>                                         already hooked there? Should
>>>                                         we have a more aggressive
>>>                                         threshold in a hot function?
>>>                                         (Assuming we’re willing to
>>>                                         spend some binary size there
>>>                                         but not on the cold path).
>>>
>>>
>>>                                     I would even wire the
>>>                                     *unrolling* the other way: just
>>>                                     suppress unrolling in cold paths
>>>                                     to save binary size. Rolled
>>>                                     loops seem like a generally good
>>>                                     thing in cold code unless they
>>>                                     are having some larger impact
>>>                                     (i.e., the loop itself is more
>>>                                     expensive than the unrolled form).
>>>
>>>
>>>
>>>                                 Agree that we could suppress
>>>                                 unrolling in cold paths to save code
>>>                                 size. But that's orthogonal to the
>>>                                 proposal here. This proposal focuses
>>>                                 on O2 performance: shall we have a
>>>                                 different (higher) threshold for
>>>                                 full unroll than for
>>>                                 dynamic/partial unroll?
>>
>>                                 I agree that this is (to some extent)
>>                                 orthogonal, and it makes sense to me
>>                                 to differentiate the threshold for
>>                                 full unroll and the dynamic/partial case.
>>
>>
>>                             There is one issue that makes these not
>>                             orthogonal.
>>
>>                             If even *static* profile hints will
>>                             reduce some of the code size increase
>>                             caused by higher unrolling thresholds for
>>                             non-cold code, we should factor that into
>>                             the tradeoff in picking where the
>>                             threshold goes.
>>
>>                             However, getting PGO into the full
>>                             unroller is currently challenging outside
>>                             of the new pass manager. We already have
>>                             some unfortunate hacks around this in
>>                             LoopUnswitch that are making the port of
>>                             it to the new PM more annoying.
>>
>>
>>
>>
>>                 _______________________________________________
>>                 LLVM Developers mailing list
>>                 llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>
>>                 http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>                 <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
