[llvm-dev] (RFC) Adjusting default loop fully unroll threshold
Dehao Chen via llvm-dev
llvm-dev at lists.llvm.org
Mon Feb 13 08:29:36 PST 2017
Thanks for the comment. The performance experiments were performed on Intel
Sandy Bridge; I have updated the patch description with this info.
Dehao
On Sun, Feb 12, 2017 at 8:24 AM, Sanjay Patel <spatel at rotateright.com>
wrote:
> Since we can override the settings, I have no objections.
>
> I still think it would be good to document here and in the review/commit
> message which CPU model was used to acquire the experimental data. That
> could be useful to anyone that comes along later and wants to reproduce
> and/or compare to the original, motivating data.
>
> On Fri, Feb 10, 2017 at 4:53 PM, Dehao Chen <dehao at google.com> wrote:
>
>> Thanks Hal, could you help approve https://reviews.llvm.org/D28368?
>>
>> I'll hold off until early Tuesday in case other people have more concerns.
>>
>> Thanks,
>> Dehao
>>
>> On Fri, Feb 10, 2017 at 3:23 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>
>>>
>>> On 02/10/2017 05:21 PM, Dehao Chen wrote:
>>>
>>> Thanks everyone for the comments.
>>>
>>> Do we have a decision here?
>>>
>>>
>>> You're good to go as far as I'm concerned.
>>>
>>> -Hal
>>>
>>>
>>> Dehao
>>>
>>> On Tue, Feb 7, 2017 at 10:24 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>
>>>>
>>>> On 02/07/2017 05:29 PM, Sanjay Patel via llvm-dev wrote:
>>>>
>>>> Sorry if I missed it, but what machine/CPU are you using to collect the
>>>> perf numbers?
>>>>
>>>> I am concerned that what may be a win on a CPU that keeps a couple of
>>>> hundred instructions in-flight and has many MB of caches will not hold for
>>>> a small core.
>>>>
>>>>
>>>> In my experience, unrolling tends to help weaker cores even more than
>>>> stronger ones because it allows the instruction scheduler more
>>>> opportunities to hide latency. Obviously, instruction-cache pressure is an
>>>> important consideration, but the code size changes here seem small.
>>>>
>>>>
>>>> Is the proposed change universal? Is there a way to undo it?
>>>>
>>>>
>>>> All of the unrolling thresholds should be target-adjustable using the
>>>> TTI::getUnrollingPreferences hook.
>>>>
>>>> -Hal
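
A minimal sketch of that per-target override, assuming a hypothetical
MyTargetTTIImpl; the exact parameter list of getUnrollingPreferences has
changed across LLVM versions, so treat the signature as illustrative rather
than authoritative:

    // "MyTargetTTIImpl" is a placeholder target, not an in-tree class.
    // Threshold and PartialThreshold are real fields of
    // TTI::UnrollingPreferences; the values below are arbitrary examples.
    void MyTargetTTIImpl::getUnrollingPreferences(Loop *L,
                                                  TTI::UnrollingPreferences &UP) {
      BaseT::getUnrollingPreferences(L, UP);  // start from the generic defaults
      UP.Threshold = 150;         // keep a smaller full-unroll cost budget
      UP.PartialThreshold = 150;  // and a matching cap for partial unrolling
    }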
>>>>
>>>>
>>>>
>>>> On Tue, Feb 7, 2017 at 3:26 PM, Dehao Chen via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> Ping... with the updated code size impact data, any more comments? Any
>>>>> more data that would be interesting to collect?
>>>>>
>>>>> Thanks,
>>>>> Dehao
>>>>>
>>>>> On Thu, Feb 2, 2017 at 2:07 PM, Dehao Chen <dehao at google.com> wrote:
>>>>>
>>>>>> Here is the code size impact for clang, chrome, and 24 Google internal
>>>>>> benchmarks (names omitted; 14, 15, and 16 are encoding/decoding benchmarks
>>>>>> similar to h264). There are two columns, for thresholds 300 and 450
>>>>>> respectively.
>>>>>>
>>>>>> I also tested the llvm test suite. Changing the threshold to 300/450
>>>>>> does not affect code gen for any binary in the test suite.
>>>>>>
>>>>>>
>>>>>>
>>>>>> benchmark     300       450
>>>>>> clang        0.30%     0.63%
>>>>>> chrome       0.00%     0.00%
>>>>>> 1            0.27%     0.67%
>>>>>> 2            0.44%     0.93%
>>>>>> 3            0.44%     0.93%
>>>>>> 4            0.26%     0.53%
>>>>>> 5            0.74%     2.21%
>>>>>> 6            0.74%     2.21%
>>>>>> 7            0.74%     2.21%
>>>>>> 8            0.46%     1.05%
>>>>>> 9            0.35%     0.86%
>>>>>> 10           0.35%     0.86%
>>>>>> 11           0.40%     0.83%
>>>>>> 12           0.32%     0.65%
>>>>>> 13           0.31%     0.64%
>>>>>> 14           4.52%     8.23%
>>>>>> 15           9.90%    19.38%
>>>>>> 16           9.90%    19.38%
>>>>>> 17           0.68%     1.97%
>>>>>> 18           0.21%     0.48%
>>>>>> 19           0.99%     3.44%
>>>>>> 20           0.19%     0.46%
>>>>>> 21           0.57%     1.62%
>>>>>> 22           0.37%     1.05%
>>>>>> 23           0.78%     1.30%
>>>>>> 24           0.51%     1.54%
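
For context on what these numbers gate: the threshold being tuned is the
cost budget the full unroller applies to loops whose trip count is known at
compile time. A rough sketch of the kind of loop affected (not one of the
benchmarks above; treating this particular loop as near the 150/300 boundary
is an assumption, not a measurement from the thread):

    // A fixed-trip-count loop of this shape is what the full unroller
    // evaluates: the estimated cost of the fully unrolled body is compared
    // against the threshold (150 by default, 300/450 in these experiments).
    void scale_add(float *y, const float *x, float a) {
      for (int i = 0; i < 32; ++i)  // trip count is a compile-time constant
        y[i] += a * x[i];
    }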
>>>>>>
>>>>>> On Wed, Feb 1, 2017 at 6:08 PM, Mikhail Zolotukhin via llvm-dev <
>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>
>>>>>>> On Feb 1, 2017, at 4:57 PM, Xinliang David Li via llvm-dev <
>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>> clang, chrome, and some internal large apps are good candidates for
>>>>>>> size metrics.
>>>>>>>
>>>>>>> I'd also add the standard LLVM testsuite just because it's the suite
>>>>>>> everyone in the community can use.
>>>>>>>
>>>>>>> Michael
>>>>>>>
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> On Wed, Feb 1, 2017 at 4:47 PM, Chandler Carruth via llvm-dev <
>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>> I had suggested having size metrics from somewhat larger
>>>>>>>> applications such as Chrome, Webkit, or Firefox; clang itself; and maybe
>>>>>>>> some of our internal binaries with rough size brackets?
>>>>>>>>
>>>>>>>> On Wed, Feb 1, 2017 at 4:33 PM Dehao Chen <dehao at google.com> wrote:
>>>>>>>>
>>>>>>>>> With the new data points, any comments on whether this can justify
>>>>>>>>> setting the full unroll threshold to 300 (or any other number) at O2? I can
>>>>>>>>> collect more data points if it's helpful.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Dehao
>>>>>>>>>
>>>>>>>>> On Tue, Jan 31, 2017 at 3:20 PM, Dehao Chen <dehao at google.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Recollected the data from trunk head with stddev data and more
>>>>>>>>> threshold data points attached:
>>>>>>>>>
>>>>>>>>> Performance:
>>>>>>>>>
>>>>>>>>> benchmark   stddev/mean     300      450      600      750
>>>>>>>>> 403            0.37%       0.11%    0.11%    0.09%    0.79%
>>>>>>>>> 433            0.14%       0.51%    0.25%   -0.63%   -0.29%
>>>>>>>>> 445            0.08%       0.48%    0.89%    0.12%    0.83%
>>>>>>>>> 447            0.16%       3.50%    2.69%    3.66%    3.59%
>>>>>>>>> 453            0.11%       1.49%    0.45%   -0.07%    0.78%
>>>>>>>>> 464            0.17%       0.75%    1.80%    1.86%    1.54%
>>>>>>>>> Code size:
>>>>>>>>>
>>>>>>>>> benchmark     300      450      600      750
>>>>>>>>> 403          0.56%    2.41%    2.74%    3.75%
>>>>>>>>> 433          0.96%    2.84%    4.19%    4.87%
>>>>>>>>> 445          2.16%    3.62%    4.48%    5.88%
>>>>>>>>> 447          2.96%    5.09%    6.74%    8.89%
>>>>>>>>> 453          0.94%    1.67%    2.73%    2.96%
>>>>>>>>> 464          8.02%   13.50%   20.51%   26.59%
>>>>>>>>> Compile time grows roughly in proportion to code size in these experiments
>>>>>>>>> and is noisier, so I did not include it.
>>>>>>>>>
>>>>>>>>> We see a >2% speedup on some Google internal benchmarks when
>>>>>>>>> switching the threshold from 150 to 300.
>>>>>>>>>
>>>>>>>>> Dehao
>>>>>>>>>
>>>>>>>>> On Mon, Jan 30, 2017 at 5:06 PM, Chandler Carruth <
>>>>>>>>> chandlerc at google.com> wrote:
>>>>>>>>>
>>>>>>>>> On Mon, Jan 30, 2017 at 4:59 PM Mehdi Amini <mehdi.amini at apple.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Another question is about PGO integration: is it already hooked up
>>>>>>>>> there? Should we use a more aggressive threshold in a hot function?
>>>>>>>>> (Assuming we're willing to spend some binary size there but not on the cold
>>>>>>>>> path.)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I would even wire the *unrolling* the other way: just suppress
>>>>>>>>> unrolling in cold paths to save binary size. Rolled loops seem like a
>>>>>>>>> generally good thing in cold code unless they have some larger impact
>>>>>>>>> (i.e., the rolled loop itself is more expensive than the unrolled form).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Agreed that we could suppress unrolling in cold paths to save code
>>>>>>>>> size, but that's orthogonal to the proposal here. This proposal focuses on
>>>>>>>>> O2 performance: should we have a different (higher) full unroll threshold
>>>>>>>>> than the dynamic/partial unroll threshold?
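
A small illustration of that distinction (made-up functions, not code from
the benchmarks): the full unroller only applies when the trip count is a
compile-time constant, while the dynamic/partial thresholds govern loops
whose trip count is only known at run time.

    // Candidate for *full* unrolling: the trip count (8) is a compile-time
    // constant, so the loop can be flattened entirely if its estimated cost
    // fits under the full-unroll threshold this proposal raises.
    int sum_fixed(const int *a) {
      int s = 0;
      for (int i = 0; i < 8; ++i)
        s += a[i];
      return s;
    }

    // Only a candidate for runtime/partial unrolling: n is unknown at
    // compile time, so the separate dynamic/partial thresholds (left
    // unchanged by this proposal) apply instead.
    int sum_dynamic(const int *a, int n) {
      int s = 0;
      for (int i = 0; i < n; ++i)
        s += a[i];
      return s;
    }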
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I agree that this is (to some extent) orthogonal, and it makes
>>>>>>>>> sense to me to differentiate the threshold for full unroll and the
>>>>>>>>> dynamic/partial case.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> There is one issue that makes these not orthogonal.
>>>>>>>>>
>>>>>>>>> If even *static* profile hints can reduce some of the code size
>>>>>>>>> increase caused by higher unrolling thresholds for non-cold code, we should
>>>>>>>>> factor that into the tradeoff when picking where the threshold goes.
>>>>>>>>>
>>>>>>>>> However, getting PGO into the full unroller is currently
>>>>>>>>> challenging outside of the new pass manager. We already have some
>>>>>>>>> unfortunate hacks around this in LoopUnswitch that are making the port of
>>>>>>>>> it to the new PM more annoying.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Hal Finkel
>>>> Lead, Compiler Technology and Programming Languages
>>>> Leadership Computing Facility
>>>> Argonne National Laboratory
>>>>
>>> --
>>> Hal Finkel
>>> Lead, Compiler Technology and Programming Languages
>>> Leadership Computing Facility
>>> Argonne National Laboratory
>>>
>>>
>>
>