[llvm-dev] (RFC) Adjusting default loop fully unroll threshold

Dehao Chen via llvm-dev llvm-dev at lists.llvm.org
Fri Feb 10 15:53:48 PST 2017


Thanks Hal, could you help approve https://reviews.llvm.org/D28368?

I'll hold off until early Tuesday in case other people have more concerns.

Thanks,
Dehao

On Fri, Feb 10, 2017 at 3:23 PM, Hal Finkel <hfinkel at anl.gov> wrote:

>
> On 02/10/2017 05:21 PM, Dehao Chen wrote:
>
> Thanks everyone for the comments.
>
> Do we have a decision here?
>
>
> You're good to go as far as I'm concerned.
>
>  -Hal
>
>
> Dehao
>
> On Tue, Feb 7, 2017 at 10:24 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>>
>> On 02/07/2017 05:29 PM, Sanjay Patel via llvm-dev wrote:
>>
>> Sorry if I missed it, but what machine/CPU are you using to collect the
>> perf numbers?
>>
>> I am concerned that what may be a win on a CPU that keeps a couple of
>> hundred instructions in-flight and has many MB of caches will not hold for
>> a small core.
>>
>>
>> In my experience, unrolling tends to help weaker cores even more than
>> stronger ones because it allows the instruction scheduler more
>> opportunities to hide latency. Obviously, instruction-cache pressure is an
>> important consideration, but the code size changes here seem small.
>>
>>
>> Is the proposed change universal? Is there a way to undo it?
>>
>>
>> All of the unrolling thresholds should be target-adjustable using the
>> TTI::getUnrollingPreferences hook.
>>
>>  -Hal
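
A minimal sketch of what "target-adjustable" means in the reply above, written as a standalone Python model rather than LLVM code. Only the hook name `getUnrollingPreferences` comes from the thread; the class names, field names, and the assumption that the partial threshold stays at 150 are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class UnrollingPreferences:
    """Hypothetical stand-in for the preferences a target may adjust."""
    threshold: int = 300          # full-unroll budget proposed in this thread
    partial_threshold: int = 150  # assumed unchanged partial/runtime budget


class GenericTarget:
    """Default target: accepts the compiler-wide defaults unchanged."""

    def get_unrolling_preferences(self, prefs: UnrollingPreferences) -> None:
        pass


class SmallCoreTarget(GenericTarget):
    """Hypothetical small core that opts out of the higher default."""

    def get_unrolling_preferences(self, prefs: UnrollingPreferences) -> None:
        # Undo the raised default for this target only.
        prefs.threshold = 150


def effective_preferences(target: GenericTarget) -> UnrollingPreferences:
    """Start from the defaults, then let the target override them."""
    prefs = UnrollingPreferences()
    target.get_unrolling_preferences(prefs)
    return prefs
```

In this model a target that is worried about instruction-cache pressure overrides one field and every other target picks up the new default, which is the "way to undo it" asked about above.
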
>>
>>
>>
>> On Tue, Feb 7, 2017 at 3:26 PM, Dehao Chen via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Ping... with the updated code size impact data, any more comments? Any
>>> more data that would be interesting to collect?
>>>
>>> Thanks,
>>> Dehao
>>>
>>> On Thu, Feb 2, 2017 at 2:07 PM, Dehao Chen <dehao at google.com> wrote:
>>>
>>>> Here is the code size impact for clang, Chrome and 24 Google internal
>>>> benchmarks (names omitted; 14, 15 and 16 are encoding/decoding benchmarks
>>>> similar to h264). There are 2 columns, for thresholds 300 and 450 respectively.
>>>>
>>>> I also tested the llvm test suite. Changing the threshold to 300/450
>>>> does not affect code gen for any binary in the test suite.
>>>>
>>>>
>>>>
>>>> benchmark  300     450
>>>> clang      0.30%   0.63%
>>>> chrome     0.00%   0.00%
>>>> 1          0.27%   0.67%
>>>> 2          0.44%   0.93%
>>>> 3          0.44%   0.93%
>>>> 4          0.26%   0.53%
>>>> 5          0.74%   2.21%
>>>> 6          0.74%   2.21%
>>>> 7          0.74%   2.21%
>>>> 8          0.46%   1.05%
>>>> 9          0.35%   0.86%
>>>> 10         0.35%   0.86%
>>>> 11         0.40%   0.83%
>>>> 12         0.32%   0.65%
>>>> 13         0.31%   0.64%
>>>> 14         4.52%   8.23%
>>>> 15         9.90%   19.38%
>>>> 16         9.90%   19.38%
>>>> 17         0.68%   1.97%
>>>> 18         0.21%   0.48%
>>>> 19         0.99%   3.44%
>>>> 20         0.19%   0.46%
>>>> 21         0.57%   1.62%
>>>> 22         0.37%   1.05%
>>>> 23         0.78%   1.30%
>>>> 24         0.51%   1.54%
>>>>
>>>> On Wed, Feb 1, 2017 at 6:08 PM, Mikhail Zolotukhin via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> On Feb 1, 2017, at 4:57 PM, Xinliang David Li via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>> clang, chrome, and some internal large apps are good candidates for
>>>>> size metrics.
>>>>>
>>>>> I'd also add the standard LLVM testsuite just because it's the suite
>>>>> everyone in the community can use.
>>>>>
>>>>> Michael
>>>>>
>>>>>
>>>>> David
>>>>>
>>>>> On Wed, Feb 1, 2017 at 4:47 PM, Chandler Carruth via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> I had suggested having size metrics from somewhat larger applications
>>>>>> such as Chrome, Webkit, or Firefox; clang itself; and maybe some of our
>>>>>> internal binaries with rough size brackets?
>>>>>>
>>>>>> On Wed, Feb 1, 2017 at 4:33 PM Dehao Chen <dehao at google.com> wrote:
>>>>>>
>>>>>>> With the new data points, any comments on whether this can justify
>>>>>>> setting the full unroll threshold to 300 (or any other number) in O2? I can
>>>>>>> collect more data points if it's helpful.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dehao
>>>>>>>
>>>>>>> On Tue, Jan 31, 2017 at 3:20 PM, Dehao Chen <dehao at google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Recollected the data from trunk head with stddev data and more
>>>>>>> threshold data points attached:
>>>>>>>
>>>>>>> Performance:
>>>>>>>
>>>>>>> benchmark  stddev/mean  300     450     600     750
>>>>>>> 403        0.37%        0.11%   0.11%   0.09%   0.79%
>>>>>>> 433        0.14%        0.51%   0.25%   -0.63%  -0.29%
>>>>>>> 445        0.08%        0.48%   0.89%   0.12%   0.83%
>>>>>>> 447        0.16%        3.50%   2.69%   3.66%   3.59%
>>>>>>> 453        0.11%        1.49%   0.45%   -0.07%  0.78%
>>>>>>> 464        0.17%        0.75%   1.80%   1.86%   1.54%
>>>>>>>
>>>>>>> Code size:
>>>>>>>
>>>>>>> benchmark  300     450     600     750
>>>>>>> 403        0.56%   2.41%   2.74%   3.75%
>>>>>>> 433        0.96%   2.84%   4.19%   4.87%
>>>>>>> 445        2.16%   3.62%   4.48%   5.88%
>>>>>>> 447        2.96%   5.09%   6.74%   8.89%
>>>>>>> 453        0.94%   1.67%   2.73%   2.96%
>>>>>>> 464        8.02%   13.50%  20.51%  26.59%
>>>>>>>
>>>>>>> Compile time is proportional in the experiments and noisier, so I
>>>>>>> did not include it.
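
One way to read the performance and code size tables above side by side is to summarize each threshold across the six benchmarks. The sketch below is only an illustration: the numbers are copied from the tables, but the choice of an arithmetic mean, the reading of positive values as speedups, and the "speedup larger than stddev/mean" noise test are assumptions of this example, not part of the original measurements.

```python
from statistics import mean

THRESHOLDS = (300, 450, 600, 750)

# benchmark -> (stddev/mean, speedup at each threshold), all in percent
PERF = {
    403: (0.37, (0.11, 0.11, 0.09, 0.79)),
    433: (0.14, (0.51, 0.25, -0.63, -0.29)),
    445: (0.08, (0.48, 0.89, 0.12, 0.83)),
    447: (0.16, (3.50, 2.69, 3.66, 3.59)),
    453: (0.11, (1.49, 0.45, -0.07, 0.78)),
    464: (0.17, (0.75, 1.80, 1.86, 1.54)),
}

# benchmark -> code size growth at each threshold, in percent
CODE_SIZE = {
    403: (0.56, 2.41, 2.74, 3.75),
    433: (0.96, 2.84, 4.19, 4.87),
    445: (2.16, 3.62, 4.48, 5.88),
    447: (2.96, 5.09, 6.74, 8.89),
    453: (0.94, 1.67, 2.73, 2.96),
    464: (8.02, 13.50, 20.51, 26.59),
}


def summarize():
    """Per threshold: mean speedup, mean size growth, and how many
    benchmarks show a speedup larger than their measured stddev/mean."""
    rows = {}
    for i, threshold in enumerate(THRESHOLDS):
        rows[threshold] = {
            "mean_perf": mean(s[i] for _, s in PERF.values()),
            "mean_size": mean(s[i] for s in CODE_SIZE.values()),
            "above_noise": sum(1 for noise, s in PERF.values() if s[i] > noise),
        }
    return rows


if __name__ == "__main__":
    for threshold, row in summarize().items():
        print(threshold, row)
```

Under these assumptions the mean code size growth goes from 2.60% at threshold 300 to about 4.86% at 450, while the mean speedup does not improve (1.14% versus about 1.03%).
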
>>>>>>>
>>>>>>> We have >2% speedup on some google internal benchmarks when
>>>>>>> switching the threshold from 150 to 300.
>>>>>>>
>>>>>>> Dehao
>>>>>>>
>>>>>>> On Mon, Jan 30, 2017 at 5:06 PM, Chandler Carruth <
>>>>>>> chandlerc at google.com> wrote:
>>>>>>>
>>>>>>> On Mon, Jan 30, 2017 at 4:59 PM Mehdi Amini <mehdi.amini at apple.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Another question is about PGO integration: is it already hooked
>>>>>>> there? Should we have a more aggressive threshold in a hot function?
>>>>>>> (Assuming we’re willing to spend some binary size there but not on the cold
>>>>>>> path).
>>>>>>>
>>>>>>>
>>>>>>> I would even wire the *unrolling* the other way: just suppress
>>>>>>> unrolling in cold paths to save binary size. Rolled loops seem like a
>>>>>>> generally good thing in cold code unless they are having some larger impact
>>>>>>> (i.e., the loop itself is more expensive than the unrolled form).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Agreed that we could suppress unrolling in cold paths to save code
>>>>>>> size. But that's orthogonal to the proposal here. This proposal focuses on
>>>>>>> O2 performance: should we have a different (higher) full unroll threshold
>>>>>>> than for dynamic/partial unrolling?
>>>>>>>
>>>>>>>
>>>>>>> I agree that this is (to some extent) orthogonal, and it makes sense
>>>>>>> to me to differentiate the threshold for full unroll and the
>>>>>>> dynamic/partial case.
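
The two ideas discussed above (a higher threshold for full unrolling than for dynamic/partial unrolling, and no unrolling in cold code) can be sketched as a single decision function. This is a simplified model, not the LLVM unroller: the function names are hypothetical, the cost estimate `body_size * trip_count` is an assumed simplification, and the partial threshold of 150 is taken from the "150 to 300" remark in the thread on the assumption that it stays unchanged.

```python
FULL_UNROLL_THRESHOLD = 300     # value proposed in this thread
PARTIAL_UNROLL_THRESHOLD = 150  # assumed to stay at the previous value


def unroll_threshold(full_unroll: bool, is_cold: bool = False) -> int:
    """Pick the size budget for a loop; cold loops get no budget at all."""
    if is_cold:
        return 0
    return FULL_UNROLL_THRESHOLD if full_unroll else PARTIAL_UNROLL_THRESHOLD


def should_fully_unroll(body_size: int, trip_count: int,
                        is_cold: bool = False) -> bool:
    """Fully unroll only when the estimated unrolled size fits the budget."""
    if is_cold:
        return False
    estimated_size = body_size * trip_count  # simplified cost model
    return estimated_size <= unroll_threshold(full_unroll=True)
```

For example, a hot loop with a body of 20 instructions and a known trip count of 10 has an estimated unrolled size of 200: it fits the full-unroll budget of 300 but would not fit a budget of 150, and it is left rolled when marked cold.
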
>>>>>>>
>>>>>>>
>>>>>>> There is one issue that makes these not orthogonal.
>>>>>>>
>>>>>>> If even *static* profile hints will reduce some of the code size
>>>>>>> increase caused by higher unrolling thresholds for non-cold code, we should
>>>>>>> factor that into the tradeoff in picking where the threshold goes.
>>>>>>>
>>>>>>> However, getting PGO into the full unroller is currently challenging
>>>>>>> outside of the new pass manager. We already have some unfortunate hacks
>>>>>>> around this in LoopUnswitch that are making the port of it to the new PM
>>>>>>> more annoying.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>>
>>>>
>>>
>>
>>
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>
>