[llvm-dev] (RFC) Adjusting default loop fully unroll threshold
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Fri Feb 10 15:23:22 PST 2017
On 02/10/2017 05:21 PM, Dehao Chen wrote:
> Thanks everyone for the comments.
>
> Do we have a decision here?
You're good to go as far as I'm concerned.
-Hal
>
> Dehao
>
> On Tue, Feb 7, 2017 at 10:24 PM, Hal Finkel <hfinkel at anl.gov
> <mailto:hfinkel at anl.gov>> wrote:
>
>
> On 02/07/2017 05:29 PM, Sanjay Patel via llvm-dev wrote:
>> Sorry if I missed it, but what machine/CPU are you using to
>> collect the perf numbers?
>>
>> I am concerned that what may be a win on a CPU that keeps a
>> couple of hundred instructions in-flight and has many MB of
>> caches will not hold for a small core.
>
> In my experience, unrolling tends to help weaker cores even more
> than stronger ones because it allows the instruction scheduler
> more opportunities to hide latency. Obviously, instruction-cache
> pressure is an important consideration, but the code size changes
> here seem small.
>
>>
>> Is the proposed change universal? Is there a way to undo it?
>
> All of the unrolling thresholds should be target-adjustable using
> the TTI::getUnrollingPreferences hook.
>
> -Hal
>
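To make the "target-adjustable" point above concrete, here is a minimal toy model in Python. It is not LLVM code: `UnrollPrefs`, `small_core_prefs`, and `should_fully_unroll` are hypothetical names, the size estimate (body size times trip count) is a deliberate simplification of the real cost model, and the partial threshold staying at 150 is an assumption. It only illustrates the idea that a target can replace the default thresholds through a preferences hook.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UnrollPrefs:
    """Hypothetical stand-in for a target's unrolling preferences."""
    full_threshold: int = 300     # proposed full-unroll default from this thread
    partial_threshold: int = 150  # assumed to stay at the old value


DEFAULTS = UnrollPrefs()


def small_core_prefs(defaults: UnrollPrefs) -> UnrollPrefs:
    """Hypothetical override for a target that wants the old, smaller limit."""
    return replace(defaults, full_threshold=150)


def should_fully_unroll(body_size: int, trip_count: int, prefs: UnrollPrefs) -> bool:
    """Toy rule: fully unroll when the estimated unrolled size fits the threshold."""
    return body_size * trip_count <= prefs.full_threshold


# A loop with a 20-unit body and 12 iterations (estimated unrolled size 240).
print(should_fully_unroll(20, 12, DEFAULTS))
print(should_fully_unroll(20, 12, small_core_prefs(DEFAULTS)))
```

With these made-up numbers the same loop is fully unrolled under the proposed default but left rolled on the target that overrides the threshold, which is the "way to undo it" asked about above.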
>
>>
>> On Tue, Feb 7, 2017 at 3:26 PM, Dehao Chen via llvm-dev
>> <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>
>> Ping... with the updated code size impact data, any more
>> comments? Any more data that would be interesting to collect?
>>
>> Thanks,
>> Dehao
>>
>> On Thu, Feb 2, 2017 at 2:07 PM, Dehao Chen <dehao at google.com
>> <mailto:dehao at google.com>> wrote:
>>
>> Here is the code size impact for clang, chrome and 24
>> google internal benchmarks (names omitted; 14, 15, and 16 are
>> encoding/decoding benchmarks similar to h264). There are
>> 2 columns, for thresholds 300 and 450 respectively.
>>
>> I also tested the llvm test suite. Changing the threshold
>> to 300/450 does not affect code gen for any binary in the
>> test suite.
>>
>>
>>
>> benchmark    300      450
>> clang        0.30%    0.63%
>> chrome       0.00%    0.00%
>> 1            0.27%    0.67%
>> 2            0.44%    0.93%
>> 3            0.44%    0.93%
>> 4            0.26%    0.53%
>> 5            0.74%    2.21%
>> 6            0.74%    2.21%
>> 7            0.74%    2.21%
>> 8            0.46%    1.05%
>> 9            0.35%    0.86%
>> 10           0.35%    0.86%
>> 11           0.40%    0.83%
>> 12           0.32%    0.65%
>> 13           0.31%    0.64%
>> 14           4.52%    8.23%
>> 15           9.90%   19.38%
>> 16           9.90%   19.38%
>> 17           0.68%    1.97%
>> 18           0.21%    0.48%
>> 19           0.99%    3.44%
>> 20           0.19%    0.46%
>> 21           0.57%    1.62%
>> 22           0.37%    1.05%
>> 23           0.78%    1.30%
>> 24           0.51%    1.54%
>>
>>
>> On Wed, Feb 1, 2017 at 6:08 PM, Mikhail Zolotukhin via
>> llvm-dev <llvm-dev at lists.llvm.org
>> <mailto:llvm-dev at lists.llvm.org>> wrote:
>>
>>> On Feb 1, 2017, at 4:57 PM, Xinliang David Li via
>>> llvm-dev <llvm-dev at lists.llvm.org
>>> <mailto:llvm-dev at lists.llvm.org>> wrote:
>>>
>>> clang, chrome, and some internal large apps are good
>>> candidates for size metrics.
>> I'd also add the standard LLVM testsuite just because
>> it's the suite everyone in the community can use.
>>
>> Michael
>>>
>>> David
>>>
>>> On Wed, Feb 1, 2017 at 4:47 PM, Chandler Carruth via
>>> llvm-dev <llvm-dev at lists.llvm.org
>>> <mailto:llvm-dev at lists.llvm.org>> wrote:
>>>
>>> I had suggested having size metrics from
>>> somewhat larger applications such as Chrome,
>>> Webkit, or Firefox; clang itself; and maybe some
>>> of our internal binaries with rough size brackets?
>>>
>>> On Wed, Feb 1, 2017 at 4:33 PM Dehao Chen
>>> <dehao at google.com <mailto:dehao at google.com>> wrote:
>>>
>>> With the new data points, any comments on
>>> whether this can justify setting the full
>>> unroll threshold to 300 (or any other
>>> number) in O2? I can collect more data
>>> points if it's helpful.
>>>
>>> Thanks,
>>> Dehao
>>>
>>> On Tue, Jan 31, 2017 at 3:20 PM, Dehao Chen
>>> <dehao at google.com <mailto:dehao at google.com>>
>>> wrote:
>>>
>>> Recollected the data from trunk head
>>> with stddev data and more threshold data
>>> points attached:
>>>
>>> Performance:
>>>
>>> benchmark  stddev/mean  300     450     600     750
>>> 403        0.37%        0.11%   0.11%   0.09%   0.79%
>>> 433        0.14%        0.51%   0.25%  -0.63%  -0.29%
>>> 445        0.08%        0.48%   0.89%   0.12%   0.83%
>>> 447        0.16%        3.50%   2.69%   3.66%   3.59%
>>> 453        0.11%        1.49%   0.45%  -0.07%   0.78%
>>> 464        0.17%        0.75%   1.80%   1.86%   1.54%
>>>
>>>
>>> Code size:
>>>
>>> benchmark  300     450      600      750
>>> 403        0.56%   2.41%    2.74%    3.75%
>>> 433        0.96%   2.84%    4.19%    4.87%
>>> 445        2.16%   3.62%    4.48%    5.88%
>>> 447        2.96%   5.09%    6.74%    8.89%
>>> 453        0.94%   1.67%    2.73%    2.96%
>>> 464        8.02%  13.50%   20.51%   26.59%
>>>
>>>
>>> Compile time is proportional in the
>>> experiments and noisier, so I did not
>>> include it.
>>>
>>> We have >2% speedup on some google
>>> internal benchmarks when switching the
>>> threshold from 150 to 300.
>>>
>>> Dehao
>>>
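One rough way to read the two tables above side by side is to average each threshold column. The Python sketch below does only that, using the numbers exactly as posted. Two assumptions: positive performance values are treated as speedups, and a plain arithmetic mean over six benchmarks is taken as a crude summary, not a substitute for the per-benchmark data (464 dominates the size average). The names `PERF`, `SIZE`, and `column_means` are made up for this illustration.

```python
THRESHOLDS = (300, 450, 600, 750)

# benchmark -> performance change (%) at each threshold, copied from the thread
PERF = {
    "403": (0.11, 0.11, 0.09, 0.79),
    "433": (0.51, 0.25, -0.63, -0.29),
    "445": (0.48, 0.89, 0.12, 0.83),
    "447": (3.50, 2.69, 3.66, 3.59),
    "453": (1.49, 0.45, -0.07, 0.78),
    "464": (0.75, 1.80, 1.86, 1.54),
}

# benchmark -> code size increase (%) at each threshold, copied from the thread
SIZE = {
    "403": (0.56, 2.41, 2.74, 3.75),
    "433": (0.96, 2.84, 4.19, 4.87),
    "445": (2.16, 3.62, 4.48, 5.88),
    "447": (2.96, 5.09, 6.74, 8.89),
    "453": (0.94, 1.67, 2.73, 2.96),
    "464": (8.02, 13.50, 20.51, 26.59),
}


def column_means(table):
    """Arithmetic mean of each threshold column across all benchmarks."""
    rows = list(table.values())
    return {t: sum(r[i] for r in rows) / len(rows) for i, t in enumerate(THRESHOLDS)}


perf_means = column_means(PERF)
size_means = column_means(SIZE)

for t in THRESHOLDS:
    print(f"threshold {t}: perf {perf_means[t]:+.2f}%  size {size_means[t]:+.2f}%")
```

Note that several of the per-benchmark performance deltas are of the same order as the reported stddev/mean, so small differences between columns should be read with care.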
>>> On Mon, Jan 30, 2017 at 5:06 PM,
>>> Chandler Carruth <chandlerc at google.com
>>> <mailto:chandlerc at google.com>> wrote:
>>>
>>> On Mon, Jan 30, 2017 at 4:59 PM
>>> Mehdi Amini <mehdi.amini at apple.com
>>> <mailto:mehdi.amini at apple.com>> wrote:
>>>
>>>>
>>>>
>>>> Another question is
>>>> about PGO integration:
>>>> is it already hooked
>>>> there? Should we have a
>>>> more aggressive
>>>> threshold in a hot
>>>> function? (Assuming
>>>> we’re willing to spend
>>>> some binary size there
>>>> but not on the cold path).
>>>>
>>>>
>>>> I would even wire the
>>>> *unrolling* the other way:
>>>> just suppress unrolling in
>>>> cold paths to save binary
>>>> size. Rolled loops seem
>>>> like a generally good thing
>>>> in cold code unless they
>>>> are having some larger
>>>> impact (i.e., the loop itself
>>>> is more expensive than the
>>>> unrolled form).
>>>>
>>>>
>>>>
>>>> Agree that we could suppress
>>>> unrolling in cold paths to save
>>>> code size. But that's
>>>> orthogonal to the proposal
>>>> here. This proposal focuses on
>>>> O2 performance: should we have a
>>>> different (higher) full unroll
>>>> threshold than the dynamic/partial
>>>> unroll threshold?
>>>
>>> I agree that this is (to some
>>> extent) orthogonal, and it makes
>>> sense to me to differentiate the
>>> threshold for full unroll and
>>> the dynamic/partial case.
>>>
>>>
>>> There is one issue that makes these
>>> not orthogonal.
>>>
>>> If even *static* profile hints will
>>> reduce some of the code size
>>> increase caused by higher unrolling
>>> thresholds for non-cold code, we
>>> should factor that into the tradeoff
>>> in picking where the threshold goes.
>>>
>>> However, getting PGO into the full
>>> unroller is currently challenging
>>> outside of the new pass manager. We
>>> already have some unfortunate hacks
>>> around this in LoopUnswitch that are
>>> making the port of it to the new PM
>>> more annoying.
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> <mailto:llvm-dev at lists.llvm.org>
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>> <http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory