[llvm-dev] (RFC) Adjusting default loop fully unroll threshold
Hal Finkel via llvm-dev
llvm-dev at lists.llvm.org
Tue Feb 7 22:24:35 PST 2017
On 02/07/2017 05:29 PM, Sanjay Patel via llvm-dev wrote:
> Sorry if I missed it, but what machine/CPU are you using to collect
> the perf numbers?
>
> I am concerned that what may be a win on a CPU that keeps a couple of
> hundred instructions in-flight and has many MB of caches will not hold
> for a small core.
In my experience, unrolling tends to help weaker cores even more than
stronger ones because it allows the instruction scheduler more
opportunities to hide latency. Obviously, instruction-cache pressure is
an important consideration, but the code size changes here seem small.
>
> Is the proposed change universal? Is there a way to undo it?
All of the unrolling thresholds should be target-adjustable using the
TTI::getUnrollingPreferences hook.
-Hal
>
> On Tue, Feb 7, 2017 at 3:26 PM, Dehao Chen via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Ping... with the updated code size impact data, any more comments?
> Any more data that would be interesting to collect?
>
> Thanks,
> Dehao
>
> On Thu, Feb 2, 2017 at 2:07 PM, Dehao Chen <dehao at google.com> wrote:
>
> Here is the code size impact for clang, chrome, and 24 Google
> internal benchmarks (names omitted; benchmarks 14, 15, and 16 are
> encoding/decoding benchmarks similar to h264). There are two
> columns, for thresholds 300 and 450 respectively.
>
> I also tested the LLVM test suite. Changing the threshold to
> 300/450 does not affect code generation for any binary in the test suite.
>
>
>
> 300 450
> clang 0.30% 0.63%
> chrome 0.00% 0.00%
> 1 0.27% 0.67%
> 2 0.44% 0.93%
> 3 0.44% 0.93%
> 4 0.26% 0.53%
> 5 0.74% 2.21%
> 6 0.74% 2.21%
> 7 0.74% 2.21%
> 8 0.46% 1.05%
> 9 0.35% 0.86%
> 10 0.35% 0.86%
> 11 0.40% 0.83%
> 12 0.32% 0.65%
> 13 0.31% 0.64%
> 14 4.52% 8.23%
> 15 9.90% 19.38%
> 16 9.90% 19.38%
> 17 0.68% 1.97%
> 18 0.21% 0.48%
> 19 0.99% 3.44%
> 20 0.19% 0.46%
> 21 0.57% 1.62%
> 22 0.37% 1.05%
> 23 0.78% 1.30%
> 24 0.51% 1.54%
>
>
> On Wed, Feb 1, 2017 at 6:08 PM, Mikhail Zolotukhin via
> llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> On Feb 1, 2017, at 4:57 PM, Xinliang David Li via
>> llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> clang, chrome, and some internal large apps are good
>> candidates for size metrics.
> I'd also add the standard LLVM testsuite just because it's
> the suite everyone in the community can use.
>
> Michael
>>
>> David
>>
>> On Wed, Feb 1, 2017 at 4:47 PM, Chandler Carruth via
>> llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>
>> I had suggested gathering size metrics from somewhat
>> larger applications such as Chrome, WebKit, or
>> Firefox; clang itself; and maybe some of our internal
>> binaries with rough size brackets.
>>
>> On Wed, Feb 1, 2017 at 4:33 PM Dehao Chen
>> <dehao at google.com> wrote:
>>
>> With the new data points, any comments on whether
>> this can justify setting the full unroll threshold
>> to 300 (or any other number) in O2? I can collect
>> more data points if it's helpful.
>>
>> Thanks,
>> Dehao
>>
>> On Tue, Jan 31, 2017 at 3:20 PM, Dehao Chen
>> <dehao at google.com> wrote:
>>
>> Recollected the data from trunk head with
>> stddev data and more threshold data points
>> attached:
>>
>> Performance:
>>
>> stddev/mean 300 450 600 750
>> 403 0.37% 0.11% 0.11% 0.09% 0.79%
>> 433 0.14% 0.51% 0.25% -0.63% -0.29%
>> 445 0.08% 0.48% 0.89% 0.12% 0.83%
>> 447 0.16% 3.50% 2.69% 3.66% 3.59%
>> 453 0.11% 1.49% 0.45% -0.07% 0.78%
>> 464 0.17% 0.75% 1.80% 1.86% 1.54%
>>
>>
>> Code size:
>>
>> 300 450 600 750
>> 403 0.56% 2.41% 2.74% 3.75%
>> 433 0.96% 2.84% 4.19% 4.87%
>> 445 2.16% 3.62% 4.48% 5.88%
>> 447 2.96% 5.09% 6.74% 8.89%
>> 453 0.94% 1.67% 2.73% 2.96%
>> 464 8.02% 13.50% 20.51% 26.59%
>>
>>
>> Compile time scaled proportionally in these
>> experiments and was noisier, so I did not
>> include it.
>>
>> We see a >2% speedup on some Google-internal
>> benchmarks when switching the threshold from
>> 150 to 300.
>>
>> Dehao
>>
>> On Mon, Jan 30, 2017 at 5:06 PM, Chandler
>> Carruth <chandlerc at google.com> wrote:
>>
>> On Mon, Jan 30, 2017 at 4:59 PM Mehdi
>> Amini <mehdi.amini at apple.com> wrote:
>>
>>>
>>>
>>> Another question is about
>>> PGO integration: is it
>>> already hooked there? Should
>>> we have a more aggressive
>>> threshold in a hot function?
>>> (Assuming we’re willing to
>>> spend some binary size there
>>> but not on the cold path).
>>>
>>>
>>> I would even wire the
>>> *unrolling* the other way: just
>>> suppress unrolling in cold paths
>>> to save binary size. Rolled
>>> loops seem like a generally good
>>> thing in cold code unless they
>>> are having some larger impact
>>> (i.e., the loop itself is more
>>> expensive than the unrolled form).
>>>
>>>
>>>
>>> Agree that we could suppress
>>> unrolling in cold paths to save code
>>> size. But that's orthogonal to the
>>> proposal here. This proposal focuses
>>> on O2 performance: should we have a
>>> different (higher) full unroll
>>> threshold than for dynamic/partial unrolling?
>>
>> I agree that this is (to some extent)
>> orthogonal, and it makes sense to me
>> to differentiate the threshold for
>> full unroll and the dynamic/partial case.
>>
>>
>> There is one issue that makes these not
>> orthogonal.
>>
>> If even *static* profile hints can
>> reduce some of the code size increase
>> caused by higher unrolling thresholds for
>> non-cold code, we should factor that into
>> the tradeoff when picking where the
>> threshold goes.
>>
>> However, getting PGO into the full
>> unroller is currently challenging outside
>> of the new pass manager. We already have
>> some unfortunate hacks around this in
>> LoopUnswitch that are making the port of
>> it to the new PM more annoying.
>>
>>
>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>
>
>
>
>
>
>
>
>
>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory