[llvm-dev] (RFC) Adjusting default loop fully unroll threshold
Chandler Carruth via llvm-dev
llvm-dev at lists.llvm.org
Mon Jan 30 17:06:39 PST 2017
On Mon, Jan 30, 2017 at 4:59 PM Mehdi Amini <mehdi.amini at apple.com> wrote:
> Another question is about PGO integration: is it already hooked there?
> Should we have a more aggressive threshold in a hot function? (Assuming
> we’re willing to spend some binary size there but not on the cold path).
>
> I would even wire the *unrolling* the other way: just suppress unrolling
> in cold paths to save binary size. Rolled loops seem like a generally good
> thing in cold code unless they are having some larger impact (i.e., the loop
> itself is more expensive than the unrolled form).
>
> Agree that we could suppress unrolling in cold paths to save code size. But
> that's orthogonal to the proposal here. This proposal focuses on O2
> performance: should we have a different (higher) full-unroll threshold than
> the dynamic/partial unroll threshold?
>
> I agree that this is (to some extent) orthogonal, and it makes sense to me
> to differentiate the threshold for full unroll and the dynamic/partial case.
There is one issue that makes these not orthogonal.
If even *static* profile hints will reduce some of the code size increase
caused by higher unrolling thresholds for non-cold code, we should factor
that into the tradeoff in picking where the threshold goes.
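
To make the coupling concrete, here is a minimal sketch of how the two knobs
interact. Everything in it is hypothetical: the names (Hotness,
UnrollThresholds, pickThresholds, shouldFullyUnroll) and the numeric budgets
are made up for illustration and are not LLVM's actual API or defaults.

```cpp
// Hypothetical sketch: separate full vs. partial unroll budgets, with
// unrolling suppressed entirely in code that a hint marks as cold.
enum class Hotness { Cold, Unknown, Hot };

struct UnrollThresholds {
  unsigned Full;    // cost budget for fully unrolling a loop
  unsigned Partial; // cost budget for dynamic/partial unrolling
};

// Choose budgets from a hotness hint (static or profile-derived).
// A budget of 0 means "do not unroll". The values are illustrative only.
UnrollThresholds pickThresholds(Hotness H) {
  const UnrollThresholds Base{300, 150};
  if (H == Hotness::Cold)
    return {0, 0}; // keep cold loops rolled to save code size
  return Base;     // hot and unknown code share the higher full budget
}

// Decide whether a loop whose fully unrolled body has the given estimated
// cost fits within the full-unroll budget for its hotness.
bool shouldFullyUnroll(unsigned UnrolledCost, Hotness H) {
  const UnrollThresholds T = pickThresholds(H);
  return T.Full != 0 && UnrolledCost <= T.Full;
}
```

With these assumed numbers, a loop of unrolled cost 200 is fully unrolled in
hot or unknown code but stays rolled in cold code, so the size cost of the
higher full-unroll budget depends on how much code the hints mark as cold.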
However, getting PGO into the full unroller is currently challenging
outside of the new pass manager. We already have some unfortunate hacks
around this in LoopUnswitch that are making the port of it to the new PM