[PATCH] [AArch64] Enable partial unrolling and runtime unrolling for AArch64 target

Renato Golin renato.golin at linaro.org
Tue Sep 9 13:33:18 PDT 2014


On 9 September 2014 20:19, Bob Wilson <bob.wilson at apple.com> wrote:
> 50% code size regression, even on just a few tests, is completely
> unacceptable for -O2.

For a performance improvement that is not that big. Agreed.


> For -O3, it might be tolerable if it provides a very significant performance
> win to compensate. It doesn’t sound like the performance is really that
> compelling, though. I think this should be under the control of a separate
> option, not enabled by default for -O3, until the code size regressions get
> fixed.

So, I thought I had suggested that earlier, but apparently not.

On IRC we discussed the issue and the outcomes were:

* Many benchmarks improve by 3~5%, with only one regressing by 3%.
Kevin and James believe the regression is caused by inner loop checks,
the same issue that causes the code bloat.
* James assured me they are working on the code size issue, which
should also fix the -3% performance problem.
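
To make the "loop checks" point concrete, here is a hand-written sketch
of roughly what runtime unrolling by 4 does to a simple loop. This is an
illustration only, not the unroller's actual output: the function names
are made up, and the factor of 4 is an assumption. The extra trip-count
arithmetic and the remainder loop are the kind of added code that can
cost both size and, on short loops, time.

```c
#include <stddef.h>

/* Original loop: the trip count n is only known at run time. */
long sum_simple(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Hand-written equivalent of runtime unrolling by 4 (hypothetical). */
long sum_unrolled4(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    size_t rem = n % 4;      /* extra check: leftover iterations */

    for (; i < rem; ++i)     /* remainder loop, runs 0..3 times */
        s += a[i];

    for (; i < n; i += 4) {  /* main loop: body duplicated 4 times */
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    return s;
}
```

The body now exists five times (four copies plus the remainder loop),
which is where the size growth comes from in this simplified picture.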

However, they weren't sure that the two issues were linked, or that they
could actually reduce most of the bloat. I thought I was alone in
worrying about size, but it seems not. So, I think there are two main
routes now:

1. Fix the code bloat (down to 10% or less on all tests), and possibly
the -3% regression, and make it the default at -O3
2. Add a flag (easy) / pragma (hard) to control partial/dynamic
unrolling in loops

Unless there is a serious time issue, the problems should be fixed
before the patch is committed. Otherwise, 2 can be done before 1.

cheers,
--renato



