[PATCH] [AArch64] Enable partial unrolling and runtime unrolling for AArch64 target

Thu Sep 25 03:49:14 PDT 2014

On Thu, Sep 25, 2014 at 3:32 AM, Kevin Qin <kevinqindev at gmail.com> wrote:

> Hi Eric,
>
>
> Attachments show the results of the experiments I ran on A57. I tried 10, 20,
> 30, 40 and 50 as the loop buffer size and applied my runtime prologue
> simplification patch (http://reviews.llvm.org/D5147). Then I collected
> execution time, geomean code bloat and max code bloat at the different loop
> buffer sizes.
>
>
>
> So, weighing the performance improvement against an acceptable code
> bloat, setting the loop buffer size to 20 is a good choice. Maybe it’s a little
> conservative, but there’s not much performance improvement from increasing
> it to 30 or 40, and a larger loop buffer size brings a huge code bloat,
> which is not acceptable at -O2 and lower optimization levels.
>

First question: why multiples of 10? There are good reasons to use powers of 2
here -- the addressing and other arithmetic required by unrolling is often
cheaper.
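
To make the arithmetic point concrete, here is a minimal sketch, assuming
the unroll count itself ends up a power of 2. The helper name
split_trip_count is hypothetical and is not anything in LLVM; it only shows
why the trip-count split for a runtime-unrolled loop is cheaper in that case.

```python
def split_trip_count(n, factor):
    """Split n iterations into (unrolled_trips, remainder_iterations).

    Hypothetical helper for illustration only; not an LLVM API.
    """
    if factor > 0 and factor & (factor - 1) == 0:
        # Power of two: a shift and a mask replace the divide and remainder.
        shift = factor.bit_length() - 1
        return n >> shift, n & (factor - 1)
    # General case: needs an integer divide and a remainder.
    return n // factor, n % factor
```

With a count of 8 the split is a shift and a mask; with a count of 10 it
needs a real division.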

Second question: can you try to drill down? Specifically, what about 16?
18? 22? 24? It would be useful to refine the resolution of the curve near
the currently hypothesized good threshold.

(At least, this was how I established the x86 partial unrolling threshold
and similar other thresholds in LLVM such as the inline thresholds...)

In some cases, this helps you find where the cliffs are, as the line is
rarely smooth.
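
A rough sketch of how such a sweep could be inspected for cliffs, assuming
the measurements are collected into a mapping from threshold to some metric
(code size ratio, execution time, ...). The function name find_cliffs and
the min_jump parameter are made up for illustration.

```python
def find_cliffs(samples, min_jump):
    """Return (lo, hi, delta) for adjacent thresholds whose metric
    differs by at least min_jump.

    samples: dict mapping threshold -> measured metric.
    Hypothetical helper for illustration only.
    """
    points = sorted(samples.items())
    cliffs = []
    # Compare each threshold with the next larger one that was measured.
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        delta = v1 - v0
        if abs(delta) >= min_jump:
            cliffs.append((t0, t1, delta))
    return cliffs
```

Running this over measurements at 16, 18, 20, 22 and 24 would show whether
the metric moves gradually or jumps between two neighbouring thresholds.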