[PATCH] [AArch64] Enable partial unrolling and runtime unrolling for AArch64 target

Tue Sep 9 06:44:14 PDT 2014

Hi Renato,

I think this is a good way to fix some of the performance regressions, but it
has nothing to do with code size, because it just moves the runtime check
outside the outer loop. Also, we need more registers (two are required here:
one for the extra loop count, the other for whether to jump to the loop
body) to cache the runtime check results, and this would increase the
register pressure for both the outer loop and the inner loop. If more register
spills/reloads are introduced, the performance may be even worse. Anyway,
it's a good direction to pursue, and we need to implement it carefully to
avoid increasing register pressure.
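To illustrate the hoisting idea, here is a minimal hand-written C sketch, assuming an inner loop whose trip count (`cols`) does not change across outer iterations and an unroll factor of 4. The names `matrix_sum_hoisted`, `rem` and `run_unrolled` are hypothetical; `rem` and `run_unrolled` stand in for the two cached values described above (the extra loop count and the decision to enter the unrolled body). This is not the code LLVM's runtime unroller emits.

```c
#include <stddef.h>

/* Sum of a rows x cols matrix stored row-major. The inner trip count
 * `cols` is the same on every outer iteration, so the runtime-unroll
 * checks can be computed once, before the outer loop. */
long matrix_sum_hoisted(const int *m, size_t rows, size_t cols) {
    /* Hoisted runtime checks: both values stay live for the whole
     * outer loop, i.e. they occupy two registers throughout it. */
    size_t rem = cols % 4;         /* extra (remainder) iteration count */
    int run_unrolled = cols >= 4;  /* cached branch: enter unrolled body? */

    long sum = 0;
    for (size_t r = 0; r < rows; r++) {
        const int *row = m + r * cols;
        size_t j = 0;

        /* Remainder iterations, one element at a time. */
        for (; j < rem; j++)
            sum += row[j];

        /* Unrolled body; what remains is a multiple of 4 elements. */
        if (run_unrolled) {
            for (; j < cols; j += 4) {
                sum += row[j];
                sum += row[j + 1];
                sum += row[j + 2];
                sum += row[j + 3];
            }
        }
    }
    return sum;
}
```

The saving is that `cols % 4` and the comparison are evaluated once instead of once per outer iteration; the cost is that `rem` and `run_unrolled` compete for registers with everything else live in both loops.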

Cheers,
Kevin

2014-09-09 13:31 GMT+01:00 Renato Golin <renato.golin at linaro.org>:

> On 9 September 2014 11:19, Kevin Qin <kevinqindev at gmail.com> wrote:
> > I can give more details on the performance regressions here. Basically,
> partial unrolling contributes small performance improvement, regressions
> and code size changes. Most of the regressions are caused by the runtime
> unrolling prologue. This prologue will do some extra work on checking the
> loop iterations and execute the reminder times of loop bodies. If the
> runtime unrolled loop is inside another loop, and inner loop count for each
> running is quite small, then there's a overhead happened in the prologue
> and caused the regession.
>
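A minimal hand-written C sketch of the shape of a runtime-unrolled loop with a prologue, assuming an unroll factor of 4 (the function names are hypothetical and this is not LLVM's generated code). The prologue computes the remainder `n % 4` at run time and executes those iterations one by one before the unrolled body runs.

```c
#include <stddef.h>

/* Original loop: one add per iteration. */
long sum_simple(const int *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Same loop, runtime-unrolled by 4 with a prologue. For a small n
 * (e.g. n = 3) all of the work happens in the prologue, so the extra
 * remainder computation and branches are pure overhead. */
long sum_unrolled4(const int *a, size_t n) {
    long sum = 0;
    size_t i = 0;

    /* Prologue: runtime check of the trip count, then the remainder
     * iterations executed one at a time. */
    size_t rem = n % 4;
    for (; i < rem; i++)
        sum += a[i];

    /* Unrolled body: the remaining count is a multiple of 4. */
    for (; i < n; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    return sum;
}
```

When such a loop sits inside an outer loop and `n` is small on each entry, the prologue runs on every outer iteration while the unrolled body rarely executes, which matches the regression pattern described above.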
> One heuristic that is pretty obvious here is not to add dynamic
> checks for inner loops. A better one is to check if the inner loop
> induction variable depends only on values external to the inner loop.
> Any heuristic that determines that you only need one run-time check,
> outside the outer loop, would do.
>
>
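For contrast, a hand-written C sketch (not LLVM output; unroll factor 4 and the name `triangle_sum` are assumptions) of a nest where the inner trip count is computed outside the inner loop but still changes on every outer iteration, because it is the outer induction variable `i` itself. The remainder `i % 4` must then be recomputed per outer iteration, so the runtime check cannot be reduced to a single one outside the outer loop; if the bound were outer-loop invariant (say, a fixed row length), it could be.

```c
#include <stddef.h>

/* Triangular nest: for each i, sum the prefix a[0..i-1]. The inner
 * trip count is i, so the runtime-unroll remainder differs on every
 * outer iteration and cannot be hoisted out of the outer loop. */
long triangle_sum(const int *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        size_t rem = i % 4;  /* recomputed on each outer iteration */
        size_t j = 0;

        /* Remainder iterations. */
        for (; j < rem; j++)
            sum += a[j];

        /* Unrolled body: the remaining count (i - rem) is a multiple of 4. */
        for (; j < i; j += 4)
            sum += a[j] + a[j + 1] + a[j + 2] + a[j + 3];
    }
    return sum;
}
```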
> > So I suggest we enable it first, and then try to make it even better
> in the future.
>
> That's the point: SPEC is *not* "better".
>
> Some are better, others are worse. The geomean doesn't mean that much
> on artificial benchmarks. It just means that, for the very restrictive
> set of scenarios you're considering, you hurt less than you helped,
> but that doesn't translate to anything in the wild.
>
> It may be that the conditions that got better here were made
> unrealistically large and the ones that got worse weren't, so the
> negative impact is proportionally greater, and your geomean would lie
> about real-world code. You just don't know. Basically, I cannot see
> how 50% regressions on any benchmark can be a good thing, regardless
> of the wins elsewhere.
>
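The geomean concern can be checked with simple arithmetic. The C sketch below uses purely hypothetical speedup ratios (not the SPEC numbers from this thread), where 1.3 means a benchmark got 30% faster and 0.5 means it runs at half its previous speed. The geometric mean is the n-th root of the product of the ratios, so it is above 1.0 exactly when that product is. With three ratios of 1.3 and one of 0.5, the product is about 1.0985 and the geomean about 1.024: an apparent 2.4% win that hides one benchmark running at half speed.

```c
#include <stddef.h>

/* Product of per-benchmark speedup ratios. The geometric mean is the
 * n-th root of this value, so it exceeds 1.0 exactly when the product
 * does. Computing the product avoids needing libm for the root. */
double ratio_product(const double *ratios, size_t n) {
    double p = 1.0;
    for (size_t i = 0; i < n; i++)
        p *= ratios[i];
    return p;
}

/* Smallest ratio, i.e. the worst single regression the mean can hide.
 * Requires n >= 1. */
double worst_ratio(const double *ratios, size_t n) {
    double w = ratios[0];
    for (size_t i = 1; i < n; i++)
        if (ratios[i] < w)
            w = ratios[i];
    return w;
}
```

Reporting `worst_ratio` alongside the mean is one simple way to keep a large individual regression from disappearing into an aggregate.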
> I believe Chandler has a good way of testing performance on real-world
> code, so I'm copying him to have a look at this change. Others might
> try it on their workloads (do Chromium folks test performance?).
>
>
> cheers,
> --renato
>

-- 
Best Regards,

Kevin Qin