[PATCH][LoopVectorizer] Restrict the unroll factor of reductions in loops
Arnold Schwaighofer
aschwaighofer at apple.com
Fri Aug 8 10:11:13 PDT 2014
This makes sense to me.
We should evaluate the impact on Cyclone (by changing James’ patch to also return 4 for Cyclone).
Gerolf, could you run benchmarks?
> On Aug 8, 2014, at 7:37 AM, James Molloy <James.Molloy at arm.com> wrote:
>
> Hi Arnold,
>
> Attached are two patches. The first ups the maximum unroll factor on AArch64 from 2 to 4, for C-A57 only at the moment as that’s all I’ve got data for. This gives us significant wins – ~14% on 462.libquantum at least.
Is this from quantum_toffoli? (We saw similar wins there for x86_64).
>
> However, it also causes some regressions. The second patch makes the loop vectorizer a bit more conservative with its unroll factor. The problem is purely for reductions within loops. The regressions I’ve seen are in loops with small trip counts (known only at runtime) inside a loop nest. A loop unroll factor of 2 is fine, but above 2 the reduction variable fixup logic after the loop increases the critical path length and resource usage. For most loops this isn’t a problem, but small loops in a larger loop nest will execute this fixup code many times.
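For illustration, here is a hand-written sketch of what a scalar float reduction looks like at an unroll factor of 4. It assumes the partial results are combined in a simple linear chain and that reassociating the float adds is permitted (e.g. under fast-math). The function name sum_uf4 is made up for this example and is not taken from either patch.

```cpp
#include <cstddef>

// Scalar sum reduction, unrolled by 4 with one accumulator per unrolled part.
float sum_uf4(const float *a, std::size_t n) {
  float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
  std::size_t i = 0;
  // Unrolled body: the four adds per iteration are independent of each other.
  for (; i + 4 <= n; i += 4) {
    s0 += a[i];
    s1 += a[i + 1];
    s2 += a[i + 2];
    s3 += a[i + 3];
  }
  // Fixup after the loop: UF - 1 = 3 dependent fadds in a chain.
  // With UF = 2 this would be a single fadd.
  float s = s0 + s1;
  s = s + s2;
  s = s + s3;
  // Scalar remainder for trip counts that are not a multiple of 4.
  for (; i < n; ++i)
    s += a[i];
  return s;
}
```

When n is small and the loop sits inside an outer loop, the three-add chain runs once per outer iteration and can make up a noticeable share of the total work.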
>
> The heuristic is: if this is a (scalar) reduction, and the loop is nested, clamp the UF to a maximum of 2. With 2, we still get wins but we only add one fadd/fmul to the critical path.
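As I read it, the heuristic amounts to something like the sketch below. This is a standalone approximation, not the code in the attached diff: LoopFacts, clampUnrollFactor and MaxNestedScalarReductionUF are hypothetical names, and "nested" is assumed to mean a loop depth greater than 1.

```cpp
#include <algorithm>

// Hypothetical summary of what the cost model knows about the loop.
struct LoopFacts {
  bool HasScalarReduction; // loop carries a scalar (non-vectorized) reduction
  unsigned LoopDepth;      // 1 = outermost loop, > 1 = nested in another loop
};

// Clamp the chosen unroll factor for scalar reductions in nested loops.
unsigned clampUnrollFactor(unsigned UF, const LoopFacts &L) {
  const unsigned MaxNestedScalarReductionUF = 2; // assumed cap, per the mail
  if (L.HasScalarReduction && L.LoopDepth > 1)
    return std::min(UF, MaxNestedScalarReductionUF);
  return UF; // all other loops keep the target's unroll factor
}
```

For example, clampUnrollFactor(4, {true, 2}) yields 2, while an outermost loop or a loop without a reduction keeps 4.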
>
> Please take a look.
>
> Cheers,
>
> James
> <up-max-unroll.diff><limit-scalar-reductions.diff>