[PATCH] D15408: [AArch64/LoopUnrollRuntime] Don't avoid high-cost trip count computation on the AArch64
Junmo Park via llvm-commits
llvm-commits at lists.llvm.org
Thu Dec 10 22:43:16 PST 2015
flyingforyou added a comment.
Thanks Zhaoshi.
I've just run a set of benchmarks, including test-suite, on Juno (Cortex-A57). There were many improvements and some regressions.
The test-suite results show a 1.33% improvement and a 0.78% regression.
The composite benchmark value is computed as a geometric mean.
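For reference, here is a minimal sketch of how a geometric-mean composite can be computed from per-benchmark ratios. The function name `geometric_mean` and the choice of ratio (for example, new time divided by old time) are assumptions for illustration, not the actual test-suite reporting code. It sums logarithms rather than multiplying directly, so a long list of ratios does not overflow or underflow.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: geometric mean of per-benchmark ratios.
// Each ratio is assumed to be strictly positive (e.g. new_time / old_time).
// Returns 1.0 for an empty input, which is an arbitrary "no change" choice.
double geometric_mean(const std::vector<double>& ratios) {
    if (ratios.empty()) {
        return 1.0;
    }
    double log_sum = 0.0;
    for (double r : ratios) {
        log_sum += std::log(r);  // the sum of logs equals the log of the product
    }
    return std::exp(log_sum / static_cast<double>(ratios.size()));
}
```

With this definition, a benchmark that gets 2x slower and another that gets 2x faster cancel out exactly, which is why the geometric mean is commonly preferred over the arithmetic mean for ratios.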
I actually found some regressions after r234846 was merged.
URL: http://reviews.llvm.org/D8994
After that commit was merged, @hfinkel committed r237947.
> On X86 (and similar OOO cores) unrolling is very limited, and even if the runtime unrolling is otherwise profitable, the expense of a division to compute the trip count could greatly outweigh the benefits. On the A2, we unroll a lot, and the benefits of unrolling are more significant (seeing a 5x or 6x speedup is not uncommon), so we're more able to tolerate the expense, on average, of a division to compute the trip count.
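To make the trade-off concrete, here is a hand-written sketch of a loop whose trip count needs a division, before and after runtime unrolling by 4. The function names and the strided-sum loop are hypothetical, and this is only an illustration of the idea, not the code LLVM actually emits. Both functions assume `stride > 0`.

```cpp
#include <cstddef>
#include <cstdint>

// Rolled form: sums data[0], data[stride], data[2*stride], ... below n.
// Precondition (assumed): stride > 0.
std::int64_t sum_strided(const std::int32_t* data, std::size_t n,
                         std::size_t stride) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < n; i += stride) {
        sum += data[i];
    }
    return sum;
}

// Sketch of the same loop after runtime unrolling by 4.
// Precondition (assumed): stride > 0.
std::int64_t sum_strided_unrolled(const std::int32_t* data, std::size_t n,
                                  std::size_t stride) {
    // The trip count is ceil(n / stride). Because stride is only known at
    // run time, this needs a real division: the "high-cost" computation.
    const std::size_t trip_count = n / stride + (n % stride != 0 ? 1 : 0);
    // The unroll factor is a power of two, so this remainder is cheap.
    const std::size_t remainder = trip_count % 4;

    std::int64_t sum = 0;
    std::size_t i = 0;
    // Remainder loop: run the leftover iterations one at a time.
    for (std::size_t k = 0; k < remainder; ++k, i += stride) {
        sum += data[i];
    }
    // Unrolled body: the remaining iteration count is a multiple of 4.
    for (std::size_t k = remainder; k < trip_count; k += 4, i += 4 * stride) {
        sum += data[i];
        sum += data[i + stride];
        sum += data[i + 2 * stride];
        sum += data[i + 3 * stride];
    }
    return sum;
}
```

The division runs once, before the loop. Whether it pays off depends on how expensive the divide is on the target and how much the unrolled body gains, which is the point of the quoted comment.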
I totally agree with this comment. Most AArch64 cores have hardware dividers, including for floating point. So I think we can allow more unrolling opportunities.
http://reviews.llvm.org/D15408