[PATCH] D15408: [AArch64/LoopUnrollRuntime] Don't avoid high-cost trip count computation on the AArch64
Kristof Beyls via llvm-commits
llvm-commits at lists.llvm.org
Fri Dec 11 09:49:02 PST 2015
Hi Junmo,
I tried out your patch on top of r254864 on a Juno board, running on a
Cortex-A57.
I see the following results:
Performance Regressions - Execution Time Δ

   9.17%  lnt.MultiSource/Benchmarks/Ptrdist/yacr2/yacr2
          <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.170=3>
   8.02%  lnt.SingleSource/Benchmarks/Shootout-C++/ackermann
          <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.264=3>
   4.78%  lnt.MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1
          <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.149=3>
   1.84%  spec.cpu2006.ref.445_gobmk
          <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.176=3>
   1.75%  spec.cpu2006.ref.483_xalancbmk
          <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.94=3>
   1.43%  spec.cpu2006.ref.471_omnetpp
          <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.294=3>
   1.22%  spec.cpu2000.ref.253_perlbmk
          <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.337=3>
   1.10%  lnt.SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm
          <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.135=3>

Performance Improvements - Execution Time Δ

 -23.07%  lnt.MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan
          <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.15=3>
  -9.50%  lnt.SingleSource/Benchmarks/Shootout/sieve
          <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.40=3>
  -7.26%  lnt.SingleSource/Benchmarks/BenchmarkGame/nsieve-bits
          <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.9=3>
  -3.42%  lnt.SingleSource/Benchmarks/BenchmarkGame/recursive
          <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.316=3>
  -1.12%  spec.cpu2006.ref.433_milc
          <http://llvm-test.cambridge.arm.com:8000/db_default/v4/nts/3523/graph?test.235=3>

While there are a few big jumps in the test-suite, I think the
regressions show this is not uniformly an improvement for performance.
Thanks,
Kristof
On 11/12/2015 07:43, Junmo Park via llvm-commits wrote:
> flyingforyou added a comment.
>
> Thanks Zhaoshi.
>
> I've just run a set of benchmarks, including the test-suite, on Juno (Cortex-A57); there were many improvements and some regressions.
> The test-suite results show a 1.33% improvement and a 0.78% regression.
> The geometric mean is used to compute the composite benchmark value.
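For reference, a composite figure of the kind mentioned above can be computed as sketched here. This is only an illustration of a geometric mean over execution-time ratios, not the script that was actually used; the function names `geometricMean` and `deltaToRatio` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Convert a percentage delta (e.g. -23.07) into an execution-time
// ratio new/baseline (e.g. 0.7693). Hypothetical helper.
double deltaToRatio(double percent) { return 1.0 + percent / 100.0; }

// Geometric mean of per-benchmark ratios. A result below 1.0 means the
// patched build is faster overall; above 1.0 means it is slower.
double geometricMean(const std::vector<double> &ratios) {
  assert(!ratios.empty() && "need at least one ratio");
  double logSum = 0.0;
  for (double r : ratios) {
    assert(r > 0.0 && "ratios must be positive");
    logSum += std::log(r); // summing logs avoids overflow of the product
  }
  return std::exp(logSum / static_cast<double>(ratios.size()));
}
```

Because the geometric mean averages ratios, one large win can offset several small losses in the composite, so it helps to look at the per-benchmark deltas as well.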
>
> Actually, I found some regressions after r234846 was merged.
> url: http://reviews.llvm.org/D8994
>
> After that commit was merged, @hfinkel uploaded a new commit, r237947.
>
>> On X86 (and similar OOO cores) unrolling is very limited, and even if the runtime unrolling is otherwise profitable, the expense of a division to compute the trip count could greatly outweigh the benefits. On the A2, we unroll a lot, and the benefits of unrolling are more significant (seeing a 5x or 6x speedup is not uncommon), so we're more able to tolerate the expense, on average, of a division to compute the trip count.
>
> I totally agree with this comment. Most AArch64 cores have a hardware divider, including for floating point, so I think we can afford more unrolling opportunities.
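To make the trade-off concrete, here is a hand-written sketch of what runtime unrolling by 4 looks like for a loop whose stride is only known at run time. It is an illustration under that assumption, not the code LLVM emits; the function names are hypothetical. The point is that the trip count needs one division before the loop starts, and that cost is paid even when the loop runs only a few iterations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Original loop: the stride `step` is a run-time value.
int64_t sumStrided(const int32_t *data, size_t n, size_t step) {
  assert(step > 0);
  int64_t sum = 0;
  for (size_t i = 0; i < n; i += step)
    sum += data[i];
  return sum;
}

// Hand-unrolled analogue of runtime unrolling by a factor of 4.
int64_t sumStridedUnrolled4(const int32_t *data, size_t n, size_t step) {
  assert(step > 0);
  // Trip count = ceil(n / step). This is the division whose cost is
  // being discussed; it executes once, before any loop iteration.
  size_t tripCount = n / step + (n % step != 0 ? 1 : 0);
  // Leftover iterations; cheap because 4 is a power of two.
  size_t extra = tripCount % 4;

  int64_t sum = 0;
  size_t i = 0;
  // Remainder loop: run the leftover iterations first.
  for (size_t k = 0; k < extra; ++k, i += step)
    sum += data[i];
  // Main loop: the remaining count is a multiple of 4.
  for (size_t k = extra; k < tripCount; k += 4, i += 4 * step) {
    sum += data[i];
    sum += data[i + step];
    sum += data[i + 2 * step];
    sum += data[i + 3 * step];
  }
  return sum;
}
```

With a constant stride of 1 the trip count is just `n` and no division is needed, which is why only some loops count as having a high-cost trip count.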
>
>
> http://reviews.llvm.org/D15408