[PATCH] D48074: [ARM] Enable useAA() for the in-order Cortex-R52
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 18 03:11:23 PDT 2018
dmgreen added a comment.
Yes, I can see that. I would have liked to turn this on for more in-order cores, but without enough scheduling information to at least say that a load takes multiple cycles, I didn't feel I had a great justification. For the record, these are the changes I saw on an A53 with useAA returning true (units are time, so lower is better; only changes of more than 2% are listed):
SingleSource/Benchmarks/BenchmarkGame/n-body -14.38%
SingleSource/Benchmarks/Shootout/Shootout-lists -6.40%
SingleSource/Benchmarks/Misc-C++/Large/ray -6.20%
MultiSource/Applications/ALAC/encode/alacconvert-encode -5.44%
MultiSource/Benchmarks/McCat/17-bintr/bintr -3.27%
SingleSource/Benchmarks/CoyoteBench/huffbench -3.15%
MultiSource/Benchmarks/SciMark2-C/scimark2 -2.97%
MultiSource/Benchmarks/Bullet/bullet -2.50%
MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt -2.33%
SingleSource/Benchmarks/Misc/richards_benchmark -2.20%
MultiSource/Benchmarks/Ptrdist/yacr2/yacr2 +4.60%
MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1 +9.66%
They don't look too bad, but there are some performance decreases. enc-pc1 is genuinely worse; yacr2 might be noise. And without instruction scheduling, the improved benchmarks may just be getting lucky. The compile-time increase was roughly 0.25% on CT-mark (that may not be statistically significant, but I alternated enough runs to think it's probably close).
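As a rough way to summarise the A53 figures listed above, here is a minimal Python sketch that tallies improvements and regressions and takes a geometric mean of the run-time ratios. The short benchmark labels, the variable names and the helper `geomean_ratio` are only illustrative, and the mean covers just the benchmarks that moved by more than 2%, so it likely overstates the effect on the whole test suite.

```python
from math import exp, log

# Percentage change in run time on the A53 with useAA returning true,
# copied from the list above (negative means faster).
changes = {
    "n-body": -14.38,
    "Shootout-lists": -6.40,
    "ray": -6.20,
    "alacconvert-encode": -5.44,
    "bintr": -3.27,
    "huffbench": -3.15,
    "scimark2": -2.97,
    "bullet": -2.50,
    "LoopRerolling-flt": -2.33,
    "richards_benchmark": -2.20,
    "yacr2": 4.60,
    "enc-pc1": 9.66,
}

# Split into benchmarks that got faster and benchmarks that got slower.
improved = {name: pct for name, pct in changes.items() if pct < 0}
regressed = {name: pct for name, pct in changes.items() if pct > 0}


def geomean_ratio(percentages):
    """Geometric mean of new_time / old_time for a list of % changes."""
    ratios = [1 + pct / 100 for pct in percentages]
    return exp(sum(log(r) for r in ratios) / len(ratios))


overall = geomean_ratio(changes.values())

print(f"improved: {len(improved)}, regressed: {len(regressed)}")
print(f"geomean time ratio over listed benchmarks: {overall:.4f}")
```

A ratio below 1.0 means the listed benchmarks ran faster on balance, but it says nothing about the benchmarks that stayed within the 2% threshold.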
I tried it on the A72 too, on both T32 and A64, with more varied results: both showed several large increases in places (including a memcpy benchmark). This option, as far as I can tell, should give more freedom to the DAG, but that freedom may not always be used in the best way.
https://reviews.llvm.org/D48074