[PATCH] D48074: [ARM] Enable useAA() for the in-order Cortex-R52

Mon Jun 18 03:11:23 PDT 2018

dmgreen added a comment.

Yes, I can see that. I would have liked to turn this on for more in-order cores, but without enough scheduling information to at least say that a load takes multiple cycles, I didn't feel I had a great justification. For the record, these are the changes I saw on an A53 with useAA returning true (units are time, so lower is better; only changes of more than 2% are listed):

SingleSource/Benchmarks/BenchmarkGame/n-body  	-14.38%
SingleSource/Benchmarks/Shootout/Shootout-lists  	-6.40%
SingleSource/Benchmarks/Misc-C++/Large/ray  	-6.20%
MultiSource/Applications/ALAC/encode/alacconvert-encode  	-5.44%
MultiSource/Benchmarks/McCat/17-bintr/bintr  	-3.27%
SingleSource/Benchmarks/CoyoteBench/huffbench  	-3.15%
MultiSource/Benchmarks/SciMark2-C/scimark2  	-2.97%
MultiSource/Benchmarks/Bullet/bullet  	-2.50%
MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt  	-2.33%
SingleSource/Benchmarks/Misc/richards_benchmark  	-2.20%
MultiSource/Benchmarks/Ptrdist/yacr2/yacr2  	+4.60%
MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1  	+9.66%

They don't look too bad, but there are some regressions. enc-pc1 is genuinely worse; yacr2 might be noise. And without instruction scheduling, they may just be getting lucky. The compile-time increase was roughly 0.25% on CTMark (it may not be statistically significant, but there were enough alternating runs to make me think it's probably close).

I tried it on the A72 too, on both T32 and A64, with more varied results, both showing several large increases in places (including a memcpy benchmark). This option, as far as I can tell, should give more freedom to the DAG, but that may not be used in the best way all the time.
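
To illustrate what "more freedom to the DAG" might mean, here is a minimal sketch. The function update_pair is hypothetical and not taken from any of the benchmarks above. It assumes that useAA() governs whether the code generator consults alias analysis when ordering memory operations. The __restrict qualifier (a GCC/Clang extension) stands in for whatever aliasing facts the compiler can otherwise prove.

```cpp
// Hypothetical example: not from the test-suite results quoted above.
// __restrict promises that 'out' and 'in' never overlap.
void update_pair(float *__restrict out, const float *__restrict in) {
  // Store to out[0].
  out[0] = in[0] * 2.0f;
  // The load of in[1] follows the store to out[0] in program order.
  // If the backend ignores aliasing information, it has to keep that
  // order, because the store might have written in[1]. If it does
  // consult alias analysis, it may start this load earlier.
  out[1] = in[1] * 2.0f;
}
```

Whether moving the load earlier is actually a win depends on the scheduler knowing that loads take multiple cycles, which is the concern raised above.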

https://reviews.llvm.org/D48074