[PATCH] D39415: [ARMISelLowering] Better handling of NEON load/store for sequential memory regions

Fri Nov 3 08:26:50 PDT 2017

rengolin added a comment.

In https://reviews.llvm.org/D39415#915151, @eastig wrote:

> Actually it had but not many.
> There are some other regressions/improvements. I have not listed them because the hottest code is not changed.

Right, and the fact that TSVC appears among both the improvements and the regressions is a hint that there are other factors at play.

> MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt has 1.86% execution time regression and 41.88% code size improvement.
> MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt has 1.48% execution time improvement and 39.61% code size improvement.
> MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt has 1.26% execution time improvement and 41.61% code size improvement.

That's pretty consistent, so again it's probably the same code being affected. But at least you didn't have regressions in non-affected code, which means the early exits are working as expected.

> What about running on Cortex-A53? Its pipeline is in-order.

I don't think out-of-order vs. in-order will make a big difference (maybe just different noise). Because this is ALU vs. load/vector, the pipelining will have a much bigger impact than the dispatcher.

https://reviews.llvm.org/D39415