[PATCH] D39415: [ARMISelLowering] Better handling of NEON load/store for sequential memory regions

Fri Nov 3 07:34:26 PDT 2017

rengolin added a comment.

In https://reviews.llvm.org/D39415#915070, @eastig wrote:

> The LNT server does not provide compile time reports. I am not sure whether they are not displayed or whether no compile time data is available.
> Also, compilation happens on the target; this is not cross-compilation. I don't know how stable compile time is on Juno boards.

LNT does provide compile time data, but since you're cross-compiling, it's probably being lost somewhere in the process.

> Attached the picture of code size improvements (sorry for not providing them as plain table data):

Interesting. I'm guessing all TSVC reductions are due to the same function.

It's also interesting that TSVC was the code that changed the most, and had no visible performance benefits.

This may hint that the variation in other benchmarks, including the matrix multiply, could have been a side effect rather than a result of using post-increment. Neither lencod nor salsa shows up in the code size changes.

It's quite likely that your A9 won't show much better results (or will just show different random variation), given that it's out-of-order (OOO) and pipelined between memory, ALU and vector operations.

https://reviews.llvm.org/D39415