[PATCH] D39415: [ARMISelLowering] Better handling of NEON load/store for sequential memory regions

Fri Nov 3 07:05:21 PDT 2017

eastig added a comment.

In https://reviews.llvm.org/D39415#914995, @rengolin wrote:

> Not very convincing numbers. I guess efficient pipelining can make most of the difference wash away.

Agreed. Other benchmarks may give more interesting data.

> Maybe older ARM cores, like A8, will get better improvement?

We have Cortex-A9 Panda boards. I'll use them.

> What about compile time? And code size?

The LNT server does not provide compile-time reports. I am not sure whether they are simply not displayed or whether no compile-time data is available.
Also, compilation happens on the target; this is not cross-compilation. I don't know how stable compile times are on Juno boards.
I think Evgeny could try cross-compilation runs of the LNT test suite on X86. It is worth running CTMark; Apple says it's quite good at catching compile-time regressions.

Attached is a picture of the code size improvements (sorry for not providing them as plain table data):
F5466918: mem_improvements.png <https://reviews.llvm.org/F5466918>

https://reviews.llvm.org/D39415