[PATCH] D39415: [ARMISelLowering] Better handling of NEON load/store for sequential memory regions

Fri Nov 3 05:18:37 PDT 2017

rengolin added a comment.

In https://reviews.llvm.org/D39415#914992, @eastig wrote:

> I've got the first results of the benchmark runs: the LNT test suite + a private benchmark 01. I used the latest patch.
> The configuration is a Juno board Cortex-A57/A53, v8-a, AArch32, Thumb2.
> Options: -O3 -mcpu=cortex-a57 -mthumb -fomit-frame-pointer
> The runs passed without errors.

Thanks Evgeny.

> Improvements:
> 
> | MultiSource/Applications/JM/lencod/lencod | 2.57% |
> | Private benchmark 01                      | 9.5%  |
> 
> Regressions:
> 
> | SingleSource/Benchmarks/Misc/salsa20 | 3.41% |

These numbers aren't very convincing. I guess efficient pipelining can wash most of the difference away.

Maybe older ARM cores, like the A8, will see a bigger improvement?

What about compile time? And code size?

https://reviews.llvm.org/D39415