[PATCH] D45821: [AArch64] improve code generation of vectors smaller than 64 bit

Fri May 11 12:01:27 PDT 2018

sebpop added a comment.

In https://reviews.llvm.org/D45821#1093386, @sebpop wrote:

> I am rerunning the benchmarks with the patch applied on top of https://reviews.llvm.org/D46655, which fixes one of the problems exposed by this patch.

I ran SPEC 2000 with and without this patch, on top of Evandro's patch, on A72 firefly and on exynos-m3. There were no slowdowns or speedups larger than the noise level of about 1% over 6 runs.

There is still a performance problem of about 5% in a proprietary benchmark on both A72 and exynos-m3:
more loops are vectorized with this patch, and the code generated for one of the vectorized loops appears to be slower than the scalar version.
We identified a few byte loads and stores that could be merged, and Evandro is working on fixing these patterns.
We are still investigating other issues that could improve the performance of the vectorized code.
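
To illustrate what "byte loads and stores that could be merged" means in general, here is a minimal hypothetical sketch in C. It is not taken from the proprietary benchmark, and the function names copy4_bytewise and copy4_merged are made up for illustration. The first function accesses four adjacent bytes one at a time; the second does the same work with a single 32-bit load and a single 32-bit store, which is the kind of form a merged sequence could take.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical example: four adjacent byte loads and four adjacent
   byte stores, written out one element at a time. */
void copy4_bytewise(uint8_t *dst, const uint8_t *src) {
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[2];
    dst[3] = src[3];
}

/* Same effect for non-overlapping buffers, expressed as one 32-bit
   load and one 32-bit store. memcpy is used so the code stays valid
   for unaligned pointers. */
void copy4_merged(uint8_t *dst, const uint8_t *src) {
    uint32_t word;
    memcpy(&word, src, sizeof word);  /* one 4-byte load  */
    memcpy(dst, &word, sizeof word);  /* one 4-byte store */
}
```

Both functions produce the same bytes in dst when dst and src do not overlap; the difference is only in how many memory operations are needed.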

https://reviews.llvm.org/D45821