[PATCH] D106653: [LoopVectorize][AArch64] Enable ordered reductions by default for SVE

Thu Jul 29 08:19:50 PDT 2021

dmgreen added a comment.

Turning this on sounds good, so long as we do our due-diligence and check it's OK performance wise.  I just did some experiments and it seemed fine-ish, but they were only very small exampled run on a A510 and A710. So long as there was more than a single real vector operation, the benefits started overcoming the overheads.

A couple of high level comments though:

- I'm not a fan of -march=arm8-a+sve working differently to -march=armv8-a for non-sve related code. i.e "hasSVE" shouldn't be the trigger for allowing NEON inorder reductions, if they do not have anything to do with SVE. We should consider just jumping to step 2 and enabling it for all aarch64, so long as the performance results look OK and we are careful about regressions of course.
- It seems that the cost is so high under SVE that they will never be generated in practice. This suggests that enabling them will have little effect for either testing or performance.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106653/new/

https://reviews.llvm.org/D106653