[PATCH] D130618: [AArch64][LoopVectorize] Enable tail-folding by default for SVE

Thu Aug 4 07:30:41 PDT 2022

david-arm added a comment.

Hi @dmgreen, we went through exactly the same thought process as you. We have already explored unrolling tail-folded loops with reductions, and even with the best code quality it makes no difference to performance on these cores. There appears to be a fundamental bottleneck that unrolling cannot overcome. No amount of unrolling, whether by LLVM or by hand, helps for reductions or first-order recurrences. I have also tried unrolling combined with manually rescheduling the instructions in various ways, but to no avail. We were also very surprised by this, and wish it were different!


https://reviews.llvm.org/D130618