[PATCH] D130618: [AArch64][LoopVectorize] Enable tail-folding by default for SVE

Sun Aug 7 23:34:41 PDT 2022

dmgreen added a comment.

In D130618#3699606 <https://reviews.llvm.org/D130618#3699606>, @david-arm wrote:

> Hi @dmgreen, we had exactly the same thought process as you. We have already explored unrolling tail-folded loops with reductions and even with the best code quality it makes zero difference to performance on these cores. Sadly it doesn't make the loops faster in the slightest - there appears to be a fundamental bottleneck that cannot be surpassed. No amount of unrolling (using LLVM or by hand) helps in the case of reductions or first-order recurrences. I have also tried unrolling plus manually rescheduling instructions in different ways, but to no avail. We were also very surprised by this, and wish it were different!

It sounds like you are hitting another bottleneck, perhaps the amount of predication resources.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130618/new/

https://reviews.llvm.org/D130618