[PATCH] D130618: [AArch64][LoopVectorize] Enable tail-folding by default for SVE
David Sherwood via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 4 07:30:41 PDT 2022
david-arm added a comment.
Hi @dmgreen, we had exactly the same thought process as you. We have already explored unrolling tail-folded loops with reductions, and even with the best code quality it makes zero difference to performance on these cores. There appears to be a fundamental bottleneck that cannot be surpassed: no amount of unrolling (by LLVM or by hand) helps for reductions or first-order recurrences. I have also tried unrolling plus manually rescheduling the instructions in different ways, but to no avail. We were also very surprised by this, and wish it were different!
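For readers less familiar with the terms, here is a minimal scalar sketch of the two loop shapes being compared: a tail-folded reduction, and the same loop unrolled by two with independent accumulators. It is only an illustration and not the code the vectorizer emits. The lane count `kLanes` and both function names are hypothetical, and the per-lane `if` stands in for the SVE predicate.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed lane count; a real SVE vector length is only known at run time.
constexpr std::size_t kLanes = 4;

// Tail-folded reduction: every iteration handles kLanes elements under a
// per-lane predicate (i + lane < n), so no scalar remainder loop is needed.
std::int64_t sum_tail_folded(const std::vector<std::int32_t> &a) {
  const std::size_t n = a.size();
  std::int64_t acc[kLanes] = {};
  for (std::size_t i = 0; i < n; i += kLanes)
    for (std::size_t lane = 0; lane < kLanes; ++lane)
      if (i + lane < n) // emulated predicate
        acc[lane] += a[i + lane];
  std::int64_t total = 0;
  for (std::int64_t v : acc)
    total += v;
  return total;
}

// The same loop unrolled by two. Each half has its own accumulator vector
// and its own predicate, so the two add chains are independent of each other.
std::int64_t sum_tail_folded_unrolled2(const std::vector<std::int32_t> &a) {
  const std::size_t n = a.size();
  std::int64_t acc0[kLanes] = {};
  std::int64_t acc1[kLanes] = {};
  for (std::size_t i = 0; i < n; i += 2 * kLanes) {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      if (i + lane < n) // predicate for the first half
        acc0[lane] += a[i + lane];
      if (i + kLanes + lane < n) // predicate for the second half
        acc1[lane] += a[i + kLanes + lane];
    }
  }
  std::int64_t total = 0;
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    total += acc0[lane] + acc1[lane];
  return total;
}
```

Both functions return the same sum. The second form is the kind of unrolling that, per the measurements described above, gave no speedup on these cores.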
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D130618/new/
https://reviews.llvm.org/D130618