[llvm] [LoopVectorize] Add cost of generating tail-folding mask to the loop (PR #130565)
David Sherwood via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 10 02:28:21 PDT 2025
david-arm wrote:
In terms of performance, I ran SPEC2017 on neoverse-v1 and observed a minor 0.8% regression in x264. This is because we now choose a larger VF in a hot loop, which makes sense in general, but that loop has a low trip count, so the larger VF ends up being less efficient there. I can personally live with this, since I believe this is still the right thing to do for the majority of loops.
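To illustrate why a larger VF can lose on a low trip count, here is a rough sketch of a simplified cost model for a tail-folded loop. It is not the vectoriser's actual code: the function names and the per-iteration costs used below are hypothetical, and `Width` stands for the number of lanes processed per vector iteration (with vscale already folded in).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of vector iterations a tail-folded loop runs.
// With tail folding the final, partially filled iteration is still executed
// under a mask, so the count is rounded up.
uint64_t vectorIterations(uint64_t TripCount, uint64_t Width) {
  assert(Width != 0 && "vector width must be non-zero");
  return (TripCount + Width - 1) / Width;
}

// Hypothetical helper: estimated cost of the whole loop, assuming every
// vector iteration costs the same regardless of how many lanes are active.
uint64_t estimatedLoopCost(uint64_t TripCount, uint64_t Width,
                           uint64_t CostPerIteration) {
  return vectorIterations(TripCount, Width) * CostPerIteration;
}
```

With made-up costs of 16 per iteration at 8 lanes and 32 per iteration at 16 lanes, the two widths come out equal for a trip count of 1024 (2048 each), but for a trip count of 8 the narrower width costs 16 while the wider one costs 32, because half of its lanes are masked off.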
In the case of x264, the cost for VF=vscale x 8 now ends up the same as for VF=vscale x 16, and the loop vectoriser favours larger VFs in a tie. We could add a flag to the vectoriser to prefer lower VFs if necessary.
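A minimal sketch of the tie-break described above, and of what a "prefer lower VFs" option might look like. This is an illustration only, not the code in LoopVectorize: `Candidate`, `cheaperPerLane`, `pickBest` and the `PreferLowerVF` parameter are all hypothetical names, and the sketch assumes candidates are compared on per-lane cost.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical candidate: lanes per vector iteration (ignoring vscale) and
// the estimated cost of one vector iteration.
struct Candidate {
  unsigned Width;
  uint64_t Cost;
};

// True if A is strictly cheaper per lane than B. Cross-multiplying compares
// A.Cost / A.Width with B.Cost / B.Width without integer division.
static bool cheaperPerLane(const Candidate &A, const Candidate &B) {
  return A.Cost * B.Width < B.Cost * A.Width;
}

// Picks the candidate with the lowest per-lane cost. On an exact tie the
// wider candidate wins, unless PreferLowerVF is set, in which case the
// narrower one wins.
Candidate pickBest(const std::vector<Candidate> &Cands, bool PreferLowerVF) {
  assert(!Cands.empty() && "need at least one candidate");
  Candidate Best = Cands.front();
  for (const Candidate &C : Cands) {
    if (cheaperPerLane(C, Best)) {
      Best = C;
      continue;
    }
    if (cheaperPerLane(Best, C))
      continue;
    // Per-lane costs are equal: apply the tie-break policy.
    bool TakeC = PreferLowerVF ? C.Width < Best.Width : C.Width > Best.Width;
    if (TakeC)
      Best = C;
  }
  return Best;
}
```

For example, with hypothetical candidates `{8, 16}` and `{16, 32}` the per-lane costs are equal, so `pickBest` returns the 16-lane candidate by default and the 8-lane candidate when `PreferLowerVF` is true.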
https://github.com/llvm/llvm-project/pull/130565