[llvm] [LoopVectorize][AArch64][SVE] Generate wide active lane masks (PR #81140)

Mon Mar 11 12:39:58 PDT 2024

paulwalker-arm wrote:

> > always emit a VF*UF sized active lane mask when it is used for control flow. Then leave the code generator to split it when necessary.
> 
> Because the code generator can't split it without introducing inefficient control flow for wrap-around checks.

Sure, hence my wrap flag comment. To emit one active lane mask per unrolled part, LoopVectorize must either already know there is no wrapping or perform saturating maths. In the former case, that knowledge can be passed along with the intrinsic to help the code generator; the latter case would match what the code generator should do today.
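
To make the wrap-around concern concrete, here is a rough scalar model of splitting one wide mask into two halves. It is only a sketch and not code from this PR: `activeLaneMask`, `uaddSat` and `splitMask` are hypothetical helper names, and the model assumes a 32-bit index and the LangRef semantics of `llvm.get.active.lane.mask` (lane `i` is active iff `base + i < n`, evaluated without overflow).

```cpp
#include <cstdint>
#include <vector>

// Scalar model of llvm.get.active.lane.mask(base, n): lane i is active iff
// (base + i) < n, with the addition done in a wider type so it cannot wrap.
std::vector<bool> activeLaneMask(uint32_t base, uint32_t n, unsigned lanes) {
  std::vector<bool> mask(lanes);
  for (unsigned i = 0; i < lanes; ++i)
    mask[i] = static_cast<uint64_t>(base) + i < n;
  return mask;
}

// Unsigned saturating add, modelling llvm.uadd.sat on i32.
uint32_t uaddSat(uint32_t a, uint32_t b) {
  uint32_t sum = a + b;
  return sum < a ? UINT32_MAX : sum;
}

// Hypothetical split of a (2 * vf)-lane mask into two vf-lane masks.
// The base of the high half is base + vf, which can wrap in 32 bits
// unless the add saturates or the caller knows it cannot overflow.
std::vector<bool> splitMask(uint32_t base, uint32_t n, unsigned vf,
                            bool saturate) {
  std::vector<bool> lo = activeLaneMask(base, n, vf);
  uint32_t step = static_cast<uint32_t>(vf);
  uint32_t hiBase = saturate ? uaddSat(base, step) : base + step;
  std::vector<bool> hi = activeLaneMask(hiBase, n, vf);
  lo.insert(lo.end(), hi.begin(), hi.end());
  return lo;
}
```

With `base = 0xFFFFFFFE`, `n = UINT32_MAX` and `vf = 4`, the saturating split matches the 8-lane mask, while the plain add wraps the high base round to 2 and wrongly activates the whole high half. If a no-wrap guarantee were available, the plain add would be safe and the saturation could be dropped.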

> 
> > Perhaps we'll need to add wrap flag support to get.active.mask to make that job easier but this should not be all that different to what LoopVectorize would have to do anyway.
> 
> What would be the advantage compared to the current approach?

It would remove the need for yet another TTI function, and hence avoid further divergence within LoopVectorize. Perhaps those are not great reasons, but what's the advantage of the current approach if the alternative doesn't hamper code generation?

https://github.com/llvm/llvm-project/pull/81140