[llvm] [LoopVectorize] Perform loop versioning for some early exit loops (PR #120603)

Fri Feb 21 00:54:16 PST 2025

david-arm wrote:

 > For RVV first faulting loads, a VP variant intrinsic could support this, for example `declare { <8 x i8>, i32 } @llvm.vp.load.ff.v8i8.p0(ptr %ptr, <8 x i1> %mask, i32 %evl)`. It differs by returning a structure that includes both the loaded data and latest EVL.

This patch explicitly avoids first faulting loads by relying on a combination of vector load alignment and knowledge of the hardware's minimum page size. I believe @huntergr-arm recently created an RFC about an alternative approach that uses first faulting loads to vectorise more complex loops involving multiple exits and/or stores in the loop.
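
To illustrate the alignment argument, here is a minimal standalone sketch (not code from the patch). It assumes the vector load size in bytes is a power of two that divides the minimum page size, and that the load address is aligned to the load size. The helper names `withinOnePage` and `alignedLoadsNeverCrossPage`, and the 16-byte / 4096-byte values in the usage note, are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if the byte range [addr, addr + size) lies
// entirely within a single page of pageSize bytes.
bool withinOnePage(std::uint64_t addr, std::uint64_t size,
                   std::uint64_t pageSize) {
  return (addr / pageSize) == ((addr + size - 1) / pageSize);
}

// Hypothetical check: walks every vecBytes-aligned address in a window of
// four pages and verifies that a load of vecBytes never straddles a page
// boundary. Assumes vecBytes is a power of two that divides pageSize.
bool alignedLoadsNeverCrossPage(std::uint64_t vecBytes,
                                std::uint64_t pageSize) {
  for (std::uint64_t addr = 0; addr < 4 * pageSize; addr += vecBytes)
    if (!withinOnePage(addr, vecBytes, pageSize))
      return false;
  return true;
}
```

For example, `alignedLoadsNeverCrossPage(16, 4096)` holds, whereas an unaligned 16-byte load at address 4090 does cross into the next page, so `withinOnePage(4090, 16, 4096)` is false. This only shows that such a load cannot fault by touching an unmapped neighbouring page; it says nothing about whether the load is well defined at the IR level.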

If I understand correctly, the point raised by @nikic is that the semantics of the IR load operation mean that if any portion of the vector load is out-of-bounds (even though we know it won't fault) then the entire load is UB. This essentially means the entire vectorised loop becomes UB and so could in theory be optimised away, even if in practice it works for existing tests and benchmarks. Using a new out-of-bounds load intrinsic with different semantics avoids this problem, and still allows us to achieve the best performance.

https://github.com/llvm/llvm-project/pull/120603