[llvm] [LoopVectorize] Perform loop versioning for some early exit loops (PR #120603)

Fri Feb 21 00:42:52 PST 2025

arcbbb wrote:

> Do I understand correctly that this is generating out-of-bounds loads and relying on page granularity to guarantee they don't trap? If so, we cannot perform this transform with normal loads, because it is UB at the IR level -- the behavior of the underlying hardware is irrelevant.
> 
> You're going to need a new load intrinsic to support this. I have this RFC draft on the topic lying around: https://hackmd.io/@nikic/S1O4QWYZkx I haven't submitted it, because I'm not particularly happy with the `%defined_size` parameter, which is needed to specify the operational semantics of the intrinsic, but not relevant for lowering.

For RVV fault-only-first loads, a VP variant of the intrinsic could support this, for example `declare { <8 x i8>, i32 } @llvm.vp.load.ff.v8i8.p0(ptr %ptr, <8 x i1> %mask, i32 %evl)`.
It differs from `llvm.vp.load` by returning a struct that holds both the loaded data and the updated EVL, i.e. the number of leading lanes that were actually loaded.
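To make the proposed semantics concrete, here is a scalar reference model, sketched in C++ under explicit assumptions. The function `vpLoadFF`, the struct `LoadFFResult`, and the `readable` callback (standing in for "this address can be accessed without trapping") are hypothetical illustrations, not an LLVM API. The fault rule follows RVV fault-only-first loads: masked-off lanes are never accessed, a fault on lane 0 traps as an ordinary load would, and a fault on any later active lane is suppressed and truncates the EVL to that lane's index.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical result of @llvm.vp.load.ff: loaded lanes plus the updated EVL.
struct LoadFFResult {
  std::vector<int8_t> data; // lanes >= evl and masked-off lanes are unspecified (zeroed here)
  uint32_t evl;             // number of leading lanes that are valid
};

// Scalar model of a fault-only-first load (assumed RVV-style semantics).
// `readable(i)` models whether base[i] can be read without trapping.
// Returns std::nullopt to model a real trap (an active lane 0 that faults).
std::optional<LoadFFResult>
vpLoadFF(const int8_t *base, const std::vector<bool> &mask, uint32_t evl,
         const std::function<bool(uint32_t)> &readable) {
  assert(mask.size() >= evl && "mask must cover all evl lanes");
  LoadFFResult r{std::vector<int8_t>(evl, 0), evl};
  for (uint32_t i = 0; i < evl; ++i) {
    if (!mask[i])
      continue; // masked-off lanes are not accessed and cannot fault
    if (!readable(i)) {
      if (i == 0)
        return std::nullopt; // fault on lane 0: trap like a normal load
      r.evl = i;             // later fault: suppress it, truncate the EVL
      break;
    }
    r.data[i] = base[i];
  }
  return r;
}
```

In an early-exit loop, the vectorized body would then examine only the first `evl` lanes for the exit condition and advance by `evl`, so no lane past a potential fault is ever treated as defined. Real hardware may also stop earlier than this model does; the RVV spec permits processing fewer elements even without a fault, so a consumer must rely only on the returned EVL.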

https://github.com/llvm/llvm-project/pull/120603