[llvm] [LV][POC] Use umin to avoid second-to-last iteration problems with EVL (PR #143434)

Mon Jun 9 15:40:32 PDT 2025

mshockwave wrote:

> Doing so means we may end up with an extra "umin" in the loop, but simplifies the implementation since the canonical IV does not need to be changed.

Just want to make sure I understand correctly: when you said an extra umin here, you meant an extra umin instruction in the final machine code, right? Because at the LLVM IR level (i.e. in the vectorizer) you're _replacing_ get.vector.length with umin.

If that's the case, I guess my (daring) question is: can we get rid of that umin instruction?
What you're proposing is:
```
minu a1, a0, VLMAX
vsetvli a2, a1, ...
```
where a0 is the AVL. This code, as you also mentioned, always yields VLMAX in a2 except in the last iteration, forcing VL to be the value we favor regardless of the hardware implementation.

But what if we optimize it to
```
vsetvli a2, a0, ...
```
Sure, this alters the behavior, because the a2 values in the last two iterations might now differ (compared with minu + vsetvli). But would that actually be unsafe? If a2 is dictating a memory operation, the total number of elements, and hence the address range, should be the same, so I don't think there will be a page fault.
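To make the claim above concrete, here is a small Python sketch (not LLVM or RISC-V code) that models the strip-mined loop under both VL choices. The function names are hypothetical. `vl_even_split` is just one behavior the RVV spec permits for a bare vsetvli when VLMAX < AVL < 2*VLMAX, where it only requires ceil(AVL/2) <= vl <= VLMAX; real hardware may pick differently. The model also assumes the loop advances by the returned VL on each iteration.

```python
def vl_umin(avl, vlmax):
    """VL when AVL is clamped first (minu + vsetvli): min(AVL, VLMAX)."""
    return min(avl, vlmax)


def vl_even_split(avl, vlmax):
    """Assumed hardware policy for a bare vsetvli: split the tail evenly.

    The RVV spec allows any vl with ceil(AVL/2) <= vl <= VLMAX when
    VLMAX < AVL < 2*VLMAX; this picks the lower bound for illustration.
    """
    if avl <= vlmax:
        return avl
    if avl < 2 * vlmax:
        return (avl + 1) // 2  # ceil(AVL / 2)
    return vlmax


def strip_mine(n, vlmax, pick_vl):
    """Return the (start, vl) chunks a strip-mined loop would process."""
    chunks = []
    start = 0
    while start < n:
        vl = pick_vl(n - start, vlmax)  # remaining elements is the AVL
        chunks.append((start, vl))
        start += vl                     # advance by the VL actually granted
    return chunks


def covered(chunks):
    """Flatten chunks into the list of element indices touched, in order."""
    return [i for start, vl in chunks for i in range(start, start + vl)]


if __name__ == "__main__":
    for name, policy in (("umin", vl_umin), ("even-split", vl_even_split)):
        print(name, strip_mine(10, 4, policy))
```

With 10 elements and VLMAX = 4, the two policies split the last two iterations differently (4 + 2 versus 3 + 3), but both touch exactly elements 0 through 9, which is the address-range argument made above. This says nothing about whether the rest of the vectorized loop (e.g. the canonical IV update) tolerates a VL that is not VF*UF in the second-to-last iteration.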

> This requires materializing the VF*UF term as a loop invariant expression. This will involve a read of vlenb, which may be slow on certain hardware.

I think I'm less concerned about this, as we're already reading vlenb in our current approach.

https://github.com/llvm/llvm-project/pull/143434