[llvm] [LV][POC] Use umin to avoid second-to-last iteration problems with EVL (PR #143434)

Mon Jun 9 16:04:35 PDT 2025

preames wrote:

> > Doing so means we may end up with an extra "umin" in the loop, but simplifies the implementation since the canonical IV does not need to be changed.
> 
> Just want to make sure I understand correctly: when you said an extra umin here, you meant an extra umin instruction in the final machine code, right? Because at the LLVM IR level (i.e. in the vectorizer) you're _replacing_ get.vector.length with umin.

Yes. I'd somewhat meant in both places, but the important one is the final assembly. The get.vector.length is expected to become a vsetvli, but the umin doesn't replace the vsetvli; we need both the umin and then the vsetvli.

> If that's the case, I guess my (daring) question is: can we get rid of that umin instruction (in the codegen pipeline, ofc)? What you're proposing is:
> 
> ```
> minu a1, a0, VLMAX
> vsetvli a2, a1, ...
> ```
> 
> where a0 is the AVL. This code, as you also mentioned, always yields VLMAX in a2 except the last iteration, forcing VL to be the values we favor regardless of the hardware implementation.
> 
> But what if we optimize it to
> 
> ```
> vsetvli a2, a0, ...
> ```
> 
> Sure, this alters the behavior because now the a2 values in the last two iterations might be different (compared with the minu + vsetvli). But will that actually be unsafe? If it's dictating a memory operation, the total number of elements, and hence the address range, should be the same, so I don't think there will be a page fault.

If we do this at the IR level, I believe what you're describing *is* the existing EVL implementation approach.  If we don't change the IR, and do this as a late rewrite, then doing so in a sound manner is tricky.  
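
To make the difference concrete, here is a rough Python model of the two sequences. It is only a sketch: `vsetvli` and `vl_sequence` are hypothetical helper names, and the model assumes hardware that picks ceil(AVL/2) whenever VLMAX < AVL < 2*VLMAX, which is one of the choices the RVV spec permits (it only requires ceil(AVL/2) <= vl <= VLMAX in that range). Real hardware may simply return VLMAX there.

```python
def vsetvli(avl, vlmax, split_evenly=True):
    """Model the VL that vsetvli may return for a given AVL.

    split_evenly is an assumption about the hardware: when True, the
    model returns ceil(avl / 2) for VLMAX < avl < 2 * VLMAX, which the
    RVV spec allows but does not require.
    """
    if avl <= vlmax:
        return avl
    if avl < 2 * vlmax and split_evenly:
        return (avl + 1) // 2  # ceil(avl / 2)
    return vlmax


def vl_sequence(trip_count, vlmax, clamp, split_evenly=True):
    """Return the VL used on each iteration of a strip-mined loop.

    clamp=True models `minu a1, a0, VLMAX` followed by vsetvli;
    clamp=False models passing the remaining count straight to vsetvli.
    """
    remaining = trip_count
    vls = []
    while remaining > 0:
        avl = min(remaining, vlmax) if clamp else remaining
        vl = vsetvli(avl, vlmax, split_evenly)
        vls.append(vl)
        remaining -= vl
    return vls


if __name__ == "__main__":
    # 21 elements, VLMAX = 8
    print(vl_sequence(21, 8, clamp=False))  # [8, 7, 6]
    print(vl_sequence(21, 8, clamp=True))   # [8, 8, 5]
```

Under that assumption, both variants process the same total number of elements, but only the clamped one keeps VL equal to VLMAX on every iteration except the last.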

> > This requires materializing the VF*UF term as a loop invariant expression. This will involve a read of vlenb, which may be slow on certain hardware.
> 
> I think I'm less concerned about this as we're already reading vlenb in our current approach

I agree, just noting the potential issue.  

https://github.com/llvm/llvm-project/pull/143434