[llvm] [LV][VPlan] Add initial support for CSA vectorization (PR #121222)
Michael Maitland via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 10 13:53:42 PST 2025
michaelmaitland wrote:
@ayalz, we are carrying follow-up patches that should improve the loop's performance by using mask logic instead of reductions inside the loop. The vectorized loop looks roughly like this:
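```
int t = init_val;
<VF x i1> vmask = 0;  // highest set lane = lane of the most recent assignment
<VF x ?> va;          // lane j holds the last a[] value assigned in lane j
for (int i = 0; i < N; i += VF) {
  vmaski = cond[i:i+VF-1];
  vmask = (vmsbf(vmaski) & vmask) | vmaski;  // vmsbf: set-before-first mask
  vai = a[i:i+VF-1];
  va = vmerge vmaski, vai, va;
}
if (any(vmask)) {
  lane = last(vmask);
  t = extract(va, lane);
}
s = t; // use t
```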
This is not the same as a FindLast reduction inside the loop, because no reduction is performed on each loop iteration. Since this pattern is not an extension of FindLast, I'm not sure it is a good idea to develop CSAs (conditional scalar assignments) as reductions.
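As a sanity check of the mask logic, the following Python sketch (not LLVM code) models each vector as a list of VF lanes and compares the result with a scalar loop. It assumes the source loop is the usual CSA form `if (cond[i]) t = a[i];`, that N is a multiple of VF, and that `vmsbf` behaves like RISC-V `vmsbf.m`; the helper names `csa_scalar`, `vmsbf`, and `csa_masked` are hypothetical.

```python
# Hypothetical simulation of the mask-based CSA loop; not LLVM code.

def csa_scalar(cond, a, init_val):
    """Assumed scalar source loop: t = a[i] whenever cond[i] holds."""
    t = init_val
    for c, x in zip(cond, a):
        if c:
            t = x
    return t


def vmsbf(mask):
    """Set-before-first (RISC-V vmsbf.m): lanes before the first set lane
    are True, the rest False; an all-False input gives all True."""
    out, seen = [], False
    for bit in mask:
        seen = seen or bit
        out.append(not seen)
    return out


def csa_masked(cond, a, init_val, vf):
    """Model of the vector loop; each vector is a list of vf lanes."""
    assert len(cond) % vf == 0, "sketch has no tail handling"
    vmask = [False] * vf   # <VF x i1> vmask = 0
    va = [None] * vf       # <VF x ?> va (lanes unused until written)
    for i in range(0, len(cond), vf):
        vmaski = [bool(c) for c in cond[i:i + vf]]
        vai = a[i:i + vf]
        # vmask = (vmsbf(vmaski) & vmask) | vmaski
        vmask = [(k and m) or n for k, m, n in zip(vmsbf(vmaski), vmask, vmaski)]
        # va = vmerge vmaski, vai, va
        va = [x if n else old for n, x, old in zip(vmaski, vai, va)]
    if any(vmask):
        lane = max(j for j, m in enumerate(vmask) if m)  # last(vmask)
        return va[lane]
    return init_val


# The second chunk assigns in lane 0, below the first chunk's lane 3; vmsbf
# clears the stale lane-3 bit so that last(vmask) picks lane 0.
cond = [0, 0, 0, 1, 1, 0, 0, 0]
a = [1, 2, 3, 4, 5, 6, 7, 8]
print(csa_masked(cond, a, -1, 4), csa_scalar(cond, a, -1))  # 5 5
```

The invariant that makes this work: the highest set lane of `vmask` is always a lane assigned in the most recent chunk with any active lane, and `va` still holds that chunk's value in that lane.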
> I think such patterns are essentially extensions of "FindLast" reduction and should be developed as such, rather than being considered distinct unrelated patterns.
@Mel-Chen can you chime in here? Can FindLast handle non-monotonic cases? I think we took the approach proposed in this patch because FindLast only works for monotonic cases.
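To illustrate why monotonicity matters, here is a small Python sketch (hypothetical helper names, not LLVM's actual recurrence code). When the selected value is a strictly increasing induction variable, "last selected" equals "largest selected", so a plain per-lane max reduction followed by a horizontal max recovers it; for arbitrary values such as `a[i]` it does not. The sketch assumes `init_val` acts as a sentinel smaller than every selected value.

```python
# Hypothetical illustration: a max reduction matches "last selected" only
# when the selected values increase monotonically with the iteration.

def last_selected(cond, vals, init_val):
    """Scalar semantics: the value from the last iteration where cond holds."""
    t = init_val
    for c, v in zip(cond, vals):
        if c:
            t = v
    return t


def max_reduction(cond, vals, init_val, vf):
    """Per-lane max of the selected values, then a horizontal max.
    init_val is assumed to be a sentinel below every value in vals."""
    lanes = [init_val] * vf
    for i, (c, v) in enumerate(zip(cond, vals)):
        if c:
            lanes[i % vf] = max(lanes[i % vf], v)
    return max(lanes)


cond = [True, False, True, True]
iv = [0, 1, 2, 3]  # monotonic: last selected == max selected
a = [7, 5, 9, 4]   # non-monotonic: the two differ
print(last_selected(cond, iv, -1), max_reduction(cond, iv, -1, 2))  # 3 3
print(last_selected(cond, a, -1), max_reduction(cond, a, -1, 2))    # 4 9
```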
https://github.com/llvm/llvm-project/pull/121222