[llvm] [LV][VPlan] Add initial support for CSA vectorization (PR #121222)

Michael Maitland via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 10 13:53:42 PST 2025


michaelmaitland wrote:

@ayalz, we are carrying follow-up patches that should improve the performance of the loop by using mask logic instead of reductions inside the loop. The idea looks something like this:

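```
int t = init_val;
<VF x i1> vmask = 0;   // lanes holding a candidate for the last selected value
<VF x ?> va;           // per-lane copy of the most recently selected element
for (int i = 0; i < N; i += VF) {
  vmaski = cond[i:i+VF-1];
  // vmsbf = set-before-first: keep old lanes only before the first new active lane
  vmask = (vmsbf(vmaski) & vmask) | vmaski;
  vai = a[i:i+VF-1];
  va = vmerge vmaski, vai, va;   // take vai where vmaski is set, else keep va
}
if any(vmask) {
  i = last(vmask);     // index of the last active lane
  t = extract(va, i);
}
s = t; // use t
```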

This is not the same as a FindLast inside the loop, because no reduction is performed on each loop iteration; the only reduction-like step (`last` plus `extract`) happens once, after the loop. Since this pattern is not an extension of "FindLast", I'm not sure it is a good idea to develop CSAs as reductions.
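
To make the mask update concrete, here is a minimal Python sketch that emulates the pseudocode above with plain lists and checks it against the scalar conditional assignment. The function names (`csa_scalar`, `csa_masked`, `vmsbf`) are hypothetical helpers for illustration only, not LLVM or VPlan APIs, and the sketch assumes `N` is a multiple of `VF` and that `vmsbf` yields an all-ones mask when no lane is active.

```python
def csa_scalar(a, cond, init_val):
    """Scalar reference: t takes a[i] each time cond[i] holds."""
    t = init_val
    for x, c in zip(a, cond):
        if c:
            t = x
    return t


def vmsbf(mask):
    """Set-before-first: True in every lane before the first active lane.

    Assumption: if no lane is active, every lane of the result is True.
    """
    out = []
    seen = False
    for m in mask:
        if m:
            seen = True
        out.append(not seen)
    return out


def csa_masked(a, cond, init_val, vf):
    """Emulate the masked loop: only lane-wise mask logic inside the loop."""
    assert len(a) == len(cond) and len(a) % vf == 0
    vmask = [False] * vf
    va = [None] * vf  # lanes are only read where vmask is set
    for i in range(0, len(a), vf):
        vmaski = [bool(c) for c in cond[i:i + vf]]
        sbf = vmsbf(vmaski)
        # Keep old candidates only in lanes before the first new active lane.
        vmask = [(s and m) or mi for s, m, mi in zip(sbf, vmask, vmaski)]
        vai = a[i:i + vf]
        # vmerge: take the new element where vmaski is set, else keep va.
        va = [x if mi else old for mi, x, old in zip(vmaski, vai, va)]
    t = init_val
    if any(vmask):
        last = max(k for k, m in enumerate(vmask) if m)  # last active lane
        t = va[last]  # single extract after the loop
    return t
```

Dropping the `vmsbf` term would be wrong in this emulation: with `VF = 4` and conditions `[0,1,0,0]` followed by `[1,0,0,0]`, a plain `vmask | vmaski` would leave lane 1 as the last active lane and return the stale element from the first iteration.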

> I think such patterns are essentially extensions of "FindLast" reduction and should be developed as such, rather than being considered distinct unrelated patterns.

@Mel-Chen, can you chime in here? Can FindLast handle non-monotonic cases? I think we took the approach proposed in this patch because FindLast only works for monotonic cases.
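
As an illustration of the monotonic versus non-monotonic distinction, the sketch below assumes that "monotonic" means the selected value is a monotonically increasing induction variable, so that the last match can be recovered with a max reduction. That reading is an assumption about the question above, not a description of how FindLast is implemented in LLVM, and both function names are hypothetical.

```python
def last_selected_scalar(values, cond, init_val):
    """Scalar reference: the value selected by the last true condition."""
    t = init_val
    for v, c in zip(values, cond):
        if c:
            t = v
    return t


def last_selected_via_max(values, cond, init_val):
    """Max-reduction stand-in: only valid when selected values never decrease."""
    selected = [v for v, c in zip(values, cond) if c]
    return max(selected) if selected else init_val


cond = [0, 1, 1, 1]

# Monotonic case: the selected value is the induction variable itself.
iv = list(range(len(cond)))
monotonic_agrees = (
    last_selected_via_max(iv, cond, -1) == last_selected_scalar(iv, cond, -1)
)

# Non-monotonic case: arbitrary loaded values, as in a general CSA.
a = [5, 3, 9, 1]
non_monotonic_agrees = (
    last_selected_via_max(a, cond, -1) == last_selected_scalar(a, cond, -1)
)
```

In the second case the max picks the largest selected element rather than the most recent one, which is why a max-style reduction alone cannot express a general conditional scalar assignment.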


https://github.com/llvm/llvm-project/pull/121222

