[llvm] [LV][VPlan] Add initial support for CSA vectorization (PR #121222)

Tue Jan 21 05:52:18 PST 2025

ayalz wrote:

The motivating example of
```
int t = init_val;
for (int i = 0; i < N; i++) {
  if (cond[i])
    t = a[i];
}
s = t; // use t
```
is proposed to be vectorized as
```
int t = init_val;
<VF x i1> vmask = 0;
<VF x ?> va;
for (int i = 0; i < N; i+=VF) {
  vmaski = cond[i:i+VF-1];
  <VF x ?> vai = a[i:i+VF-1];
  if (any(vmaski)) {
    vmask = vmaski;
    va = vai;
  }
}
if (any(vmask)) {
  i = last(vmask);
  t = extract(va, i);
}
s = t; // use t
```
arguing that it's better to pass vmask as a live-out and sink the search for its last turned-on lane to after the loop, instead of searching for it inside the loop and passing i as a live-out.
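
To make the scheme above concrete, here is a minimal scalar emulation in C. It is only a sketch under stated assumptions: `VF`, `csa_blocked`, and the fixed `int` element type are hypothetical names and choices, the trip count is assumed to be a multiple of `VF` (no remainder loop), and plain loops over lanes stand in for vector operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { VF = 4 }; /* hypothetical vectorization factor */

/* Emulates the blocked scheme: the loop only carries vmask and va;
   the search for the last set lane happens once, after the loop. */
static int csa_blocked(const bool *cond, const int *a, ptrdiff_t n,
                       int init_val) {
  assert(n >= 0 && n % VF == 0); /* sketch: no scalar remainder loop */
  bool vmask[VF] = {false};
  int va[VF] = {0};

  for (ptrdiff_t i = 0; i < n; i += VF) {
    /* any(vmaski) */
    bool any = false;
    for (int l = 0; l < VF; l++)
      any |= cond[i + l];
    if (any) {
      /* vmask = vmaski; va = vai (all lanes are loaded here; a real
         vectorizer might use a masked load instead) */
      for (int l = 0; l < VF; l++) {
        vmask[l] = cond[i + l];
        va[l] = a[i + l];
      }
    }
  }

  /* After the loop: last(vmask) followed by extract(va, i). */
  int t = init_val;
  for (int l = VF - 1; l >= 0; l--) {
    if (vmask[l]) {
      t = va[l];
      break;
    }
  }
  return t;
}
```

The loop body contains no lane search at all; only the final block inspects individual lanes.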

Continuing with this argument, wouldn't it be better to also sink the load of a[i] to after the loop, instead of loading the vector va under a mask inside the loop?

In general, there may be some function t=f(i) of i that produces the value t being conditionally overwritten, e.g., think of f(i) as a polynomial of i. Such a function f should arguably be sunk and computed once after the loop, once the (at most one) relevant iteration i has been determined. The reduction becomes a "FindLast" reduction once this function is sunk. Does that sound reasonable?
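
A minimal scalar sketch of this idea in C follows. The names `f`, `find_last`, and `csa_with_sunk_f` are hypothetical, the polynomial is an arbitrary stand-in, and `-1` is used as an assumed "no iteration matched" sentinel. Sinking is only valid if f has no side effects and anything it reads (such as a[]) is not modified inside the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the sunk function f(i): a small polynomial. */
static int f(ptrdiff_t i) { return (int)(3 * i * i + 2 * i + 1); }

/* "FindLast" reduction: the loop only records the last index whose
   condition held; -1 means no iteration matched. */
static ptrdiff_t find_last(const bool *cond, ptrdiff_t n) {
  assert(n >= 0);
  ptrdiff_t last = -1;
  for (ptrdiff_t i = 0; i < n; i++)
    if (cond[i])
      last = i;
  return last;
}

/* f is evaluated at most once, after the loop. */
static int csa_with_sunk_f(const bool *cond, ptrdiff_t n, int init_val) {
  ptrdiff_t last = find_last(cond, n);
  return last >= 0 ? f(last) : init_val;
}
```

With `f(i)` replaced by `a[i]`, the same shape covers the motivating example: the loop reduces over indices only, and the single load happens after the loop.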

This is reminiscent of sinking the "x, y" of AnyOf to produce a boolean reduction followed by an "anyof ? x : y" function, rather than carrying x and y inside the loop, see @fhahn's commit bccb7ed8ac289.
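
For comparison, a scalar C sketch of that AnyOf sinking, under the assumption that x and y are loop-invariant. The function names are hypothetical and this is not taken from the referenced commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Carried form: x and y flow through the loop via the select. */
static int anyof_carried(const bool *cond, ptrdiff_t n, int x, int y) {
  assert(n >= 0);
  int r = y;
  for (ptrdiff_t i = 0; i < n; i++)
    r = cond[i] ? x : r;
  return r;
}

/* Sunk form: the loop is a pure boolean reduction; the select between
   x and y happens once, after the loop. */
static int anyof_sunk(const bool *cond, ptrdiff_t n, int x, int y) {
  assert(n >= 0);
  bool any = false;
  for (ptrdiff_t i = 0; i < n; i++)
    any |= cond[i];
  return any ? x : y;
}
```

Both functions return the same value for any input; the sunk form simply keeps x and y out of the loop.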

Finally, note that LV already supports some sort of "CSA", introduced by @annamthomas some years ago in https://reviews.llvm.org/D52656; see test variant_val_store_to_inv_address_conditional in llvm/test/Transforms/LoopVectorize/X86/invariant-store-vectorization.ll. A conditional store to the same address could be converted into a conditional scalar assignment coupled with sinking the conditional store to after the loop.
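
A scalar C sketch of that conversion follows. It is not the IR of the referenced test; the function names and the `stored` flag are hypothetical, and it assumes that p does not alias cond or a and that *p is not otherwise read or written inside the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Original form: conditional store to a loop-invariant address. */
static void store_in_loop(const bool *cond, const int *a, ptrdiff_t n,
                          int *p) {
  assert(n >= 0);
  for (ptrdiff_t i = 0; i < n; i++)
    if (cond[i])
      *p = a[i];
}

/* Converted form: a conditional scalar assignment to t inside the loop,
   with a single conditional store sunk to after the loop. */
static void store_sunk(const bool *cond, const int *a, ptrdiff_t n, int *p) {
  assert(n >= 0);
  int t = 0;
  bool stored = false; /* did any iteration assign t? */
  for (ptrdiff_t i = 0; i < n; i++) {
    if (cond[i]) {
      t = a[i];
      stored = true;
    }
  }
  if (stored)
    *p = t;
}
```

The `stored` flag keeps the sunk store conditional, so memory is left untouched when no iteration satisfies the condition.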

https://github.com/llvm/llvm-project/pull/121222