[llvm] [LV][VPlan] Add initial support for CSA vectorization (PR #121222)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 21 05:52:18 PST 2025
ayalz wrote:
The motivating example of
```
int t = init_val;
for (int i = 0; i < N; i++) {
  if (cond[i])
    t = a[i];
}
s = t; // use t
```
suggests vectorizing this as
```
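int t = init_val;
<VF x i1> vmask = 0;
<VF x ?> va;
for (int i = 0; i < N; i+=VF) {
  <VF x i1> vmaski = cond[i:i+VF-1]; // mask of the current vector iteration
  <VF x ?> vai = a[i:i+VF-1];        // values of the current vector iteration
  if (any(vmaski)) {
    vmask = vmaski;
    va = vai;
  }
}
if (any(vmask)) {
  int i = last(vmask); // lane index of the last turned-on lane
  t = extract(va, i);
}
s = t; // use t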
```
arguing that it is better to pass vmask as a live-out and sink the search for its last turned-on lane to after the loop, instead of searching for it inside the loop and passing i as a live-out.
Continuing this argument, would it not be better to also sink the load of a[i] to after the loop, instead of loading the vector va with a mask inside the loop?
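To make the question concrete, here is a minimal scalar sketch in C of what sinking the load could look like. The helper name csa_sunk_load and the use of n as a "no match" sentinel are illustrative assumptions, not anything taken from the PR, and the sketch assumes a[] is not written inside the loop, since otherwise the sunk load could observe a different value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: only the index of the last iteration with cond[i]
   set is carried through the loop; the load of a[] is sunk below it.
   Assumes a[] is not modified while the loop runs. */
int csa_sunk_load(const bool *cond, const int *a, size_t n, int init_val) {
    size_t last = n; /* n serves as the "no iteration matched" sentinel */
    for (size_t i = 0; i < n; i++) {
        if (cond[i])
            last = i; /* record the index only, no load of a[i] here */
    }
    /* The load executes at most once, after the loop. */
    return (last != n) ? a[last] : init_val;
}
```

With this shape the loop body touches only cond[], which is what would allow a vectorized version to skip the masked load of a[] entirely.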
In general, the value t being conditionally overwritten may be produced by some function t = f(i) of i; think of f(i) as a polynomial in i, for example. Such a function f should arguably be sunk and computed once after the loop, based on the (at most one) relevant iteration i. Once this function is sunk, the reduction becomes a "FindLast" reduction. Does that sound reasonable?
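As a rough illustration of the FindLast shape, the following C sketch emulates the vector loop with plain arrays. VF = 4, the polynomial chosen for f, the name find_last_then_f, and the extra live-out vbase (the first index of the recorded vector iteration, needed to turn a lane number back into an iteration number) are all assumptions made for the example, not part of the proposed patch.

```c
#include <stdbool.h>
#include <stddef.h>

enum { VF = 4 }; /* assumed vectorization factor, for illustration only */

/* Hypothetical f(i): any side-effect-free function of the induction variable. */
static int f(long i) { return (int)(3 * i * i + 2 * i + 1); }

int find_last_then_f(const bool *cond, size_t n, int init_val) {
    bool vmask[VF] = {false}; /* live-out mask of the last matching chunk */
    size_t vbase = 0;         /* first index of the chunk held in vmask */
    size_t i = 0;

    /* Emulated vector loop: only the mask and chunk base are carried. */
    for (; i + VF <= n; i += VF) {
        bool any = false;
        for (size_t l = 0; l < VF; l++)
            if (cond[i + l])
                any = true;
        if (any) {
            for (size_t l = 0; l < VF; l++)
                vmask[l] = cond[i + l];
            vbase = i;
        }
    }

    /* After the loop: find the last turned-on lane, if any. */
    long last = -1;
    for (size_t l = 0; l < VF; l++)
        if (vmask[l])
            last = (long)(vbase + l);

    /* Scalar epilogue for the remaining n % VF iterations. */
    for (; i < n; i++)
        if (cond[i])
            last = (long)i;

    /* f is evaluated at most once, outside the loop. */
    return (last >= 0) ? f(last) : init_val;
}
```

For instance, with cond set only at indices 1 and 6 out of 10 elements, the last matching iteration is 6 and the result is f(6) = 121.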
This is reminiscent of sinking the "x, y" of AnyOf to produce a boolean reduction followed by an "anyof ? x : y" function, rather than carrying x and y inside the loop; see @fhahn's commit bccb7ed8ac289.
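For comparison, the AnyOf shape described above might be sketched in scalar C as follows. The function name any_of_select and the equality test against key are hypothetical stand-ins for whatever loop condition applies; the sketch is not taken from the referenced commit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical AnyOf example: the loop carries only a boolean reduction;
   the loop-invariant values x and y are selected once after the loop. */
int any_of_select(const int *a, size_t n, int key, int x, int y) {
    bool anyof = false;
    for (size_t i = 0; i < n; i++)
        anyof = anyof || (a[i] == key); /* boolean OR reduction */
    return anyof ? x : y;               /* sunk select */
}
```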
Finally, note that LV already supports some sort of "CSA", introduced by @annamthomas some years ago in https://reviews.llvm.org/D52656; see test variant_val_store_to_inv_address_conditional in llvm/test/Transforms/LoopVectorize/X86/invariant-store-vectorization.ll - a conditional store to the same address could be converted into a conditional scalar assignment coupled with sinking the conditional store to after the loop.
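A scalar C sketch of that conversion, under the assumptions that the invariant address p aliases neither cond[] nor a[] and that *p is not read inside the loop (the name cond_store_sunk is hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rewrite of
       for (i) if (cond[i]) *p = a[i];
   into a conditional scalar assignment plus a single conditional store
   sunk below the loop. Assumes p does not alias cond[] or a[]. */
void cond_store_sunk(int *p, const bool *cond, const int *a, size_t n) {
    bool stored = false; /* did any iteration want to store? */
    int t = 0;           /* only meaningful when stored is true */
    for (size_t i = 0; i < n; i++) {
        if (cond[i]) {
            t = a[i];    /* conditional scalar assignment */
            stored = true;
        }
    }
    if (stored)
        *p = t;          /* at most one store, after the loop */
}
```

Guarding the sunk store with the flag keeps *p untouched when no iteration matched, as in the original loop.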
https://github.com/llvm/llvm-project/pull/121222