[llvm] [VPlan] Add support for in-loop AnyOf reductions (PR #131830)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 20 03:29:13 PDT 2025
lukel97 wrote:
> As I mentioned in https://github.com/llvm/llvm-project/issues/120405#issuecomment-2569024111, another possible approach is to widen the type of vp.merge,
> I believe what we talked about doing internally was a vp.zext to i8 then an i8 vp.merge in the loop with a vp.reduce.or after the loop. That avoids putting a vcpop.m in the loop.
I just ran some tests on the BPI-F3, and I think widening to i8 is actually more profitable there than putting a vcpop.m in the loop, e.g.:
```asm
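vsetvli a5, zero, e8, m1, ta, ma
vmv.v.i v9, 0                    # zero vector, used as the merge false operand
vmv.v.i v11, 0                   # zero the i8 accumulator
loop:
vsetvli a5, a7, e32, m1, ta, ma
vle32.v v8, (a0)
add a0, a0, a5
vmseq.vx v0, v8, zero            # mask of elements equal to zero
vsetvli zero, zero, e8, mf4, ta, ma
vmerge.vim v10, v9, 1, v0        # widen the mask to i8: 1 where set, else 0
vor.vv v11, v11, v10             # accumulate in the loop with a plain vector OR
sub a7, a7, a5
bnez a7, loop
exit:
vmsne.vi v10, v11, 0             # turn the i8 accumulator back into a mask
vcpop.m a1, v10                  # the only vcpop.m, after the loop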
```
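For anyone following along, here is a rough scalar model of the scheme, written in Python just to show the shape of the computation. It is only a sketch: the function name, the `vlmax` parameter and the `match` value are made up for illustration and are not taken from the patch or from LLVM.

```python
def any_of_widened(data, vlmax=4, match=0):
    """Scalar model of an AnyOf reduction with a widened (i8) accumulator.

    Hypothetical helper: vlmax stands in for the vector length and
    match for the value being searched for.
    """
    acc = [0] * vlmax                      # i8 accumulator lanes, zero-initialized
    for start in range(0, len(data), vlmax):
        chunk = data[start:start + vlmax]  # the last chunk may be shorter (vl < vlmax)
        for lane, x in enumerate(chunk):
            flag = 1 if x == match else 0  # compare, then widen the i1 result to i8
            acc[lane] |= flag              # in-loop OR; lanes past vl keep their old value
    result = 0
    for lane_value in acc:                 # single OR-reduction after the loop
        result |= lane_value
    return result != 0
```

The point of the model is that the loop body only does lane-wise work, and the cross-lane reduction happens once at the end.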
This sounds like an approach all microarchs can agree on. Is anyone at SiFive already working on this? Otherwise I can take a look at it.
https://github.com/llvm/llvm-project/pull/131830