[llvm] [VPlan] Add support for in-loop AnyOf reductions (PR #131830)

Thu Mar 20 04:30:49 PDT 2025

alexey-bataev wrote:

> > As I mentioned in [#120405 (comment)](https://github.com/llvm/llvm-project/issues/120405#issuecomment-2569024111), another possible approach is to widen the type of vp.merge,
> 
> > I believe what we talked about doing internally was a vp.zext to i8 then a i8 vp.merge in the loop with a vp.reduce.or after the loop. That avoids putting a vcpop.m in the loop.
> 
> I just ran some tests, I think widening to i8 is also more profitable on the BPI-F3 vs vcpop.m, e.g:
> 
> ```assembly
> 	vsetvli	a5, zero, e8, m1, ta, ma
> 	vmv.v.i	v9, 0
> loop:
> 	vsetvli	a5, a7, e32, m1, ta, ma
> 	vle32.v	v8, (a0)
> 	add	a0, a0, a5
> 	vmseq.vx	v0, v8, zero
> 	vsetvli	zero, zero, e8, mf4, ta, ma
> 	vmerge.vim	v10, v9, 1, v0
> 	vor.vv	v11, v11, v10
> 	sub	a7, a7, a5
> 	bnez	a7, loop
> exit:
> 	vmsne.vi	v10, v11, 0
> 	vcpop.m	a1, v10
> ```
> 
> This sounds like an approach all microarchs can agree on.

+1. 
Generally speaking, all such transformations should be cost-based decisions. There should be 3 VPlans: the original, the one with vcpop, and the one with extensions. The cost model should then choose the best plan.
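
To make the selection step concrete, here is a minimal sketch of picking the cheapest of several candidate plans. It is not LLVM's VPlan API: `CandidatePlan` and `pickCheapest` are hypothetical names, and the cost is assumed to be a single integer produced by some cost model.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a candidate plan; this is NOT LLVM's VPlan class.
struct CandidatePlan {
  std::string Name; // e.g. "original", "vcpop", "i8-extend" (illustrative labels)
  uint64_t Cost;    // Assumed estimate from some cost model; lower is better.
};

// Returns the candidate with the lowest estimated cost. Ties keep the
// earliest candidate, so listing the original plan first makes it the default.
inline const CandidatePlan &
pickCheapest(const std::vector<CandidatePlan> &Plans) {
  assert(!Plans.empty() && "need at least one candidate plan");
  const CandidatePlan *Best = &Plans.front();
  for (const CandidatePlan &P : Plans)
    if (P.Cost < Best->Cost)
      Best = &P;
  return *Best;
}
```

Called with the three candidates in the order original, vcpop, extensions, this returns whichever has the smallest estimate and falls back to the original plan when the estimates are equal.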

> Is anyone at SiFive already working on this? Otherwise I can take a look at it.

Go ahead

https://github.com/llvm/llvm-project/pull/131830