[llvm] [RISCV] Lower unmasked zero-stride vp.stride to a splat of one scalar load. (PR #97394)

Wed Jul 3 03:55:37 PDT 2024

wangpc-pp wrote:

> > We may use `-1` to represent VLMAX?
> 
> This doesn't get picked up on RV64 but it does on RV32: I think it's because the EVL argument is 32 bits and we check against a 64-bit sentinel value. Probably something we should fix?

Agreed, I hit the same problem in my quick prototype. We need to fix it.

> 
> ```llvm
> define <vscale x 2 x i32> @f(<vscale x 2 x i32> %x, <vscale x 2 x i32> %y, <vscale x 2 x i1> %mask) {
>   %z = call <vscale x 2 x i32> @llvm.vp.add.nxv2i32(<vscale x 2 x i32> %x, <vscale x 2 x i32> %y, <vscale x 2 x i1> %mask, i32 -1)
>   ret <vscale x 2 x i32> %z
> }
> ```
> 
> ```
> f: 
> 	li	a0, -1
> 	srli	a0, a0, 32
> 	vsetvli	zero, a0, e32, m1, ta, ma
> 	vadd.vv	v8, v8, v9, v0.t
> 	ret
> ```
> 
> > We need to add a passthru operand to vp.strided.load I think.
> 
> We should be able to emulate the passthru with vp.merge:
> 
> ```llvm
> define <vscale x 2 x i32> @vpmerge_vpload(<vscale x 2 x i32> %passthru, ptr %p, <vscale x 2 x i1> %m, i32 zeroext %vl) {
> ; CHECK-LABEL: vpmerge_vpload:
> ; CHECK:       # %bb.0:
> ; CHECK-NEXT:    vsetvli zero, a1, e32, m1, tu, mu
> ; CHECK-NEXT:    vle32.v v8, (a0), v0.t
> ; CHECK-NEXT:    ret
>   %a = call <vscale x 2 x i32> @llvm.vp.load.nxv2i32.p0(ptr %p, <vscale x 2 x i1> splat (i1 -1), i32 %vl)
>   %b = call <vscale x 2 x i32> @llvm.vp.merge.nxv2i32(<vscale x 2 x i1> %m, <vscale x 2 x i32> %a, <vscale x 2 x i32> %passthru, i32 %vl)
>   ret <vscale x 2 x i32> %b
> }
> ```

Yeah! That's feasible!
But is there a reason why vp.strided.load doesn't have a passthru operand? Wouldn't it be more straightforward to add one? I don't know the history, but these intrinsics (gather/scatter vs. vp.strided.load/store) seem inconsistent.

https://github.com/llvm/llvm-project/pull/97394