[llvm] [RISCV] RISC-V split register allocation and move vsetvl pass in between (PR #70549)
Luke Lau via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 13 01:00:49 PST 2023
lukel97 wrote:
One small regression I noticed with the post-RA vsetvli insertion pass is in this function from `test/CodeGen/RISCV/rvv/fixed-vectors-mask-buildvec.ll`:
```llvm
define <4 x i1> @buildvec_mask_nonconst_v4i1(i1 %x, i1 %y) {
%1 = insertelement <4 x i1> poison, i1 %x, i32 0
%2 = insertelement <4 x i1> %1, i1 %x, i32 1
%3 = insertelement <4 x i1> %2, i1 %y, i32 2
%4 = insertelement <4 x i1> %3, i1 %y, i32 3
ret <4 x i1> %4
}
```
With pre-RA vsetvli insertion we have:
```asm
buildvec_mask_nonconst_v4i1: # @buildvec_mask_nonconst_v4i1
.cfi_startproc
# %bb.0:
vsetivli zero, 4, e8, mf4, ta, ma
vmv.v.i v0, 3
vmv.v.x v8, a1
vmerge.vxm v8, v8, a0, v0
vand.vi v8, v8, 1
vmsne.vi v0, v8, 0
ret
```
But post-RA insertion results in an extra vsetvli:
```asm
buildvec_mask_nonconst_v4i1: # @buildvec_mask_nonconst_v4i1
.cfi_startproc
# %bb.0:
vsetivli zero, 1, e8, mf8, ta, ma
vmv.v.i v0, 3
vsetivli zero, 4, e8, mf4, ta, ma
vmv.v.x v8, a1
vmerge.vxm v8, v8, a0, v0
vand.vi v8, v8, 1
vmsne.vi v0, v8, 0
ret
```
From what I can tell this is due to the vmv.v.i getting scheduled before the vmv.v.x, i.e. the beginning of the BB goes from this:
```
bb.0 (%ir-block.0):
liveins: $x10, $x11
%1:gpr = COPY $x11
%0:gpr = COPY $x10
%2:vr = PseudoVMV_V_X_MF4 $noreg(tied-def 0), %1:gpr, 4, 3, 0
%3:vr = PseudoVMV_V_I_MF8 $noreg(tied-def 0), 3, 1, 3, 0
```
to this:
```
0B bb.0 (%ir-block.0):
liveins: $x10, $x11
16B %1:gpr = COPY $x11
32B %0:gpr = COPY $x10
64B renamable $v0 = PseudoVMV_V_I_MF8 undef renamable $v0(tied-def 0), 3, 1, 3, 0
80B renamable $v8 = PseudoVMV_V_X_MF4 undef renamable $v8(tied-def 0), %1:gpr, 4, 3, 0
```
We end up with the extra vsetvli because of an optimisation in needVSETVLI that avoids inserting a vsetvli when the instruction is a vmv.v.i with VL=1. needVSETVLI is in turn called from transferBefore, so previously the check fired after the first instruction, on the vmv.v.x -> vmv.v.i transition. But because the vmv.v.i is now scheduled first, the transition we check is vmv.v.i -> vmv.v.x, where the optimisation doesn't apply.
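To make the order-sensitivity concrete, here's a small self-contained model of the check; this is not the actual RISCVInsertVSETVLI code, and the struct and helper names below are invented for illustration. The idea is that a VL=1 splat with an undefined passthru only demands a non-zero VL (and a large-enough SEW), so it can reuse whatever vtype the previous instruction established, but the reverse transition into the vmv.v.x gets no such relaxation:
```cpp
// Toy model (not the real pass) of why the check is order-sensitive.
#include <cstdio>

struct VTypeState {
  unsigned VL;          // e.g. 4
  unsigned SEW;         // e.g. 8
  unsigned LMULEighths; // LMUL in eighths: mf8 -> 1, mf4 -> 2
};

struct Demand {
  bool VLExact = true;        // needs this exact VL
  bool VLNonZeroOnly = false; // only needs VL != 0
  bool SEWExact = true;
  bool LMULExact = true;
};

// A VL=1 splat (vmv.v.i/vmv.v.x with AVL immediate 1 and an undefined merge
// operand) only writes element 0, so it only demands a non-zero VL; the real
// pass also requires SEW >= its own SEW and that LMUL doesn't grow, which the
// toy model glosses over.
Demand getDemand(bool IsScalarSplat, unsigned AVL) {
  Demand D;
  if (IsScalarSplat && AVL == 1) {
    D.VLExact = false;
    D.VLNonZeroOnly = true;
    D.SEWExact = false;
    D.LMULExact = false;
  }
  return D;
}

// Does the next instruction need a fresh vsetvli given the current state?
bool needVSETVLI(const VTypeState &Cur, const VTypeState &Req, const Demand &D) {
  if (D.VLExact && Cur.VL != Req.VL) return true;
  if (D.VLNonZeroOnly && Cur.VL == 0) return true;
  if (D.SEWExact && Cur.SEW != Req.SEW) return true;
  if (D.LMULExact && Cur.LMULEighths != Req.LMULEighths) return true;
  return false;
}

int main() {
  VTypeState SplatState{/*VL=*/1, /*SEW=*/8, /*LMULEighths=*/1}; // vmv.v.i v0, 3
  VTypeState VecState{/*VL=*/4, /*SEW=*/8, /*LMULEighths=*/2};   // vmv.v.x v8, a1

  // Original order: vmv.v.x first, then the VL=1 splat.  The splat's demands
  // are relaxed, so it can reuse the VL=4/e8/mf4 state -> one vsetvli.
  printf("vmv.v.x -> vmv.v.i needs vsetvli: %d\n",
         needVSETVLI(VecState, SplatState, getDemand(true, 1)));

  // Scheduled order: the splat comes first, so the transition checked is
  // splat-state -> vmv.v.x, whose demands are *not* relaxed -> extra vsetvli.
  printf("vmv.v.i -> vmv.v.x needs vsetvli: %d\n",
         needVSETVLI(SplatState, VecState, getDemand(false, 4)));
  return 0;
}
```
In the model the first query returns false (no vsetvli needed) and the second returns true, matching the one- vs. two-vsetvli outputs above.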
We could probably recover this case by teaching the backwards local postpass about this.
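Concretely (still in terms of the toy model above, not the real local postpass code), that would mean letting the backwards walk apply the same relaxed demands when deciding whether the earlier vsetvli can take on the later one's configuration so the later one can be deleted; a hypothetical helper might look like:
```cpp
// Hypothetical sketch reusing the toy types above, not the upstream pass.
// Ask whether the instructions between two vsetvlis (which originally
// required state `Between`, with accumulated demands `UsedBetween`) would be
// happy to run under the later vsetvli's state instead.  If so, the earlier
// vsetvli can be rewritten to the later configuration and the later one
// removed, recovering the single-vsetvli sequence from the pre-RA output.
bool canCoalesceBackwards(const VTypeState &Later, const VTypeState &Between,
                          const Demand &UsedBetween) {
  return !needVSETVLI(Later, Between, UsedBetween);
}
```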
https://github.com/llvm/llvm-project/pull/70549