[PATCH] D152963: [RISCV] Don't assume tail undefined if there's no policy op

Luke Lau via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 19 06:59:06 PDT 2023


luke added inline comments.


================
Comment at: llvm/test/CodeGen/RISCV/rvv/roundtozero-vp.ll:698
 ; CHECK-NEXT:    fld fa5, %lo(.LCPI29_0)(a1)
+; CHECK-NEXT:    vmset.m v0
 ; CHECK-NEXT:    vsetvli zero, a0, e64, m8, ta, ma
----------------
reames wrote:
> This delta is very odd as v0 doesn't appear to be used.  Why is this not getting DCE?
`v0` is used as the passthrough for the `vmflt.vf` below. The new `vmset.m v0` instructions come from these SETCC_VL nodes, which take the same VMSET_VL as both the passthrough and the mask operand:

```
t51: nxv8i1 = RISCVISD::SETCC_VL t49, t50, setolt:ch, t62, t62, t78
t62: nxv8i1 = RISCVISD::VMSET_VL Register:i64 $x0
```

It's difficult to see the diff in these test cases, but I'm now convinced that the `vmset.m v0` is necessary for correctness. Here's a reduced test case that shows the issue more clearly:

```
define <vscale x 1 x i1> @vmflt_allonesmask(<vscale x 1 x float> %a, <vscale x 1 x float> %b, i64 %vl, ptr %p) {
entry:
  %head = insertelement <vscale x 1 x i1> poison, i1 true, i32 0
  %mask = shufflevector <vscale x 1 x i1> %head, <vscale x 1 x i1> poison, <vscale x 1 x i32> zeroinitializer
  %passthru = load <vscale x 1 x i1>, ptr %p
  %v = call <vscale x 1 x i1> @llvm.riscv.vmflt.mask.nxv1f32(
    <vscale x 1 x i1> %passthru,
    <vscale x 1 x float> %a,
    <vscale x 1 x float> %b,
    <vscale x 1 x i1> %mask,
    i64 %vl)
  ret <vscale x 1 x i1> %v
}
```

On head at 6947db2778e0f4799f5311bc80fe7963aa8409c6 this generates:

```
vmflt_allonesmask:                      # @vmflt_allonesmask
        .cfi_startproc
# %bb.0:                                # %entry
        vsetvli zero, a0, e32, mf2, ta, ma
        vmflt.vv        v0, v8, v9
        ret
```

I believe this is wrong, because we still need to load the passthrough into `v0` to preserve tail-agnostic semantics.
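To illustrate why the passthrough can still matter with an all-ones mask, here is a rough element-wise model in Python. It is only a sketch: `masked_vmflt` is a hypothetical helper, not an LLVM or RVV API, and plain lists stand in for mask registers. It assumes inactive body elements keep the passthrough value, and it models the two tail behaviours that the tail-agnostic policy permits (keep the old destination value, or write 1s).

```python
def masked_vmflt(passthru, a, b, mask, vl, tail_keeps_old=True):
    """Hypothetical model of a masked 'a < b' compare that produces a mask.

    Body (i < vl): active elements get a[i] < b[i]; inactive elements are
    assumed to keep passthru[i].
    Tail (i >= vl): tail-agnostic allows either the old destination value
    (here passthru[i]) or all ones; tail_keeps_old picks which one to model.
    """
    out = []
    for i in range(len(passthru)):
        if i < vl:
            out.append(a[i] < b[i] if mask[i] else passthru[i])
        elif tail_keeps_old:
            out.append(passthru[i])
        else:
            out.append(True)
    return out


vlmax, vl = 4, 2
a = [1.0, 5.0, 0.0, 0.0]
b = [2.0, 3.0, 0.0, 0.0]
all_ones = [True] * vlmax

# Same inputs and all-ones mask, two different passthrough values.
r0 = masked_vmflt([False] * vlmax, a, b, all_ones, vl)
r1 = masked_vmflt([True] * vlmax, a, b, all_ones, vl)

print("body equal:", r0[:vl] == r1[:vl])
print("tails:", r0[vl:], r1[vl:])
```

In this model the body elements do not depend on the passthrough when the mask is all ones, but the tail elements (those at index `vl` and above) can, which is why dropping the load into `v0` changes what the destination may hold.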

The intrinsic is selected as:
`t14: nxv1i1 = PseudoVMFLT_VV_MF2_MASK nofpexcept t20, t2, t4, Register:nxv1i1 $v0, t6, TargetConstant:i64<5>, t22:1`

The post-isel peephole then discards the passthrough `t20`:
`t25: nxv1i1 = PseudoVMFLT_VV_MF2 nofpexcept t2, t4, t6, TargetConstant:i64<5>`

With the patch we get:

```
vmflt_allonesmask:                      # @vmflt_allonesmask
        .cfi_startproc
# %bb.0:                                # %entry
        vsetvli a2, zero, e8, mf8, ta, ma
        vlm.v   v0, (a1)
        vsetvli zero, a0, e32, mf2, ta, ma
        vmflt.vv        v0, v8, v9
        ret
```

I'll add these smaller cases as a precommit test to show this better.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152963/new/

https://reviews.llvm.org/D152963


