<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/144289>144289</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            RISC-V: RVV mask register allocation spill
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          my4ng
      </td>
    </tr>
</table>

<pre>
    This example is derived from the RVV B.11 square root approximation example:

```llvm
define <vscale x 2 x i1> @test(<vscale x 2 x double> %0, <vscale x 2 x double> %1) {
entry:
  %2 = tail call i64 @llvm.riscv.vsetvli.i64(i64 -1, i64 3, i64 1)
  %3 = tail call <vscale x 2 x i1> @llvm.riscv.vmfne.nxv2f64.f64.i64(<vscale x 2 x double> %0, double 0.000000e+00, i64 -1)
  %4 = tail call <vscale x 2 x i1> @llvm.riscv.vmfne.mask.nxv2f64.f64.i64(<vscale x 2 x i1> %3, <vscale x 2 x double> %1, double 0.000000e+00, <vscale x 2 x i1> %3, i64 -1)
  ret <vscale x 2 x i1> %4
}
```
The resulting assembly using llc trunk: [godbolt](https://godbolt.org/z/6n6s3d9r9)

```
test: # @test
        fmv.d.x fa5, zero
        vsetvli a0, zero, e64, m2, ta, mu
        vmfne.vf        v12, v8, fa5
        vmv1r.v v0, v12
        vmfne.vf        v12, v10, fa5, v0.t
        vmv1r.v v0, v12
        ret
```

Here, with a vector register group multiplier (LMUL) of 2, the mask value is moved into v0 twice. The same happens when LMUL is 4 or 8.

However, the output is as expected when LMUL is 1: [godbolt](https://godbolt.org/z/bKP76f1s1)

```
test: # @test
        fmv.d.x fa5, zero
        vsetvli a0, zero, e64, m1, ta, mu
        vmfne.vf        v0, v8, fa5
        vmfne.vf        v0, v9, fa5, v0.t
        ret
```
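
For readers who do not want to follow the link, here is a sketch of what the LMUL=1 input presumably looks like. It assumes the only changes from the IR above are the element count (`nxv1f64` instead of `nxv2f64`) and the `vlmul` operand of `vsetvli` (0 instead of 1); the exact IR is the one behind the godbolt link.

```llvm
; Assumed LMUL=1 variant of the test above (reconstructed, not copied from the link).
define <vscale x 1 x i1> @test(<vscale x 1 x double> %0, <vscale x 1 x double> %1) {
entry:
  ; sew operand 3 selects e64, vlmul operand 0 selects m1
  %2 = tail call i64 @llvm.riscv.vsetvli.i64(i64 -1, i64 3, i64 0)
  %3 = tail call <vscale x 1 x i1> @llvm.riscv.vmfne.nxv1f64.f64.i64(<vscale x 1 x double> %0, double 0.000000e+00, i64 -1)
  ; the first compare result is used as both the merge operand and the mask
  %4 = tail call <vscale x 1 x i1> @llvm.riscv.vmfne.mask.nxv1f64.f64.i64(<vscale x 1 x i1> %3, <vscale x 1 x double> %1, double 0.000000e+00, <vscale x 1 x i1> %3, i64 -1)
  ret <vscale x 1 x i1> %4
}
```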


This is similar to #113489: the mask register is moved unnecessarily. The moves are not required, because the specification states:
> The destination vector register group for a masked vector instruction cannot overlap the source mask register (v0), unless the destination vector register is being written with a mask value (e.g., compares) or the scalar result of a reduction.
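
By analogy with the LMUL=1 output, the LMUL=2 case could presumably be compiled as below. This is a hand-written sketch, not compiler output. It assumes both compares can write v0 directly, which the rule quoted above appears to allow because the destination is a mask value.

```
test: # @test (expected, hand-written)
        fmv.d.x fa5, zero
        vsetvli a0, zero, e64, m2, ta, mu
        # write the first compare result straight into v0
        vmfne.vf        v0, v8, fa5
        # masked compare: destination v0 overlaps the mask v0
        vmfne.vf        v0, v10, fa5, v0.t
        ret
```

That would remove both `vmv1r.v` instructions from the current LMUL=2 output.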
</pre>