<table border="1" cellspacing="0" cellpadding="8">

    <tr>

        <th>Issue</th>

        <td>

            <a href=https://github.com/llvm/llvm-project/issues/144289>144289</a>

        </td>

    </tr>

    <tr>

        <th>Summary</th>

        <td>

            RISC-V: RVV mask register allocation spill

        </td>

    </tr>

    <tr>

      <th>Labels</th>

      <td>

            new issue

      </td>

    </tr>

    <tr>

      <th>Assignees</th>

      <td>

      </td>

    </tr>

    <tr>

      <th>Reporter</th>

      <td>

          my4ng

      </td>

    </tr>

</table>

<pre>

    This example is derived from the RVV B.11 square root approximation example:

```llvm

define <vscale x 2 x i1> @test(<vscale x 2 x double> %0, <vscale x 2 x double> %1) {

entry:

  %2 = tail call i64 @llvm.riscv.vsetvli.i64(i64 -1, i64 3, i64 1)

  %3 = tail call <vscale x 2 x i1> @llvm.riscv.vmfne.nxv2f64.f64.i64(<vscale x 2 x double> %0, double 0.000000e+00, i64 -1)

  %4 = tail call <vscale x 2 x i1> @llvm.riscv.vmfne.mask.nxv2f64.f64.i64(<vscale x 2 x i1> %3, <vscale x 2 x double> %1, double 0.000000e+00, <vscale x 2 x i1> %3, i64 -1)

  ret <vscale x 2 x i1> %4

}

```

The resulting assembly using llc trunk: [godbolt](https://godbolt.org/z/6n6s3d9r9)

```

test: # @test

        fmv.d.x fa5, zero

        vsetvli a0, zero, e64, m2, ta, mu

        vmfne.vf        v12, v8, fa5

        vmv1r.v v0, v12

        vmfne.vf        v12, v10, fa5, v0.t

        vmv1r.v v0, v12

        ret

```

Here the mask value is moved into v0 twice. The same happens when the vector length multiplier (LMUL) is 4 or 8.

However, it behaves as expected when the vector multiplier is 1: [godbolt](https://godbolt.org/z/bKP76f1s1)

```

test: # @test

        fmv.d.x fa5, zero

        vsetvli a0, zero, e64, m1, ta, mu

        vmfne.vf        v0, v8, fa5

        vmfne.vf        v0, v9, fa5, v0.t

        ret

```
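
For reference, the LMUL=1 input presumably differs from the IR above only in the vector types and the LMUL argument to `vsetvli`. The following is a sketch reconstructed by analogy, not copied from the godbolt link. It assumes the usual intrinsic type mangling (`nxv1f64`) and that a `vsetvli` LMUL operand of `0` selects m1:

```llvm
; Assumed LMUL=1 variant: <vscale x 1 x double> operands, vsetvli LMUL operand 0 (m1)
define <vscale x 1 x i1> @test(<vscale x 1 x double> %0, <vscale x 1 x double> %1) {
entry:
  %2 = tail call i64 @llvm.riscv.vsetvli.i64(i64 -1, i64 3, i64 0)
  %3 = tail call <vscale x 1 x i1> @llvm.riscv.vmfne.nxv1f64.f64.i64(<vscale x 1 x double> %0, double 0.000000e+00, i64 -1)
  %4 = tail call <vscale x 1 x i1> @llvm.riscv.vmfne.mask.nxv1f64.f64.i64(<vscale x 1 x i1> %3, <vscale x 1 x double> %1, double 0.000000e+00, <vscale x 1 x i1> %3, i64 -1)
  ret <vscale x 1 x i1> %4
}
```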

This is similar to #113489: the mask register is moved unnecessarily. The moves are not required because, as the specification states:

> The destination vector register group for a masked vector instruction cannot overlap the source mask register (v0), unless the destination vector register is being written with a mask value (e.g., compares) or the scalar result of a reduction.
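
By analogy with the LMUL=1 output above, the code one would hope for at LMUL=2 is sketched below. This is hand-written, not compiler output, and it assumes the register allocator may assign v0 directly as the destination of both compares, which the quoted rule appears to permit since the destination holds a mask value:

```
test: # @test
        fmv.d.x fa5, zero
        vsetvli a0, zero, e64, m2, ta, mu
        # Assumed: the compare writes its mask result straight into v0
        vmfne.vf        v0, v8, fa5
        # Assumed: the masked compare reads v0.t and overwrites v0, so no vmv1r.v is needed
        vmfne.vf        v0, v10, fa5, v0.t
        ret
```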

</pre>
