<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/81518>81518</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [RISCV] Try to optimise LMUL when VL is suitably constrained (or zero!).
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          sh1boot
      </td>
    </tr>
</table>

<pre>
    Here are a handful of examples where I _believe_ it would be legitimate to cut the `vsetvl` operands down to a fractional (sub-register) LMUL, or even to elide the whole instruction, based on knowledge of the value of VL and of VLMAX for the shorter LMUL:

```C++
#include <riscv_vector.h>
#include <cstddef>
#include <cstdint>

void get_generic(void* dst, size_t size, uint8_t const* ptr, size_t stride) {
    uint8_t* dst8 = reinterpret_cast<uint8_t*>(dst);
    auto reg = __riscv_vlse8_v_u8m1(ptr, stride, size);
    __riscv_vse8(dst8, reg, size);
}

void get_small(uint8_t* dst, size_t size, uint8_t const* ptr, size_t stride) {
    if (size > 4) __builtin_unreachable();
    get_generic(dst, size, ptr, stride);
}

void get_infinitesimal(uint8_t* dst, size_t size, uint8_t const* ptr, size_t stride) {
    if (size > 0) __builtin_unreachable();
    get_generic(dst, size, ptr, stride);
}

void get(void* result, uint8_t const* ptr, size_t stride) {
    get_generic(result, 0, ptr, stride);
}

void get(uint64_t& result, uint8_t const* ptr, size_t stride) {
    get_generic(&result, sizeof(result), ptr, stride);
}

void get(vuint8mf8_t& result, uint8_t const* ptr, size_t stride) {
    size_t size = __riscv_vsetvlmax_e8mf8();
    get_generic(&result, size, ptr, stride);
}
```

I've used `vlse8` here, as an example where it's plausible that an implementation will have to iterate over lanes and may suffer from unnecessarily long LMUL if VL shows up too late to curtail the unnecessary uops.  Simpler operations may or may not benefit depending on the implementation, I guess?

I'm not sure if the zero-length cases are candidates for elision.  Removing them would mean that the destination register no longer gets its tail elements overwritten under the qemu `rvv_ta_all_1s` option, and it may have implications for register renaming in some implementations.

Perhaps it would be better to replace zero-length ops with a move operation, which faces much less risk of taking multiple cycles or reserving a memory pipeline or whatever.

Then of course, the same question extends to whether or not it's legal and appropriate to cut down LMUL>1 to LMUL<=1.
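
To make the VL/VLMAX reasoning concrete, here is a rough sketch of the check I have in mind. It is plain C++ rather than anything taken from LLVM; `vlmax` and `smallest_lmul_eighths` are hypothetical helper names, and it assumes a known lower bound on VLEN (128 for the V extension via Zvl128b). It relies on `vsetvl` returning `vl = AVL` whenever `AVL <= VLMAX`, and on VLMAX only growing with VLEN, so a bound proven at the minimum VLEN holds on wider implementations too. It ignores the restriction on which fractional LMUL/SEW pairs a given ELEN supports.

```C++
#include <cstddef>

// LMUL in eighths so that fractional values stay integral:
// mf8 = 1, mf4 = 2, mf2 = 4, m1 = 8, m2 = 16, m4 = 32, m8 = 64.
constexpr unsigned kLmulEighths[] = {1, 2, 4, 8, 16, 32, 64};

// VLMAX = (VLEN / SEW) * LMUL, with LMUL given in eighths.
constexpr std::size_t vlmax(unsigned vlen_bits, unsigned sew_bits,
                            unsigned lmul_eighths) {
    return static_cast<std::size_t>(vlen_bits) * lmul_eighths /
           (8u * sew_bits);
}

// Smallest LMUL (in eighths) whose VLMAX at the minimum VLEN still covers
// every possible AVL up to max_avl. Returns 0 if even m8 is too short.
// Hypothetical helper, not an existing LLVM function.
constexpr unsigned smallest_lmul_eighths(std::size_t max_avl,
                                         unsigned min_vlen_bits,
                                         unsigned sew_bits) {
    for (unsigned lmul : kLmulEighths) {
        if (vlmax(min_vlen_bits, sew_bits, lmul) >= max_avl) {
            return lmul;
        }
    }
    return 0;
}
```

Under those assumptions, with SEW=8 and VLEN>=128, `get_small` (size <= 4) fits in mf4, the `uint64_t` overload (size == 8) fits in mf2, and the zero-length cases fit in any LMUL at all, so mf8 would do if the instruction has to stay.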
</pre>