<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/80392">80392</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [RISCV] Missing opportunities to optimize RVV instructions
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          wangpc-pp
      </td>
    </tr>
</table>

<pre>
At the SelectionDAG level, we have several code paths that generate RVV pseudos (a small sketch contrasting two of them follows this list):
1. RVV intrinsics -> RVV pseudos.
2. ISD nodes -> RVV pseudos.
3. RISCVISD nodes -> RVV pseudos.
4. RVV intrinsics -> RISCVISD nodes -> RVV pseudos.
5. ISD nodes -> RISCVISD nodes -> RVV pseudos.
6. etc.
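
To make the paths above concrete, here is a small hedged sketch (not from the original report; the function and typedef names are made up for illustration) showing how the same vector add can enter the SelectionDAG either as an RVV intrinsic (paths 1/4) or as a generic ISD node via a fixed-length vector operator (paths 2/5). The fixed-length part assumes the translation unit is compiled with `-mrvv-vector-bits` so that `__riscv_v_fixed_vlen` is defined:
```c
#include <riscv_vector.h>

// Paths 1/4: written with RVV intrinsics, so it enters the DAG as intrinsic calls.
vuint8m1_t add_intrinsic(vuint8m1_t a, vuint8m1_t b) {
    return __riscv_vadd_vv_u8m1(a, b, __riscv_vsetvlmax_e8m1());
}

// Paths 2/5: written with a fixed-length vector type, so the addition enters the
// DAG as a generic ISD::ADD that is later lowered to RVV pseudos.
typedef vuint8m1_t fixed_u8m1 __attribute__((riscv_rvv_vector_bits(__riscv_v_fixed_vlen)));

fixed_u8m1 add_operator(fixed_u8m1 a, fixed_u8m1 b) {
    return a + b;
}
```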

Most of the RVV optimizations are based on RISCVISD nodes, so we may miss opportunities to optimize code that reaches RVV pseudos through the other paths.
For example (https://godbolt.org/z/f1jWEfhG7):
```c
#include <riscv_vector.h>

vuint8m1_t dup(uint8_t* data) {
    return __riscv_vmv_v_x_u8m1(*data, __riscv_vsetvlmax_e8m1());
}

vuint8m1_t dup2(uint8_t* data) {
    return __riscv_vlse8_v_u8m1(data, 0, __riscv_vsetvlmax_e8m1());
}
```
```asm
dup:
        vsetvli a1, zero, e8, m1, ta, ma
        vlse8.v v8, (a0), zero
        ret
dup2:
        vsetvli a1, zero, e8, m1, ta, ma
        vlse8.v v8, (a0), zero
        ret
```
These two snippets compile to the same assembly because we lower the `vmv.v.x` intrinsic to `RISCVISD::VMV_V_X` first, and then we can optimize it into a zero-strided load if profitable.
However, this is not common; other intrinsics typically do not go through RISCVISD nodes:
```c
vuint16m2_t vadd(vuint16m2_t a, vuint8m1_t b) {
    int vl = __riscv_vsetvlmax_e8m1();
    vuint16m2_t c = __riscv_vzext_vf2_u16m2(b, vl);
    return __riscv_vadd_vv_u16m2(a, c, vl);
}

vuint16m2_t vwaddu(vuint16m2_t a, vuint8m1_t b) {
    return __riscv_vwaddu_wv_u16m2(a, b, __riscv_vsetvlmax_e16m2());
}
```
```asm
vadd:
        vsetvli a0, zero, e16, m2, ta, ma
        vzext.vf2       v12, v10
        vadd.vv v8, v8, v12
        ret
vwaddu:
        vsetvli a0, zero, e8, m1, ta, ma
        vwaddu.wv       v8, v8, v10
        ret
```
We can't optimize `vzext.vf2` + `vadd.vv` into `vwaddu.wv` because we lower these intrinsics to RVV pseudos directly.
Of course, the same problem exists for the `ISD -> RVV pseudos` path:
```c
typedef vuint8m1_t v16xi8 __attribute__((riscv_rvv_vector_bits(__riscv_v_fixed_vlen)));
typedef vuint16m2_t v16xi32 __attribute__((riscv_rvv_vector_bits(__riscv_v_fixed_vlen * 2)));

v16xi32 add(v16xi32 a, v16xi8 b) {
    v16xi32 c = __riscv_vzext_vf2_u16m2(b, 16);
    return a + c;
}
```
```asm
add:
        vsetivli        zero, 16, e16, m2, ta, ma
        vzext.vf2       v12, v10
        vadd.vv v8, v12, v8
        ret
```
I think we need a universal representation (RISCVISD?) on which to do these optimizations. But when GISel is supported, will we need to redo all the optimizations on generic MIR? Or should we move all the optimizations to later MIR passes?
</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJy8l19vozgQwD-N8zIqAich9CEP7Waz6kO1Unvq3hsyeEi8ZzCyjZP2059sIE3SptvbOx2KiAzzz-P52QMzRmwaxCWZ35L5asI6u1V6uWPNpi2v2nZSKP68vGvAbhEeUWJphWpWN99AokNJ6BfYIWyZQzDoUDMJpeIILbNbA1bBBhvUzCI8PD1Ba7DjypDpDYlXJL5JovBYNFaLxojSwBWZfj0WjXpBGsHd4woaxfGizDSCh7vHL0-_FJxd8Pop5fnbSD6ll0aAthwG_f1eGQuqCqlVrRW1eGE-uwYqpYMJphEKZpCDas78-Mwb5ZNfs2eohTFgVO0NtUrbrhFWYFiAwTL2r_3ijCGtlQbcs7qVCIRmW2vbsDR0Teh6o3ihpI2U3hC6fiF0XSU_f3yttt8WhF4fVpCkcf8r-7HrRGOzOskt8K4lNAvj3BJ6A5xZRug1kMVtLwwAoNF2uoE818KULne1y12-z7usTgjNCL3ptb68Shi0TtZsn-Mocx0iGoySxeo4yacB0V9FdB6ONJjlbgxnjCX-vYDGXJ0Nman7Jz5jY2JhuHrrAljifb6gVv4fM3-vw7M-pJqd6fnIIwcuSBKasTjENdg4zPbgmf6_rofrEMFZcv7YokGwOwWmEW2L1gQYVAWG1QjMGKwL6Uu8wJJ1Bj0JUu1QH3OtKiBp7GoXuWhP0tjzQNJ4JMnPeHrzdP-UP-V_-teV0Mb6QFnDPZaNt1qy5hUiYb0NP48rY7Xg3injICpotaqEZYXEAa_bLpiyW2FAGGiUhVLVtWoC3spuUUPJDJqPYUrSmuYWHOOc0Oz4UUj-UX0Xb-ASjQUngUxXH1frWKpwbL481XvBvc1dRfPOvyc0K4J7eaL_DtCM89y5g1YIunyr-h6249R3jPPuH0_-PJBgJd-dhVJcQHmQ-W2Ww3pdJCo-ISpJA0b0A6R89iNX0XGcBGGXxOPCMc4jNzI33BN6AbYho58M79fAB3PRzo3jkyDi883mLHc_AmKELuwrZp7accqE3g6zeyX44NFb8at4vgnYsH8cbQVWHR_KwIXG0srngdXvFZSq0yX2xKJGj6w_mMN202pVSKwDuCSN7x5X_pQ_7mjSOLQ8F1G2zy1yrI4r1iXpXmSQ58xaLYrOYp6Hisv6YtTO5Q5Lq3ReCGsIzQ5lmldijzx3EpuhQI9q9MTViJD3NaX_1hn4Q5O-dTlU1eBk2KnGUV8FYa5vIR2lPrXZeE7e3WwYEHoL5W9geolS4TkYrhGEHtP_gtbx-Sm0g1T2uRPyzp8szV--5BtE7gucNdA1wqE2TILGVqPBxoae0h_Dr8fe2q-DVcDVaeMZwW1nYecPvm93jyg9BKYLHSXyodf37ebokCtgUr7TwCpv4AHYhomGTNfwXYPZqk7yYEE5DHqnOlaBZBY13N89QOtPeEOm6wlfTvn19JpNcJks4jSb02yxmGyXM4pZmiZYFhlLswKz6wzjaZXSGNN5lSYTsaQxncU0pvF8tpgvomwW0zSmsyKNFzNazMksxpoJGUnpat_oToQxHS6zeHpNJ5IVKM34XaSXXuiq6DaGzGIpjDWvalZYGb6gQobJfAX3whjRbD7oxvsvEGN1F76ozKTTcnnWggu77YqoVDWha-9s-LtqtfqJpSV0HQI2hK5DzH8HAAD__4HLH9s">