<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/56316>56316</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Missed opportunity to fold vmerge into a masked vadd
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:RISC-V,
            performance
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          preames
      </td>
    </tr>
</table>

<pre>
    This is a performance opportunity I noticed while analyzing sqlite3. I'm filing it now with reproduction instructions for the original observation and some analysis below, and plan to follow up with reduced test cases when I have time.

`$clang -isystem $GNU_TOOLCHAIN_DIR/sysroot/usr/include/ --target=riscv64 -mllvm -riscv-v-vector-bits-min=256 -mllvm -scalable-vectorization=on -Xclang -target-feature -Xclang +v,+f,+m,+c,+d,+zba sqlite-autoconf-3380500/sqlite3.c -c -O2 -g`

The amalgamation file (`sqlite-autoconf-3380500`, i.e. SQLite 3.38.5) was taken from the SQLite website.

Non-optimal assembly observed:

```
    vmv.v.i    v16, 9
.LBB785_98: <-- this is a loop header, v16 does not change in loop
    vsetvli    zero, zero, e8, m1, ta, mu
    vluxei64.v    v17, (s4), v8
    vand.vx    v18, v17, a4
    vmseq.vi    v0, v18, 0
    vmerge.vim    v18, v16, 0, v0
    vadd.vv    v17, v18, v17
```
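We're using the vmerge to materialize a vector that is 0 in lanes where the mask is set and 9 where it is clear, and then adding it to v17. This can be folded into the consuming add: a masked add of 9 under the inverted mask gives the same result, since with the `mu` (mask-undisturbed) policy the inactive lanes keep their v17 value. The splat of 9 in v16 is then no longer needed for this computation.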

This assembly should look something like:

```
.LBB785_98:
    vsetvli    zero, zero, e8, m1, ta, mu
    vluxei64.v    v17, (s4), v8
    vand.vx    v18, v17, a4
    vmsne.vi    v0, v18, 0
    vadd.vi    v17, v17, 9, v0.t
```
</pre>