<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/113242>113242</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [RISC-V] Remove round-trip to memory when using `compressstore`
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Validark
      </td>
    </tr>
</table>

<pre>
```zig
fn compressstore(vec: @Vector(64, u8), ptr: *@Vector(64, u8), bitstr: u64) void {
    return struct {
        extern fn @"llvm.masked.compressstore.v64i8"(@Vector(64, u8), *@Vector(64, u8), @Vector(64, u1)) callconv(.Unspecified) void;
    }.@"llvm.masked.compressstore.v64i8"(vec, ptr, @bitCast(bitstr));
}

export fn compress(vec: @Vector(64, u8), bitstr: u64, vec2: @Vector(64, u8)) @Vector(64, u8) {
    var buffer: [64]u8 align(64) = undefined;
    compressstore(vec, &buffer, bitstr);
    return buffer -% vec2;
}
```

```asm
compress:
        addi    sp, sp, -128
        sd      ra, 120(sp)
        sd      s0, 112(sp)
        addi    s0, sp, 128
        andi    sp, sp, -64
        li      a4, 64
        vsetvli zero, a4, e8, m2, ta, ma
        vle8.v  v8, (a1)
        vle8.v  v10, (a3)
        vsetivli zero, 1, e64, m1, ta, ma
        vmv.s.x v12, a2
        vsetvli zero, a4, e8, m2, ta, ma
        vcompress.vm    v14, v8, v12
        vcpop.m a1, v12
        mv      a2, sp
        vsetvli zero, a1, e8, m2, ta, ma
        vse8.v  v14, (a2)
        vsetvli zero, a4, e8, m2, ta, ma
        vle8.v  v8, (a2)
        vsub.vv v8, v8, v10
        vse8.v  v8, (a0)
        addi    sp, s0, -128
        ld      ra, 120(sp)
        ld      s0, 112(sp)
        addi    sp, sp, 128
        ret
```

Is this section of the assembly necessary? It stores the compressed vector to the stack buffer and then immediately loads it back:

```asm
        vse8.v  v14, (a2)
        vsetvli zero, a4, e8, m2, ta, ma
        vle8.v  v8, (a2)
```

I haven't read much RISC-V Vector assembly yet, but my hunch is that this round trip through memory could be avoided.
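
For illustration, here is a hand-written sketch of the kind of sequence I would hope for. It is not compiler output and has not been tested. It assumes that, because `buffer` is `undefined`, the bytes past the compressed elements may hold any value, so the result of `vcompress.vm` in `v14` can feed `vsub.vv` directly. It also assumes the same argument registers and `m2` register grouping as the output above, and that the stack frame is no longer needed once the buffer is gone.

```asm
compress:
        li      a4, 64
        vsetvli zero, a4, e8, m2, ta, ma
        vle8.v  v8, (a1)                 # load vec
        vle8.v  v10, (a3)                # load vec2
        vsetivli zero, 1, e64, m1, ta, ma
        vmv.s.x v12, a2                  # move bitstr into a mask register
        vsetvli zero, a4, e8, m2, ta, ma
        vcompress.vm    v14, v8, v12     # pack selected bytes to the front of v14
        vsub.vv v8, v14, v10             # use v14 directly, no store/reload
        vse8.v  v8, (a0)                 # store the result
        ret
```

Compared with the current output, this drops the `vcpop.m`, the `vse8.v`/`vle8.v` pair through the stack, two `vsetvli` instructions, and the stack realignment prologue and epilogue.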
</pre>