<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/78656>78656</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            RISC-V x-register to vector-register via temp memory should be optimised to `vmv.vx` or `vmv.sx`
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          sh1boot
      </td>
    </tr>
</table>

<pre>
    I have a generic tail-padding function:
```C++
#include <riscv_vector.h>
#include <cstddef>
#include <cstdint>

inline vuint8m1_t tail_load(void const* data, size_t length) {
    uint8_t const* ptr = reinterpret_cast<uint8_t const*>(data);
    vuint8m1_t v = __riscv_vle8_v_u8m1(ptr, length);
    const vuint8m1_t zero = __riscv_vmv_v_x_u8m1(0, __riscv_vsetvlmax_e8m1());
    return __riscv_vslideup(v, zero, length, __riscv_vsetvlmax_e8m1());
}
```
It can be called on scalar data:
```C++
vuint8m1_t test(uint64_t data) {
    return tail_load(&data, sizeof(data));
}
```
which gives me:
```asm
test(unsigned long):                               # @test(unsigned long)
        addi    sp, sp, -16
        sd      a0, 8(sp)
        addi    a0, sp, 8
        vsetivli        zero, 8, e8, m1, ta, ma
        vle8.v  v8, (a0)
        vsetvli a0, zero, e8, m1, ta, ma
        vmv.v.i v9, 0
        vslideup.vi     v8, v9, 8
        addi    sp, sp, 16
        ret
```

The generic byte load from memory is unnecessary when the data is already in a scalar register and its length is known. Ideally, the emitted `vle8` would be replaced with `vmv.vx`, and, better still, the `vmv.vx`/`vslideup` pair would be fused into a single `vslide1up`.
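To make the claimed equivalence concrete, here is a small portable scalar model (plain C++, not RVV code) of the two sequences: the generic `vle8`/`vslideup` path and a `vslide1up` at SEW=64 applied to a zero vector. It is a sketch under stated assumptions: VLEN is taken to be 128 bits (a 16-byte LMUL=1 register), and `VReg`, `tail_load_model`, and `slide1up_e64_model` are hypothetical names for illustration, not intrinsics. Byte layout follows the host, which matches RISC-V only on a little-endian host, but the equality between the two paths holds either way.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumption: VLEN = 128 bits, so one LMUL=1 register holds 16 bytes.
constexpr std::size_t kVlenBytes = 16;
using VReg = std::array<std::uint8_t, kVlenBytes>;  // hypothetical byte view of a vector register

// Model of the generic path: vle8.v with vl = length, then vslideup of a
// zero vector by `length` with vl = VLMAX.
VReg tail_load_model(const void* data, std::size_t length) {
    assert(length <= kVlenBytes);
    VReg v{};
    std::memcpy(v.data(), data, length);  // vle8.v: bytes [0, length) come from memory
    const VReg zero{};                    // vmv.v.i zero, 0
    // vslideup: lanes below the offset are kept, v[i] = zero[i - offset] otherwise.
    // Lanes at or above `length` are overwritten here, so their initial value does not matter.
    for (std::size_t i = length; i < kVlenBytes; ++i) {
        v[i] = zero[i - length];
    }
    return v;
}

// Model of the fused path: vslide1up.vx at SEW=64 on a zero vector,
// followed by a reinterpret to u8m1.
VReg slide1up_e64_model(std::uint64_t scalar) {
    const std::array<std::uint64_t, kVlenBytes / 8> zero{};  // vmv.v.x zero, 0 at e64
    std::array<std::uint64_t, kVlenBytes / 8> out{};
    out[0] = scalar;  // vslide1up: element 0 takes the scalar
    for (std::size_t i = 1; i < out.size(); ++i) {
        out[i] = zero[i - 1];  // element i takes source element i - 1
    }
    VReg bytes{};
    std::memcpy(bytes.data(), out.data(), kVlenBytes);  // reinterpret as bytes
    return bytes;
}
```

For an 8-byte scalar, `tail_load_model(&data, sizeof(data))` and `slide1up_e64_model(data)` produce the same register contents, which is why the stack round-trip in the assembly above should be avoidable.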

Ideal behaviour can be coerced with the following:
```C++
template <size_t length>
inline vuint8m1_t tail_load(void const* data);

template<>
inline vuint8m1_t tail_load<sizeof(uint64_t)>(void const* data) {
    uint64_t const* ptr64 = reinterpret_cast<uint64_t const*>(data);
    const vuint64m1_t zero = __riscv_vmv_v_x_u64m1(0, __riscv_vsetvlmax_e64m1());
    vuint64m1_t v64 = __riscv_vslide1up(zero, *ptr64, __riscv_vsetvlmax_e64m1());
    return __riscv_vreinterpret_u8m1(v64);
}

vuint8m1_t test2(uint64_t data) {
    return tail_load<sizeof(data)>(&data);
}
```
https://godbolt.org/z/Edcq1v4ej

</pre>