<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/56315>56315</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Missed opportunity to fold scalar offset into base of indexed load
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:RISC-V,
            performance
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          preames
      </td>
    </tr>
</table>

<pre>
    This is a performance opportunity noted when analyzing sqlite3.  I'm filing this with reproduction instructions for the original observation and some analysis below; I plan to follow up with reduced test cases when I have some time.

`$clang -isystem $GNU_TOOLCHAIN_DIR/sysroot/usr/include/ --target=riscv64 -mllvm -riscv-v-vector-bits-min=256 -mllvm -scalable-vectorization=on -Xclang -target-feature -Xclang +v,+f,+m,+c,+d,+zba sqlite-autoconf-3380500/sqlite3.c -c -O2 -g`

The amalgamation file is taken from the sqlite3 website.  

Non-optimal assembly observed:
```
        vsetvli zero, zero, e64, m2, ta, mu
        vadd.vx v12, v8, a1
        vadd.vx v14, v10, a1
        vadd.vx v12, v12, a3
        vadd.vx v14, v14, a3
        vsetvli zero, zero, e16, mf2, ta, mu
        vluxei64.v      v16, (zero), v12
        vluxei64.v      v12, (zero), v14
```
Note that in the assembly above, vluxei64.v takes a scalar base register operand, which is left as zero here.  We can fold the preceding scalar offsets into that base register despite the changed vtype, as the indexing is always done at 64 bits.  Note that we do fold indexing in some cases observed in this file, so we probably have an overly strict legality check somewhere.
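
As a sanity check on the legality argument, here is a minimal Python sketch (not LLVM code) of the address arithmetic.  It assumes XLEN=64 and models each index element as a 64-bit byte offset added to the scalar base.  The names `indexed_addresses`, `unfolded` and `folded` are made up for illustration.

```python
MOD64 = 2**64  # assumption: XLEN=64, so address arithmetic wraps modulo 2**64


def indexed_addresses(base, index_vec):
    """Per-element byte address of an indexed load: base + index[i]."""
    return [(base + idx) % MOD64 for idx in index_vec]


def unfolded(v, a1, a3):
    """Two vector-scalar adds at SEW=64, then a load with a zero base."""
    tmp = [(x + a1) % MOD64 for x in v]
    tmp = [(x + a3) % MOD64 for x in tmp]
    return indexed_addresses(0, tmp)


def folded(v, a1, a3):
    """One scalar add, then a load that uses the sum as its base."""
    return indexed_addresses((a1 + a3) % MOD64, v)
```

Both forms produce the same element addresses, including when an index wraps around, e.g. `unfolded([0, 8, 2**64 - 8], 0x1000, 0x20) == folded([0, 8, 2**64 - 8], 0x1000, 0x20)`.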

This assembly should look something like:
```
        add a3, a3, a1
        vsetvli zero, zero, e16, mf2, ta, mu
        vluxei64.v      v16, (a3), v8
        vluxei64.v      v12, (a3), v10
```
Note as well that there's possibly another issue in the same assembly.  We're performing four vector-scalar adds, whereas a single scalar add and two vector-scalar adds would have sufficed.
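
The reassociation can be sketched in Python as follows.  This is only an illustration under the assumption of wrapping 64-bit element arithmetic; the function names are hypothetical, and the variable names mirror the registers in the assembly above.

```python
MOD64 = 2**64  # assumption: SEW=64 element arithmetic wraps modulo 2**64


def four_vector_adds(v8, v10, a1, a3):
    """Mirrors the observed code: each vector gets two vector-scalar adds."""
    v12 = [(x + a1) % MOD64 for x in v8]
    v14 = [(x + a1) % MOD64 for x in v10]
    v12 = [(x + a3) % MOD64 for x in v12]
    v14 = [(x + a3) % MOD64 for x in v14]
    return v12, v14


def one_scalar_two_vector_adds(v8, v10, a1, a3):
    """Combine the scalars first, then do one vector-scalar add per vector."""
    s = (a1 + a3) % MOD64
    v12 = [(x + s) % MOD64 for x in v8]
    v14 = [(x + s) % MOD64 for x in v10]
    return v12, v14
```

The two functions return the same pair of vectors for any inputs, e.g. `four_vector_adds([1, 2], [3, 4], 10, 20) == one_scalar_two_vector_adds([1, 2], [3, 4], 10, 20)`.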

</pre>