<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/63460">63460</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [wasm] should vector locals be promoted (?) to live on the stack if indexed
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          kg
      </td>
    </tr>
</table>

<pre>
    Arbitrary indexing of vector types appears to generate suboptimal code in trunk clang, even with -O3. Take this toy example (pulled from our codebase and stripped down):

```c
#include <stdint.h>
#include <stdbool.h>
#include <wasm_simd128.h>

typedef void * gpointer;
typedef int8_t v128_i1 __attribute__ ((vector_size (16)));

void
interp_packedsimd_shuffle (gpointer res, gpointer _lower, gpointer _upper, gpointer _indices) {
        v128_i1 indices = *((v128_i1 *)_indices),
                lower = *((v128_i1 *)_lower),
                upper = *((v128_i1 *)_upper),
                result = { 0 };

        for (int i = 0; i < 16; i++) {
                int index = indices[i] & 31;
                if (index > 15)
                        result[i] = upper[index - 16];
                else
                        result[i] = lower[index];
        }

        *((v128_i1 *)res) = result;
}
```

Each of the vector indexing operations appears to generate a temporary memory store of the whole v128 local, then a memory load/store of the specific lane being accessed, and potentially a v128 memory load to update the local. According to godbolt (https://gcc.godbolt.org/z/PM9oao635), the code for `int index = indices[i] & 31` looks like this, for example:
```
        local.get       4
        local.get       7
        v128.store      48
        block
        block
        local.get       4
        i32.const       48
        i32.add
        local.get       2
        i32.const       15
        i32.and
        local.tee       1
        i32.or
        i32.load8_u     0
        i32.const       31
        i32.and
```
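
To make the pattern easier to see, here is a rough C model of what a single dynamic lane read seems to turn into, based only on the listing above. The helper name `extract_lane_dynamic` and the `spill` buffer are made up for illustration; this is a sketch of the observed behavior, not what the backend literally does internally.

```c
#include <stdint.h>
#include <string.h>

typedef int8_t v128_i1 __attribute__ ((vector_size (16)));

/* Hypothetical helper: models one indexed read of a vector that lives in a local. */
static int8_t
extract_lane_dynamic (v128_i1 v, int lane) {
        int8_t spill[16];                   /* temporary stack slot (offset 48 in the listing) */
        memcpy (spill, &v, sizeof (spill)); /* like v128.store: the whole vector goes to memory */
        return spill[lane & 15];            /* like i32.load8_u: a single byte comes back */
}
```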

It might be valuable to detect that a vector local has indexed operations like this performed on it multiple times (or even once) and, if so, promote (demote?) it to living on the stack (in memory), so that indexed reads/writes are still efficient. That might de-opt the operations that can be performed natively on vectors held in locals, but not significantly: afaik the difference between `local.get` and `v128.load` on x64 is just the type of memory load used (2 offsets vs 3 offsets); they're both still memory loads.
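
For illustration, this is roughly what doing that transformation by hand would look like for the toy example: the inputs are copied to the stack once and every indexed access is a plain byte load/store. The function name `shuffle_via_memory` and the combined `table` buffer are my own, and I haven't measured what wasm clang generates for this version, so treat it as a sketch of the idea rather than a verified workaround.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical hand-written variant of interp_packedsimd_shuffle:
 * the vectors are kept in memory for the whole loop instead of in v128 locals. */
void
shuffle_via_memory (void *res, const void *_lower, const void *_upper, const void *_indices) {
        /* lower occupies table[0..15], upper occupies table[16..31],
         * so (index & 31) selects from either input without a branch */
        int8_t table[32], indices[16], result[16];

        memcpy (table, _lower, 16);
        memcpy (table + 16, _upper, 16);
        memcpy (indices, _indices, 16);

        for (int i = 0; i < 16; i++)
                result[i] = table[indices[i] & 31];

        /* one 16-byte copy at the end, which can become a single v128 store */
        memcpy (res, result, 16);
}
```

The results should match the original loop, since `upper[index - 16]` and `table[index]` name the same byte when `index > 15`.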

Sorry if I filed this in the wrong place! It seems to me like this would be handled at the llvm level.
</pre>