<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/57319">57319</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            AArch32 vld1 copies to the stack and then loads the vector
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          fbarchard
      </td>
    </tr>
</table>

<pre>
The vld1_u16 intrinsic compiles to 5 instructions on AArch32 (plus the pointer increment) instead of a single vld1.

```
const uint16_t* weights,  // function parameter
const uint16x4_t vw = vld1_u16(weights); weights += 4;
```

This generates the following for the vld1 intrinsic:

```
mov lr, sp

ldr r5, [r1]
ldr r6, [r1, #4]
add r1, r1, #8
stm sp, {r5, r6}
vld1.16 {d16}, [lr :64]
```
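with clang (trunk): the 8 bytes are loaded into core registers, stored to the stack, and then loaded into the vector register from there. GCC, by contrast, generates a single post-incrementing load:
    vld1.32 {d18}, [ip]!
and clang generates an ld1 for AArch64.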

Writing the loop by hand in assembly:
```
1:
        VLD1.16 {d3}, [r3]!             // weights.  <-----
        VLD1.32 {d2}, [r1]!             // input
        SUBS    r4, r4, #2
        VMOVL.U16 q2, d3
        VMLAL.U32 q0, d4, d2[0]
        VMLAL.U32 q0, d5, d2[1]
        BHI     1b
```
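
For reference, here is a rough scalar model in C of what the assembly loop above appears to compute. This is only a sketch: the function name `mac_scalar_model` is made up, `acc[0]` and `acc[1]` stand in for the two 64-bit lanes of q0, and it assumes r4 counts input elements and is a positive even number.

```c
#include "stdint.h"

/* Hypothetical scalar model of the hand-written loop (not from the original
 * code). acc[0] and acc[1] model the two 64-bit lanes of q0. Assumes n_inputs
 * (r4 in the assembly) is a positive even number. */
static void mac_scalar_model(const uint16_t *weights, const uint32_t *input,
                             unsigned n_inputs, uint64_t acc[2])
{
    /* Each pass consumes 4 weights (VLD1.16 {d3}) and 2 inputs (VLD1.32 {d2}). */
    for (unsigned i = 0; i != n_inputs / 2; ++i) {
        /* VMOVL.U16 q2, d3: widen the four u16 weights to u32. */
        uint32_t w[4];
        for (int k = 0; k != 4; ++k)
            w[k] = weights[k];

        /* VMLAL.U32 q0, d4, d2[0]: lanes += {w0, w1} * input[0] */
        acc[0] += (uint64_t)w[0] * input[0];
        acc[1] += (uint64_t)w[1] * input[0];

        /* VMLAL.U32 q0, d5, d2[1]: lanes += {w2, w3} * input[1] */
        acc[0] += (uint64_t)w[2] * input[1];
        acc[1] += (uint64_t)w[3] * input[1];

        weights += 4;   /* post-increment of r3 */
        input += 2;     /* post-increment of r1 */
    }
}
```

The only point of the model is that the weights load is a plain 8-byte load feeding a widening multiply-accumulate, so nothing in the loop should require a round trip through the stack.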
Performance on Cortex-A53:
    intrinsics: 28.4 ns
    assembly:   20.7 ns

Here is a snippet on godbolt to reproduce:
https://godbolt.org/z/jvT4Trsbq


</pre>