<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/57319>57319</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
AArch32 vld1 copies to the stack and then loads the vector
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
fbarchard
</td>
</tr>
</table>
<pre>
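On AArch32, clang compiles a single vld1 into 5 instructions: the data is loaded with two scalar ldr instructions, stored to the stack, and then reloaded from the stack with vld1. This code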
```
// weights: const uint16_t* function parameter
const uint16x4_t vw = vld1_u16(weights);  // load 4 x uint16
weights += 4;                             // advance past them
```
generates this for the vld1 intrinsic
```
mov lr, sp
ldr r5, [r1]
ldr r6, [r1, #4]
add r1, r1, #8
stm sp, {r5, r6}
vld1.16 {d16}, [lr :64]
```
with clang (trunk). GCC, by contrast, generates a single post-incremented load:
vld1.32 {d18}, [ip]!
and on AArch64 clang does generate an ld1.
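For reference, the intrinsic statement in the report only asks for four consecutive uint16_t values to be loaded into one 64-bit D register, followed by a pointer advance, which is the job of one post-indexed load like GCC's. The sketch below is a portable C model of that behavior, not Arm's implementation; the names u16x4_model and vld1_u16_postinc are hypothetical and are not part of arm_neon.h.

```c
#include "stdint.h"

/* Hypothetical stand-in for NEON's uint16x4_t: four 16-bit lanes,
 * i.e. the 64 bits of one D register. */
typedef struct {
  uint16_t lane[4];
} u16x4_model;

/* Models `vld1_u16(weights); weights += 4;` as one post-indexed load:
 * read four consecutive uint16_t values (8 bytes) into lanes 0..3,
 * then write the advanced pointer back, like `[rN]!` does. */
static u16x4_model vld1_u16_postinc(const uint16_t **p) {
  const uint16_t *src = *p;
  u16x4_model v = {{src[0], src[1], src[2], src[3]}};
  *p = src + 4; /* writeback: +4 elements = +8 bytes */
  return v;
}
```

Nothing in this model needs a stack round trip: it is one 8-byte read plus a pointer update.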
Writing the loop by hand in assembly:
```
1:
VLD1.16 {d3}, [r3]! // weights. <-----
VLD1.32 {d2}, [r1]! // input
SUBS r4, r4, #2
VMOVL.U16 q2, d3
VMLAL.U32 q0, d4, d2[0]
VMLAL.U32 q0, d5, d2[1]
BHI 1b
```
Performance on Cortex-A53:
- intrinsics: 28.4 ns
- assembly:   20.7 ns
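As a rough ratio, assuming both timings measure the same loop over the same data:

```latex
\frac{28.4\ \text{ns}}{20.7\ \text{ns}} \approx 1.37
\qquad\text{and}\qquad
1 - \frac{20.7}{28.4} \approx 0.27
```

so the hand-written loop runs about 1.37 times as fast, taking about 27% less time.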
Here is a snippet on godbolt to reproduce:
https://godbolt.org/z/jvT4Trsbq
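Reading the hand-written loop instruction by instruction, each iteration consumes four uint16_t weights and two uint32_t inputs and accumulates into two 64-bit lanes, acc[j] += w[2i + j] * in[i] for j = 0, 1. The sketch below expresses that loop with NEON intrinsics that map one-to-one onto those instructions, plus a scalar fallback so it also builds off Arm. It is an illustration, not the intrinsics code that was timed above, and it is not known to avoid the stack copy. The function name weighted_sum2, its signature, the zero-initialized accumulator, and "count = number of uint32_t inputs, assumed even" are all assumptions.

```c
#include "stdint.h"
#include "stddef.h"
#if defined(__ARM_NEON)
#include "arm_neon.h"
#endif

/* Hypothetical helper mirroring the hand-written loop.
 * count: number of uint32_t inputs (assumed even);
 * weights must hold 2 * count values. Results go to acc[0], acc[1]. */
void weighted_sum2(const uint16_t *weights, const uint32_t *input,
                   size_t count, uint64_t acc[2]) {
#if defined(__ARM_NEON)
  uint64x2_t sum = vdupq_n_u64(0);                       /* q0 */
  while (count >= 2) {
    uint16x4_t w = vld1_u16(weights);                    /* VLD1.16 {d3}, [r3]! */
    uint32x2_t x = vld1_u32(input);                      /* VLD1.32 {d2}, [r1]! */
    uint32x4_t w32 = vmovl_u16(w);                       /* VMOVL.U16 q2, d3 */
    sum = vmlal_lane_u32(sum, vget_low_u32(w32), x, 0);  /* VMLAL.U32 q0, d4, d2[0] */
    sum = vmlal_lane_u32(sum, vget_high_u32(w32), x, 1); /* VMLAL.U32 q0, d5, d2[1] */
    weights += 4;
    input += 2;
    count -= 2;                                          /* SUBS r4, r4, #2 */
  }
  vst1q_u64(acc, sum);
#else
  /* Portable scalar fallback computing the same two sums. */
  uint64_t a0 = 0, a1 = 0;
  while (count >= 2) {
    a0 += (uint64_t)weights[0] * input[0] + (uint64_t)weights[2] * input[1];
    a1 += (uint64_t)weights[1] * input[0] + (uint64_t)weights[3] * input[1];
    weights += 4;
    input += 2;
    count -= 2;
  }
  acc[0] = a0;
  acc[1] = a1;
#endif
}
```

Unlike the do-while shape of the assembly, the loop tests count before the first iteration, so count = 0 leaves both sums at zero.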
</pre>