<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/58584>58584</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Suboptimal codegen for vzip1q of values that were loaded as float32x2 in AArch64
</td>
</tr>
<tr>
<th>Labels</th>
<td>
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
tellowkrinkle
</td>
</tr>
</table>
<pre>
Godbolt link: https://gcc.godbolt.org/z/vqTna3cGv
The following C code:
```c
/* Requires arm_neon.h (AArch64 NEON intrinsics). */
float32x4_t zip(const float* a, const float* b) {
    float32x4_t va = vcombine_f32(vld1_f32(a), vdup_n_f32(0.0f));
    float32x4_t vb = vcombine_f32(vld1_f32(b), vdup_n_f32(0.0f));
    return vzip1q_f32(va, vb);
}
```
compiles to this:
```asm
ldr d0, [x0]
ldr d1, [x1]
zip2 v2.2s, v0.2s, v1.2s
zip1 v0.2s, v0.2s, v1.2s
mov v0.d[1], v2.d[0]
ret
```
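instead of the following, which is equivalent because `ldr d0` and `ldr d1` already zero the upper 64 bits of each vector register: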
```asm
ldr d0, [x0]
ldr d1, [x1]
zip1 v0.4s, v0.4s, v1.4s
ret
```
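For reference, a minimal scalar sketch of why the two sequences should be equivalent. It is plain C rather than NEON so it can run on any host; `vec4_model` and every function name below are hypothetical stand-ins for illustration, not LLVM or ACLE APIs.

```c
/* Hypothetical scalar model of a 128-bit vector register viewed as 4 x f32. */
typedef struct { float lane[4]; } vec4_model;

/* Models `ldr dN, [xM]`: fills lanes 0-1 and zeroes the upper 64 bits. */
vec4_model load_d_model(const float *p) {
    vec4_model v = {{p[0], p[1], 0.0f, 0.0f}};
    return v;
}

/* Models `zip1 vd.4s, vn.4s, vm.4s`: interleaves the low halves of n and m. */
vec4_model zip1_4s_model(vec4_model n, vec4_model m) {
    vec4_model r = {{n.lane[0], m.lane[0], n.lane[1], m.lane[1]}};
    return r;
}

/* Models the emitted sequence: zip2.2s, zip1.2s, then mov v0.d[1], v2.d[0]. */
vec4_model emitted_sequence_model(vec4_model n, vec4_model m) {
    float hi[2] = {n.lane[1], m.lane[1]};  /* zip2 v2.2s, v0.2s, v1.2s */
    float lo[2] = {n.lane[0], m.lane[0]};  /* zip1 v0.2s, v0.2s, v1.2s */
    vec4_model r = {{lo[0], lo[1], hi[0], hi[1]}};  /* mov v0.d[1], v2.d[0] */
    return r;
}

/* Returns 1 if both models give the same four lanes for inputs a and b. */
int sequences_agree(const float *a, const float *b) {
    vec4_model va = load_d_model(a), vb = load_d_model(b);
    vec4_model x = zip1_4s_model(va, vb);
    vec4_model y = emitted_sequence_model(va, vb);
    for (int i = 0; i != 4; ++i) {
        if (x.lane[i] != y.lane[i]) return 0;
    }
    return 1;
}
```

With `a = {1, 2}` and `b = {3, 4}`, both models produce lanes `{1, 3, 2, 4}`, so the single `zip1 v0.4s` form loses nothing.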
</pre>