<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/58584>58584</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Suboptimal codegen for vzip1q of values that were loaded as float32x2 in AArch64
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          tellowkrinkle
      </td>
    </tr>
</table>

<pre>
    Godbolt link: https://gcc.godbolt.org/z/vqTna3cGv

The following C code:
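```c
#include <arm_neon.h>

float32x4_t zip(const float* a, const float* b) {
    // Load two floats into the low half; the high half is explicitly zero.
    float32x4_t va = vcombine_f32(vld1_f32(a), vdup_n_f32(0.0f));
    float32x4_t vb = vcombine_f32(vld1_f32(b), vdup_n_f32(0.0f));
    // vzip1q_f32 only reads the low two lanes of each input.
    return vzip1q_f32(va, vb);
}
```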
compiles to this:
```asm
ldr     d0, [x0]
ldr     d1, [x1]
zip2    v2.2s, v0.2s, v1.2s
zip1    v0.2s, v0.2s, v1.2s
mov     v0.d[1], v2.d[0]
ret
```
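instead of the following, which is equivalent because each `ldr d` load already zeroes the upper 64 bits of its vector register: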
```asm
ldr     d0, [x0]
ldr     d1, [x1]
zip1    v0.4s, v0.4s, v1.4s
ret
```
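
To check that the two sequences produce the same lanes, here is a minimal scalar sketch in portable C. It is only a model of the instruction semantics, not compiler code: the `vec4` type and the `model_*` and `check_equivalence` names are hypothetical and introduced here for illustration.
```c
#include <assert.h>

/* Hypothetical scalar stand-in for a 128-bit register holding four floats. */
typedef struct { float lane[4]; } vec4;

/* Models `ldr dN, [x]`: fills lanes 0-1 from memory and zeroes lanes 2-3. */
vec4 model_ldr_d(const float *p) {
    vec4 v = {{p[0], p[1], 0.0f, 0.0f}};
    return v;
}

/* Models the current output: zip1/zip2 on .2s, then mov v0.d[1], v2.d[0]. */
vec4 model_current(const float *a, const float *b) {
    vec4 va = model_ldr_d(a), vb = model_ldr_d(b);
    float lo[2] = {va.lane[0], vb.lane[0]};  /* zip1 v0.2s, v0.2s, v1.2s */
    float hi[2] = {va.lane[1], vb.lane[1]};  /* zip2 v2.2s, v0.2s, v1.2s */
    vec4 r = {{lo[0], lo[1], hi[0], hi[1]}}; /* mov  v0.d[1], v2.d[0]    */
    return r;
}

/* Models the expected output: one zip1 on .4s, which interleaves
   lanes 0-1 of each input and never reads lanes 2-3. */
vec4 model_expected(const float *a, const float *b) {
    vec4 va = model_ldr_d(a), vb = model_ldr_d(b);
    vec4 r = {{va.lane[0], vb.lane[0], va.lane[1], vb.lane[1]}};
    return r;
}

/* Asserts that both models agree lane by lane for the given inputs. */
void check_equivalence(const float *a, const float *b) {
    vec4 x = model_current(a, b);
    vec4 y = model_expected(a, b);
    for (int i = 0; i < 4; i++) {
        assert(x.lane[i] == y.lane[i]);
    }
}
```
Calling `check_equivalence` on any pair of two-element float arrays (without NaNs, since the comparison uses `==`) should pass, because both models yield `a[0], b[0], a[1], b[1]`.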
</pre>