<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/55545">55545</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AArch64] miscompilation when optimizing shuffles involving concatenations with zero literals
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Benjins
      </td>
    </tr>
</table>

<pre>
    The following C++ code is miscompiled by the AArch64 backend at -O1:

[Godbolt link](https://godbolt.org/z/7jfvo87Tz)
```
#include <arm_neon.h>

int8x8x2_t do_stuff(int8x8_t Input) {
        int8x8_t Rev = vrev32_s8(Input);
        int8x8_t Zero = {};
        int8x8x2_t Output = vtrn_s8(Zero, Rev);
        return Output;
}
```

Here is the assembly it generates:

```
.LCPI0_0:
        .byte   8                               // 0x8
        .byte   0                               // 0x0
        .byte   10                              // 0xa
        .byte   2                               // 0x2
        .byte   12                              // 0xc
        .byte   4                               // 0x4
        .byte   14                              // 0xe
        .byte   6                               // 0x6
.LCPI0_1:
        .byte   9                               // 0x9
        .byte   1                               // 0x1
        .byte   11                              // 0xb
        .byte   3                               // 0x3
        .byte   13                              // 0xd
        .byte   5                               // 0x5
        .byte   15                              // 0xf
        .byte   7                               // 0x7
do_stuff(__Int8x8_t):                // @do_stuff(__Int8x8_t)
        rev32   v1.8b, v0.8b
        adrp    x8, .LCPI0_0
        adrp    x9, .LCPI0_1
        ldr     d0, [x8, :lo12:.LCPI0_0]
        mov     v1.d[1], v1.d[0]
        ldr     d2, [x9, :lo12:.LCPI0_1]
        tbl     v0.8b, { v1.16b }, v0.8b
        tbl     v1.8b, { v1.16b }, v2.8b
        ret
```

If we call this function with the input:

```
{10, 11, 12, 13, 14, 15, 16, 17}
```

we'd expect to see the following output:

```
{0, 13, 0, 11, 0, 17, 0, 15, 0, 12, 0, 10, 0, 16, 0, 14}
```

and we do at -O0. However, at -O1 we get this output:

```
{13, 13, 11, 11, 17, 17, 15, 15, 12, 12, 10, 10, 16, 16, 14, 14}
```

The problematic lines in the assembly are these:

```
mov     v1.d[1], v1.d[0]
...
tbl     v0.8b, { v1.16b }, v0.8b
tbl     v1.8b, { v1.16b }, v2.8b
```

The upper 64 bits of v1 should be zero, but instead they are a duplicate of the lower 64 bits. The output then contains those duplicated bytes where it should contain zero bytes.

This seems to involve an optimization introduced in 21a97a2ac11b81240c08753277dc1a234c6a7816. It detects that the latter part of the shuffle is all zeros, and relies on the semantics of tbl for out-of-bounds indices (which set the output byte to 0) to produce those zeros implicitly. However, in this case, as far as I can tell, the indices are all in bounds: they are all less than 16, and the tbl here takes a 16-byte table (v1.16b), so the valid index range is [0, 16), not [0, 8). So instead of an out-of-bounds index producing a 0 in the output, each such index reads a byte from the upper half of the table register and gives the wrong result. I'm not sure whether the error is in this transform or in something downstream, but reverting that commit seemed to avoid the issue.
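
To make the index argument concrete, here is a small portable model of the generated code. It is a sketch, not NEON code: the names `tbl1`, `rev32`, and `emulate` are hypothetical helpers written for this illustration, bytes are treated as unsigned for simplicity, and the index vectors are copied from `.LCPI0_0` and `.LCPI0_1` above. The `zero_upper_half` flag selects between the table the source semantics call for (upper half zero) and the table the `mov v1.d[1], v1.d[0]` actually builds (upper half duplicated).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec8 = std::array<std::uint8_t, 8>;
using Vec16 = std::array<std::uint8_t, 16>;

// Model of `tbl Vd.8b, { Vn.16b }, Vm.8b`: an index in [0, 16) selects a
// table byte; any larger index yields 0.
Vec8 tbl1(const Vec16& table, const Vec8& idx) {
    Vec8 out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = idx[i] < table.size() ? table[idx[i]] : std::uint8_t{0};
    return out;
}

// Model of `rev32 Vd.8b, Vn.8b`: reverse the bytes within each 32-bit word.
Vec8 rev32(const Vec8& in) {
    return {in[3], in[2], in[1], in[0], in[7], in[6], in[5], in[4]};
}

// Index vectors copied from .LCPI0_0 and .LCPI0_1 in the generated assembly.
const Vec8 kIdx0 = {8, 0, 10, 2, 12, 4, 14, 6};
const Vec8 kIdx1 = {9, 1, 11, 3, 13, 5, 15, 7};

// Hypothetical model of do_stuff as compiled. With zero_upper_half == false
// the table is Rev:Rev, as built by `mov v1.d[1], v1.d[0]`. With
// zero_upper_half == true the table is Rev:0, which is what the shuffle needs.
Vec16 emulate(const Vec8& input, bool zero_upper_half) {
    const Vec8 rev = rev32(input);
    Vec16 table{};  // all bytes start as zero
    for (std::size_t i = 0; i < rev.size(); ++i) {
        table[i] = rev[i];
        if (!zero_upper_half) table[i + 8] = rev[i];
    }
    const Vec8 lo = tbl1(table, kIdx0);  // first tbl  -> val[0]
    const Vec8 hi = tbl1(table, kIdx1);  // second tbl -> val[1]
    Vec16 out{};
    for (std::size_t i = 0; i < lo.size(); ++i) {
        out[i] = lo[i];
        out[i + 8] = hi[i];
    }
    return out;
}
```

Under this model, calling `emulate` on `{10, 11, 12, 13, 14, 15, 16, 17}` with `zero_upper_half == false` gives the -O1 output shown above, and with `zero_upper_half == true` it gives the expected -O0 output. That is consistent with the index constants themselves being usable as long as the upper half of the table is zero, though it does not by itself show which part of the lowering is at fault.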

I have verified that this still repros on the latest trunk (8527f32f0a16a77edb0b53a5ab8074e518eeff54).

For context: this code was generated by a fuzzer; it was not written by hand.

</pre>