<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/55545>55545</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64] miscompilation when optimizing shuffles involving concatenations with zero literals
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Benjins
</td>
</tr>
</table>
<pre>
The following C++ code is miscompiled on the AArch64 backend under -O1:
[Godbolt link](https://godbolt.org/z/7jfvo87Tz)
```
#include <arm_neon.h>
int8x8x2_t do_stuff(int8x8_t Input) {
int8x8_t Rev = vrev32_s8(Input);
int8x8_t Zero = {};
int8x8x2_t Output = vtrn_s8(Zero, Rev);
return Output;
}
```
Here is the assembly it generates:
```
.LCPI0_0:
.byte 8 // 0x8
.byte 0 // 0x0
.byte 10 // 0xa
.byte 2 // 0x2
.byte 12 // 0xc
.byte 4 // 0x4
.byte 14 // 0xe
.byte 6 // 0x6
.LCPI0_1:
.byte 9 // 0x9
.byte 1 // 0x1
.byte 11 // 0xb
.byte 3 // 0x3
.byte 13 // 0xd
.byte 5 // 0x5
.byte 15 // 0xf
.byte 7 // 0x7
do_stuff(__Int8x8_t): // @do_stuff(__Int8x8_t)
rev32 v1.8b, v0.8b
adrp x8, .LCPI0_0
adrp x9, .LCPI0_1
ldr d0, [x8, :lo12:.LCPI0_0]
mov v1.d[1], v1.d[0]
ldr d2, [x9, :lo12:.LCPI0_1]
tbl v0.8b, { v1.16b }, v0.8b
tbl v1.8b, { v1.16b }, v2.8b
ret
```
If we call this function with the input:
```
{10, 11, 12, 13, 14, 15, 16, 17}
```
we'd expect to see the following output:
```
{0, 13, 0, 11, 0, 17, 0, 15, 0, 12, 0, 10, 0, 16, 0, 14}
```
and we do at -O0. However, under -O1 we get this output:
```
{13, 13, 11, 11, 17, 17, 15, 15, 12, 12, 10, 10, 16, 16, 14, 14}
```
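To make the expected result easy to check on a machine without NEON, here is a minimal scalar sketch of the two intrinsics. The names `Vec8`, `rev32_model`, `trn_model` and `do_stuff_model` are hypothetical helpers introduced only for this illustration; they are not part of arm_neon.h, and the lane semantics are modelled from the documented behaviour of vrev32_s8 and vtrn_s8.
```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for int8x8_t: eight signed byte lanes.
using Vec8 = std::array<std::int8_t, 8>;

// Scalar model of vrev32_s8: reverse the bytes inside each 32-bit group.
Vec8 rev32_model(const Vec8& in) {
    Vec8 out{};
    for (int i = 0; i < 8; ++i) {
        // Keep the 4-byte group (i & ~3), mirror the position inside it.
        out[i] = in[(i & ~3) | (3 - (i & 3))];
    }
    return out;
}

// Scalar model of vtrn_s8: val[0] pairs up the even lanes of a and b,
// val[1] pairs up the odd lanes.
std::array<Vec8, 2> trn_model(const Vec8& a, const Vec8& b) {
    std::array<Vec8, 2> out{};
    for (int i = 0; i < 8; i += 2) {
        out[0][i] = a[i];
        out[0][i + 1] = b[i];
        out[1][i] = a[i + 1];
        out[1][i + 1] = b[i + 1];
    }
    return out;
}

// Portable stand-in for do_stuff from the report.
std::array<Vec8, 2> do_stuff_model(const Vec8& input) {
    Vec8 rev = rev32_model(input);
    Vec8 zero{};  // all lanes zero, like `int8x8_t Zero = {};`
    return trn_model(zero, rev);
}
```
Calling `do_stuff_model` with `{10, 11, 12, 13, 14, 15, 16, 17}` gives `{0, 13, 0, 11, 0, 17, 0, 15}` and `{0, 12, 0, 10, 0, 16, 0, 14}`, matching the -O0 output above.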
The problematic lines in the assembly are these:
```
mov v1.d[1], v1.d[0]
...
tbl v0.8b, { v1.16b }, v0.8b
tbl v1.8b, { v1.16b }, v2.8b
```
The upper 64 bits of v1 should be zero, but instead they are a duplicate of the lower 64 bits, so the output contains those bytes where it should contain zeros.
This seems to involve an optimization introduced in 21a97a2ac11b81240c08753277dc1a234c6a7816. It detects that the latter part of the shuffle is all zeros and relies on the tbl semantics for out-of-bounds indices (the output byte is set to 0) to produce those zeros implicitly. However, in this case, as far as I can tell, the indices are all in bounds: they are all less than 16, and the tbl here seems to use indices in [0, 16), not [0, 8). So instead of an out-of-bounds index producing a 0 in the output, it reads a byte from the upper half of the input and gives the wrong result. I'm not sure whether the error is in this transform or in something downstream, but reverting that commit seemed to avoid the issue.
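The effect can be reproduced with a small scalar sketch of a single-register tbl that reads a 16-byte table and writes 8 bytes. The helper names `tbl1_model` and `concat` are hypothetical and exist only for this illustration; the model assumes that any index of 16 or more yields 0, and it uses unsigned bytes for simplicity.
```cpp
#include <array>
#include <cstdint>

using Bytes8 = std::array<std::uint8_t, 8>;
using Bytes16 = std::array<std::uint8_t, 16>;

// Scalar model of `tbl vD.8b, { vN.16b }, vM.8b`:
// each output byte is table[idx], or 0 when idx is outside [0, 16).
Bytes8 tbl1_model(const Bytes16& table, const Bytes8& idx) {
    Bytes8 out{};
    for (int i = 0; i < 8; ++i) {
        out[i] = idx[i] < 16 ? table[idx[i]] : 0;
    }
    return out;
}

// Build a 16-byte table from a low and a high 8-byte half.
Bytes16 concat(const Bytes8& lo, const Bytes8& hi) {
    Bytes16 out{};
    for (int i = 0; i < 8; ++i) {
        out[i] = lo[i];
        out[i + 8] = hi[i];
    }
    return out;
}

// Result of rev32 on the example input {10, ..., 17}.
const Bytes8 kRev = {13, 12, 11, 10, 17, 16, 15, 14};
// Index constants from .LCPI0_0 and .LCPI0_1 in the generated assembly.
const Bytes8 kIdx0 = {8, 0, 10, 2, 12, 4, 14, 6};
const Bytes8 kIdx1 = {9, 1, 11, 3, 13, 5, 15, 7};
```
With the table the -O1 code actually builds, `concat(kRev, kRev)`, the two lookups give `{13, 13, 11, 11, 17, 17, 15, 15}` and `{12, 12, 10, 10, 16, 16, 14, 14}`, which is the wrong output above. With a zeroed upper half, `concat(kRev, Bytes8{})`, the same indices give the expected `{0, 13, 0, 11, 0, 17, 0, 15}` and `{0, 12, 0, 10, 0, 16, 0, 14}`. In both cases every index is below 16, so the out-of-bounds zeroing never applies.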
I have verified that this still repros on the latest trunk (8527f32f0a16a77edb0b53a5ab8074e518eeff54).
For context: this code was generated by a fuzzer; it was not manually written.
</pre>