<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/82198">82198</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Incorrect codegen for Zen4, possibly related to zmm register allocation
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Eisenwave
</td>
</tr>
</table>
<pre>
https://godbolt.org/z/8q7v4488z
This isn't exactly a minimal example but:
```cpp
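using u64 = unsigned long long;
using u32 = unsigned;
using big = _BitInt(4096);

// Long division over 32-bit digits, most significant digit first.
u32 rem_fast(big x, u32 y) {
    constexpr int size = sizeof(x) / 4;
    u32 digits[size];
    // Little-endian layout assumed: digits[0] is the least significant digit.
    __builtin_memcpy(digits, &x, sizeof(digits));
    u32 rem = 0;
    for (int i = 0; i < size; ++i) {
        // Shift the running remainder up by one digit and bring in the next one.
        u64 temp = u64(rem) << 32 | digits[size - i - 1];
        rem = temp % y;
    }
    return rem;
}

// Reference: let the compiler compute the wide remainder directly.
u32 rem_slow(big x, u32 y) {
    return x % y;
}

int main() {
    const u32 divisor = 77777;
    // Build an arbitrary wide value with many non-zero digits.
    big random = -1;
    random *= 12345;
    random += 1234567;
    random *= 23894238392;
    random += 333333333333333;
    if (rem_slow(random, divisor) != rem_fast(random, divisor)) {
        __builtin_trap();
    }
}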
```
`rem_fast` and `rem_slow` are equivalent, so this program should never reach `__builtin_trap()`.
It runs fine with `-march=znver3`, but with `-march=znver4` it crashes because the trap is hit.
Something is clearly wrong here, and I suspect it's related to `zmm` register allocation, since that is the striking difference between the Zen 3 and Zen 4 codegen.
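
To illustrate why the two functions should agree, here is a minimal sketch of the same digit loop on a plain 64-bit value, which does not need `_BitInt`. The names `rem_by_digits` and `rem_direct` are hypothetical helpers for this illustration and are not part of the reproducer above.
```cpp
using u64 = unsigned long long;
using u32 = unsigned;

// Hypothetical helper: the rem_fast loop, applied to a 64-bit value that is
// split into two 32-bit digits (digits[0] is the least significant one).
// Requires y != 0.
u32 rem_by_digits(u64 x, u32 y) {
    const u32 digits[2] = {u32(x), u32(x >> 32)};
    u32 rem = 0;
    for (int i = 0; i < 2; ++i) {
        // rem is always smaller than y, so the shifted value fits in 64 bits.
        const u64 temp = (u64(rem) << 32) | digits[2 - i - 1];
        rem = u32(temp % y);
    }
    return rem;
}

// Hypothetical helper: the rem_slow counterpart, a single wide remainder.
u32 rem_direct(u64 x, u32 y) {
    return u32(x % y);
}
```
Each iteration computes `(rem * 2^32 + digit) % y`, which is ordinary long division in base 2^32, so the result after the last digit equals `x % y` for any non-zero `y`. The 4096-bit version in the reproducer differs only in the number of digits.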
</pre>