<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/90985">90985</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[X86] Worse runtime performance on Zen CPU when optimizing for Zen
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Systemcluster
</td>
</tr>
</table>
<pre>
The following code, compiled with `-O3 -march=znver4` (or any other `znver` target), runs slower on Zen hardware than the same code compiled with `-O3 -march=x86-64-v4` or for the baseline `x86-64`: about 16% slower in the timings below.
```c
bool check_prime(int64_t n) {
    if (n < 2) {
        return true;
    }
    int64_t lim = (int64_t)ceil((double)n / 2.0);
    for (int64_t i = 2; i < lim; i++) {
        if (n % i == 0) {
            return false;
        }
    }
    return true;
}
```
<details>
<summary>Full code</summary>
```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <math.h>
#include <time.h>

bool check_prime(int64_t n) {
    if (n < 2) {
        return true;
    }
    int64_t lim = (int64_t)ceil((double)n / 2.0);
    for (int64_t i = 2; i < lim; i++) {
        if (n % i == 0) {
            return false;
        }
    }
    return true;
}

int main(void) {
    clock_t now = clock();
    int sum = 0;
    /* Count every value below 1000000 that check_prime accepts. */
    for (int i = 0; i < 1000000; i++) {
        if (check_prime(i)) {
            sum += 1;
        }
    }
    /* Elapsed time in seconds, followed by the count. */
    printf("%f seconds, %d\n", (double)(clock() - now) / CLOCKS_PER_SEC, sum);
    return 0;
}
```
</details>
Running on a Ryzen 7950X:
```cmd
> clang.exe -std=c11 -O3 -march=znver4 ./src/perf.c && ./a.exe
24.225000 seconds, 78501
> clang.exe -std=c11 -O3 -march=x86-64-v4 ./src/perf.c && ./a.exe
20.866000 seconds, 78501
> clang.exe -std=c11 -O3 ./src/perf.c && ./a.exe
20.819000 seconds, 78501
```
```cmd
> clang.exe --version
clang version 18.1.4
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\Program Files\LLVM\bin
```
Disassembly here: https://godbolt.org/z/orssnKP74
I originally noticed the issue with Rust: https://godbolt.org/z/Kh1v3G74K
</pre>