<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/91370>91370</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [X86] Worse runtime performance on Zen 4 CPU when optimizing for `znver4` or `skylake`
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Systemcluster
      </td>
    </tr>
</table>

<pre>
    The following code runs around 300% slower (roughly 4x the runtime) on Zen 4 when optimized for `znver4` or `skylake` than when optimized for `znver3` or other targets.

```rust
pub fn sum(a: &[i64]) -> i64 {
    let mut sum = 0;
    a.chunks_exact(8).for_each(|x| {
        for i in x {
            sum += i;
        }
    });
    sum
}
```

<details>
<summary>Full code</summary>

```rust
pub fn sum(a: &[i64]) -> i64 {
    let mut sum = 0;
    a.chunks_exact(8).for_each(|x| {
        for i in x {
            sum += i;
        }
    });
    sum
}

fn main() {
    let nums = std::hint::black_box(generate());
    let now = std::time::Instant::now();
    let sum = sum(&nums);
    println!("{:?} / {}", now.elapsed(), sum);
}

fn generate() -> Vec<i64> {
    let mut v = Vec::new();
    for i in 0..1000000000 {
        v.push(i);
    }
    v
}
```

</details>

Running on a Ryzen 7950X:

```cmd
> rustc.exe -Ctarget-cpu=x86-64-v4 -Copt-level=3 .\src\main.rs && ./main.exe
138.7342ms / 499999999500000000

> rustc.exe -Ctarget-cpu=x86-64-v3 -Copt-level=3 .\src\main.rs && ./main.exe
136.2689ms / 499999999500000000

> rustc.exe -Ctarget-cpu=x86-64 -Copt-level=3 .\src\main.rs && ./main.exe 
136.0648ms / 499999999500000000

> rustc.exe -Ctarget-cpu=znver4 -Copt-level=3 .\src\main.rs && ./main.exe   
543.1562ms / 499999999500000000

> rustc.exe -Ctarget-cpu=znver3 -Copt-level=3 .\src\main.rs && ./main.exe   
137.4426ms / 499999999500000000

> rustc.exe -Ctarget-cpu=skylake -Copt-level=3 .\src\main.rs && ./main.exe
588.4743ms / 499999999500000000

> rustc.exe -Ctarget-cpu=haswell -Copt-level=3 .\src\main.rs && ./main.exe
138.5313ms / 499999999500000000
```

Disassembly here: https://godbolt.org/z/fzaGhGdWW

The tested optimization targets all generate different assembly with different levels of unrolling, but the `znver4` and `skylake` targets seem to be outliers.

I don't know whether the `skylake` target has the same underlying issue or whether its slowdown is just caused by the mismatch between optimization target and CPU, but both result in the long list of constant values and show similar runtime performance. I also didn't test any targets other than those listed above.
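
The timings above are single runs. To rule out measurement noise (for example first-touch page faults), the same kernel could be timed several times and the fastest run reported. The following is only a minimal sketch of that idea: `best_of` and its `runs` parameter are hypothetical names introduced here for illustration and were not used to produce the numbers above.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Same kernel as in the report: sums every complete chunk of 8 elements.
/// Elements in a trailing partial chunk are ignored by `chunks_exact`.
pub fn sum(a: &[i64]) -> i64 {
    let mut sum = 0;
    a.chunks_exact(8).for_each(|x| {
        for i in x {
            sum += i;
        }
    });
    sum
}

/// Hypothetical helper (an assumption, not part of the original report):
/// runs `sum` `runs` times and returns the fastest wall-clock time
/// together with the result of the last run.
pub fn best_of(a: &[i64], runs: u32) -> (Duration, i64) {
    let mut best = Duration::MAX;
    let mut result = 0;
    for _ in 0..runs {
        // `black_box` keeps the optimizer from hoisting or eliding the call.
        let input = black_box(a);
        let start = Instant::now();
        result = black_box(sum(input));
        best = best.min(start.elapsed());
    }
    (best, result)
}
```

Calling `best_of(&nums, 10)` from `main` in place of the single `sum(&nums)` call would report the fastest of ten runs. If the gap between `znver4` and `znver3` persists under that measurement, it points at the generated code rather than at timing noise.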

Split from https://github.com/llvm/llvm-project/issues/90985#issuecomment-2096057259
</pre>