<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/64306>64306</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            x64 Codegen worse than MSVC for vectorized loop
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          rainerzufalldererste
      </td>
    </tr>
</table>

<pre>
    I haven't had the chance to investigate much yet, but this is the loop: https://godbolt.org/z/e3dMW5fbe (taken from https://github.com/rainerzufalldererste/simd_dct/blob/cstiller/avx_dct/src/simd_dct.cpp#L2062)

7680x7680 file, 1024 runs (AVX2 variant)

Zen4:
| compiler | avx (std-dev) | avg (std-dev) |
| :- | - | - |
| msvc | 0.32 clk/byte (0.31 ~ 0.32) | 13483.36 MiB/s (13345.56 ~ 13624.05) |
| clang++-16 | 0.37 clk/byte (0.36 ~ 0.38) | 11602.44 MiB/s (11401.21 ~ 11810.91) |
| g++-11.3 | 0.36 clk/byte (0.35 ~ 0.37) | 11828.99 MiB/s (11575.90 ~ 12093.38) |


Skylake-Client:
| compiler | avx (std-dev) | avg (std-dev) |
| :- | - | - |
| msvc | 0.63 clk/byte (0.61 ~ 0.65) | 4828.56 MiB/s (4700.72 ~ 4963.55) |
| clang++-16 | 0.64 clk/byte (0.61 ~ 0.67) | 4743.21 MiB/s (4550.35 ~ 4953.13) |
| g++-11.3 | 0.64 clk/byte (0.61 ~ 0.67) | 4737.80 MiB/s (4534.87 ~ 4959.73) |

It is very surprising to see MSVC doing this well here. I'll try to investigate further to pin down which part of the generated assembly is particularly detrimental, but I wanted to document this somewhere for now. The AVX-512 variant performs significantly better with clang than with either GCC or MSVC, which is why this result surprised me so much.
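
To put the gap in relative terms, here is a small Python sketch that divides each compiler's average MiB/s figure from the tables above by the MSVC figure for the same CPU. It is not part of the benchmark harness, and the helper name `relative_to_msvc` is made up for illustration.

```python
# Average throughput in MiB/s, copied from the tables above.
zen4 = {"msvc": 13483.36, "clang++-16": 11602.44, "g++-11.3": 11828.99}
skylake = {"msvc": 4828.56, "clang++-16": 4743.21, "g++-11.3": 4737.80}


def relative_to_msvc(results):
    """Return each compiler's throughput as a fraction of the MSVC result."""
    baseline = results["msvc"]
    return {name: mibs / baseline for name, mibs in results.items()}


# Print one line per CPU and compiler, e.g. "Zen4  clang++-16  NN.N% of MSVC".
for cpu, results in (("Zen4", zen4), ("Skylake-Client", skylake)):
    for name, ratio in relative_to_msvc(results).items():
        print(f"{cpu:15s} {name:11s} {ratio:6.1%} of MSVC")
```

On Zen4 this puts clang++-16 at roughly 86% of the MSVC throughput, while on Skylake-Client it reaches roughly 98%, and the clang++-16 average there lies inside the range reported for MSVC.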
</pre>