<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/80688">80688</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [X86] Improper optimization of switching operations
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          24bit-xjkp
      </td>
    </tr>
</table>

<pre>
    I wrote a function that swaps two arrays:
```c++
// test.cpp
void f(int (&a)[1000], int(&b)[1000]) noexcept
{
    for (auto i{0zu}; i<1000; i++)
    {
        auto temp{a[i]};
        a[i] = b[i];
        b[i] = temp;
    }
}
```
Compiling it with `clang++ test.cpp -std=c++23 -Ofast -march=znver4` makes Clang emit long runs of back-to-back vector moves:
```asm
......
vmovups %zmm2, (%rdi)
vmovups %zmm3, 64(%rdi)
vmovups %zmm0, (%rsi)
vmovups %zmm1, 64(%rsi)
vmovups 128(%rdi), %zmm0
vmovups 192(%rdi), %zmm1
vmovups 128(%rsi), %zmm2
vmovups 192(%rsi), %zmm3
......
```
You can view https://gcc.godbolt.org/z/hjvcvP1q5 for more details.

In "Software Optimization Guide for the AMD Zen4 Microarchitecture", section 2.6.2.1 "Bank Conflicts", AMD mentions that placing more than three loads (two if floating point) consecutively in the code stream may reduce dispatch bandwidth in some cases. So the compiler shouldn't place so many loads and stores back to back. GCC generates only two floating-point loads per loop iteration.
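
One way to experiment with the load grouping might be to pin the interleave count for this loop with Clang's loop hint pragma. This is only a sketch under assumptions, not a fix: `f_interleave1` is a hypothetical name, the index type is changed to `unsigned long` so the snippet needs no C++23 literal, and the resulting assembly on znver4 has not been checked here.

```cpp
// sketch.cpp (hypothetical file name)
// Same swap loop as f, but asking Clang's loop vectorizer not to
// interleave vectorized iterations. Other compilers skip the pragma.
void f_interleave1(int (&a)[1000], int (&b)[1000]) noexcept
{
#if defined(__clang__)
#pragma clang loop interleave_count(1)
#endif
    for (unsigned long i = 0; i < 1000; i++)
    {
        auto temp{a[i]};
        a[i] = b[i];
        b[i] = temp;
    }
}
```

Comparing the output of this variant against the original `f` with the same flags could show whether the runs of consecutive loads come from interleaving or from a later scheduling step.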
</pre>