<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/96788">96788</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Clang does not optimize memcpy-like scenario for pipelining
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            clang
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          Hattonuri
      </td>
    </tr>
</table>

<pre>
    I have code like this:

```
#include <cstdint>

void ff(int64_t* __restrict a, int64_t* __restrict b) {
    a[0] = b[0];
    a[1] = b[1];
    a[2] = b[2];
    a[3] = b[3];
    a[4] = b[4];
    a[5] = b[5];
    a[6] = b[6];
    a[7] = b[7];
    a[8] = b[8];
    a[9] = b[9];
    a[10] = b[10];
    a[11] = b[11];
    a[12] = b[12];
    a[13] = b[13];
    a[14] = b[14];
    a[15] = b[15];
}
```

Clang produces the following:

```
ff(long*, long*):
        vmovups ymm0, ymmword ptr [rsi]
        vmovups ymmword ptr [rdi], ymm0
        vmovups ymm0, ymmword ptr [rsi + 32]
        vmovups ymmword ptr [rdi + 32], ymm0
        vmovups ymm0, ymmword ptr [rsi + 64]
        vmovups ymmword ptr [rdi + 64], ymm0
        vmovups ymm0, ymmword ptr [rsi + 96]
        vmovups ymmword ptr [rdi + 96], ymm0
        vzeroupper
        ret
```


But it could use a different ymm register for each load/store pair to avoid register dependencies between consecutive copies, as GCC does:

```
ff(long*, long*):
        vmovdqu ymm2, YMMWORD PTR [rsi+32]
        vmovdqu ymm1, YMMWORD PTR [rsi+64]
        vmovdqu ymm0, YMMWORD PTR [rsi+96]
        vmovdqu ymm3, YMMWORD PTR [rsi]
        vmovdqu YMMWORD PTR [rdi+32], ymm2
        vmovdqu YMMWORD PTR [rdi], ymm3
        vmovdqu YMMWORD PTR [rdi+64], ymm1
        vmovdqu YMMWORD PTR [rdi+96], ymm0
        vzeroupper
        ret
```

Reproducer: https://godbolt.org/z/zfnsM3cKr
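
For experimenting, the loads-before-stores ordering seen in the GCC output can be written out at the source level. This is only a rough sketch and is not part of the Godbolt link above: `ff_loads_first` and `Block` are hypothetical names, and it assumes the compiler lowers each 32-byte `memcpy` to a single vector move. The optimizer is free to reorder the operations or reuse one register, so this may well compile to the same assembly as `ff`.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type: one 32-byte block, the width of a ymm register.
struct Block {
    std::int64_t v[4];
};

// Hypothetical variant of ff: all four loads go into distinct temporaries
// before any store. The optimizer may still reorder these operations or
// reuse registers, so the emitted assembly is not guaranteed to change.
void ff_loads_first(std::int64_t* __restrict a, const std::int64_t* __restrict b) {
    Block t0, t1, t2, t3;

    // Loads: each 32-byte block gets its own temporary.
    std::memcpy(&t0, b + 0, sizeof(Block));
    std::memcpy(&t1, b + 4, sizeof(Block));
    std::memcpy(&t2, b + 8, sizeof(Block));
    std::memcpy(&t3, b + 12, sizeof(Block));

    // Stores: placed after all four loads in source order.
    std::memcpy(a + 0, &t0, sizeof(Block));
    std::memcpy(a + 4, &t1, sizeof(Block));
    std::memcpy(a + 8, &t2, sizeof(Block));
    std::memcpy(a + 12, &t3, sizeof(Block));
}
```

It copies the same 16 elements as `ff`, so the two can be compared side by side in the compiler output.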


</pre>