<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/96788">96788</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Clang does not optimize memcpy-like scenario for pipelining
</td>
</tr>
<tr>
<th>Labels</th>
<td>
clang
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
Hattonuri
</td>
</tr>
</table>
<pre>
I have code like this:
```
#include <cstdint>
void ff(int64_t* __restrict a, int64_t* __restrict b) {
a[0] = b[0];
a[1] = b[1];
a[2] = b[2];
a[3] = b[3];
a[4] = b[4];
a[5] = b[5];
a[6] = b[6];
a[7] = b[7];
a[8] = b[8];
a[9] = b[9];
a[10] = b[10];
a[11] = b[11];
a[12] = b[12];
a[13] = b[13];
a[14] = b[14];
a[15] = b[15];
}
```
Clang produces the following:
```
ff(long*, long*):
vmovups ymm0, ymmword ptr [rsi]
vmovups ymmword ptr [rdi], ymm0
vmovups ymm0, ymmword ptr [rsi + 32]
vmovups ymmword ptr [rdi + 32], ymm0
vmovups ymm0, ymmword ptr [rsi + 64]
vmovups ymmword ptr [rdi + 64], ymm0
vmovups ymm0, ymmword ptr [rsi + 96]
vmovups ymmword ptr [rdi + 96], ymm0
vzeroupper
ret
```
But it could use a different ymm register for each load/store pair, so that consecutive copies do not depend on the same register, like GCC does:
```
ff(long*, long*):
vmovdqu ymm2, YMMWORD PTR [rsi+32]
vmovdqu ymm1, YMMWORD PTR [rsi+64]
vmovdqu ymm0, YMMWORD PTR [rsi+96]
vmovdqu ymm3, YMMWORD PTR [rsi]
vmovdqu YMMWORD PTR [rdi+32], ymm2
vmovdqu YMMWORD PTR [rdi], ymm3
vmovdqu YMMWORD PTR [rdi+64], ymm1
vmovdqu YMMWORD PTR [rdi+96], ymm0
vzeroupper
ret
```
Reproducer: https://godbolt.org/z/zfnsM3cKr
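
Since the function is semantically a 128-byte copy, a small portable harness can confirm that any reordering of the loads and stores must give the same result as a single memcpy. This is only a sketch: `ff_memcpy` and `copies_match` are hypothetical names that are not part of the report, `ff` is written as a loop for brevity, and the code generated for either variant has not been checked here.
```cpp
#include <cstdint>
#include <cstring>

// Same element-wise copy as ff() in the report, written as a loop for brevity.
void ff(int64_t* __restrict a, int64_t* __restrict b) {
    for (int i = 0; i < 16; ++i) a[i] = b[i];
}

// Hypothetical alternative spelling (not from the report): one 128-byte memcpy.
void ff_memcpy(int64_t* __restrict a, const int64_t* __restrict b) {
    std::memcpy(a, b, 16 * sizeof(int64_t));
}

// Returns true when both variants leave identical bytes in the destination.
bool copies_match() {
    int64_t src[16], d1[16] = {}, d2[16] = {};
    for (int i = 0; i < 16; ++i) src[i] = 1000 + i;
    ff(d1, src);
    ff_memcpy(d2, src);
    return std::memcmp(d1, d2, sizeof d1) == 0 && d1[15] == 1015;
}
```
Because of `__restrict`, the source and destination are assumed not to overlap, so the order in which the four 32-byte loads and stores are issued cannot change the result.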
</pre>