<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/80388>80388</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
[AArch64][ISel] Better instruction select when load/store?
</td>
</tr>
<tr>
<th>Labels</th>
<td>
backend:AArch64,
llvm:codegen,
llvm:globalisel,
llvm:SelectionDAG
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
hstk30-hw
</td>
</tr>
</table>
<pre>
https://godbolt.org/z/4fGa3xd7o
```
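#include <string.h>
#include <stdio.h>
#include <stdint.h>
typedef uint32_t u32;
void *copy(void *restrict dest, const void *restrict src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    uint32_t w, x;
    /* Only a 4-byte-aligned destination is handled in this reduced case. */
    if ((uintptr_t)d % 4 == 0) {
        /* Copy 16 bytes per iteration as four 32-bit words. */
        for (; n >= 16; s += 16, d += 16, n -= 16) {
            *(u32 *)(d+0) = *(u32 *)(s+0);
            *(u32 *)(d+4) = *(u32 *)(s+4);
            *(u32 *)(d+8) = *(u32 *)(s+8);
            *(u32 *)(d+12) = *(u32 *)(s+12);
        }
        /* The remaining n < 16 tail bytes are left uncopied here. */
        return dest;
    }
    /* Unaligned destinations are not copied at all in this reproducer. */
    return dest;
}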
```
In this case, GCC loads the 16 bytes of each loop iteration as 8 + 8 bytes, whereas Clang loads them as 4 + 8 + 4 bytes.
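To make the difference concrete, here is a minimal C sketch of the two ways of splitting one 16-byte chunk. The helper names `chunk16_8_8`, `chunk16_4_8_4` and `verify_splits` are hypothetical and do not come from the report or from LLVM. Both splits move exactly the same bytes; they differ only in how many memory accesses each iteration needs (two 8-byte loads versus three loads). Writing the source this way is only an illustration: it does not guarantee that either compiler emits exactly these accesses, since `memcpy` leaves the instruction choice to the compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: move one 16-byte chunk as two 8-byte accesses,
   the 8 + 8 split the report attributes to GCC. memcpy avoids the
   alignment and strict-aliasing issues of casting to integer pointers. */
static void chunk16_8_8(unsigned char *d, const unsigned char *s)
{
    uint64_t a, b;
    memcpy(&a, s + 0, 8);
    memcpy(&b, s + 8, 8);
    memcpy(d + 0, &a, 8);
    memcpy(d + 8, &b, 8);
}

/* Hypothetical helper: the same chunk as 4 + 8 + 4 accesses,
   the split the report attributes to Clang (one extra load). */
static void chunk16_4_8_4(unsigned char *d, const unsigned char *s)
{
    uint32_t a, c;
    uint64_t b;
    memcpy(&a, s + 0, 4);
    memcpy(&b, s + 4, 8);
    memcpy(&c, s + 12, 4);
    memcpy(d + 0, &a, 4);
    memcpy(d + 4, &b, 8);
    memcpy(d + 12, &c, 4);
}

/* Usage sketch: both splits must reproduce the source chunk exactly. */
static void verify_splits(const unsigned char src[16])
{
    unsigned char d1[16], d2[16];
    chunk16_8_8(d1, src);
    chunk16_4_8_4(d2, src);
    assert(memcmp(d1, src, 16) == 0);
    assert(memcmp(d2, src, 16) == 0);
}
```

Because the results are byte-for-byte identical, the 8 + 8 form is strictly cheaper in load count for the same work, which is the improvement the issue asks for.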
</pre>