<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/54535>54535</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            avoid libcall to memcpy harder
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:X86,
            llvm:optimizations,
            missed-optimization
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          nickdesaulniers
      </td>
    </tr>
</table>

<pre>
    Via [this thread](https://lore.kernel.org/lkml/YjxTt3pFIcV3lt8I@zn.tnic/):

Consider the following example:
```c
struct foo {
    unsigned long x0;
    unsigned long x1;
    unsigned long x2;
    unsigned long x3;
    unsigned long x4;
    unsigned long x5;
    unsigned long x6;
    unsigned long x7;
    unsigned long x8;
    unsigned long x9;
    unsigned long x10;
    unsigned long x11;
    unsigned long x12;
    unsigned long x13;
    unsigned long x14;
    unsigned long x15;
    // Comment out the members below (x16 to x19) to get the inlined copy.
    unsigned long x16;
    unsigned long x17;
    unsigned long x18;
    unsigned long x19;
} *x, *y;

struct foo* get_x(void);

struct foo* cpy(struct foo *y) {
    struct foo *x = get_x();
    if (y != x)
        *x = *y;
    return x;
}
```
When compiled with `-O2 -mno-sse` (as the Linux kernel does), this produces:
```asm
cpy:
  ...
        movl    $160, %edx
        movq    %rbx, %rdi
        movq    %r14, %rsi
        callq   memcpy@PLT
...
```
But if we reduce the number of members in `struct foo` (e.g. by commenting out `x16` through `x19`), we get:
```asm
cpy:
  ...
        movl    $16, %ecx
        movq    %rax, %rdi
        movq    %rbx, %rsi
        rep;movsq (%rsi), %es:(%rdi)
...
```
which should be much faster. From what I can tell, instruction selection (isel) is what decides whether to lower `@llvm.memcpy.p0i8.p0i8.i64()` to a libcall to `memcpy` or to expand it inline as a simple copy (here, `rep;movsq`).

I assume there's some limit on how many bytes `rep;movsq` can copy, but surely it's much larger than 16 × 8 B (128 bytes)?

cc @phoebewang 
</pre>