<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/87440>87440</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            small non-fixed-size bytewise copy is transformed to much slower `memcpy`
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          iximeow
      </td>
    </tr>
</table>

<pre>
a bytewise copy of small but non-constant size with non-aliasing src/dest is transformed by LoopIdiomRecognize into an `llvm.memcpy` intrinsic. because the size is non-constant, neither InstCombine nor SelectionDAG turns the small copy back into an appropriate sequence of loads and stores, so the intrinsic typically ends up as a call to `memcpy`. for small copies (under 8 bytes, as a fairly unscientific threshold) the library call is much slower than doing the copy with a short loop or inlined instructions. for size-optimized code, at least on x86 targets, a library call is also larger.
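for illustration, a minimal C sketch of the kind of loop described above. the name `copy_bytes` is hypothetical and this is not the exact code behind the godbolt links; it only shows the shape in question: a bytewise loop, a runtime length, and `restrict` pointers so the compiler may assume src and dest don't alias.

```c
/* hypothetical example: bytewise copy with a runtime length and non-aliasing
 * (restrict) pointers. at -O2, LoopIdiomRecognize can replace this loop with
 * an llvm.memcpy intrinsic; since n is not a constant, that intrinsic is
 * usually lowered to a call to the library memcpy. */
void copy_bytes(unsigned char *restrict dst, const unsigned char *restrict src,
                unsigned long n) {
    for (unsigned long i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```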

i noticed this in some Rust ([godbolt](https://rust.godbolt.org/z/8Tf3q1j5G)) but it's pretty apparent with `restrict` arguments in C as well ([clang godbolt](https://clang.godbolt.org/z/eMc36Pfvd)).

it seems like handling dynamic-but-small-sized memcpy is just particularly tricky, so maybe there's not much we can do here. i didn't see an existing issue similar to this, at least...

---

i'm not very familiar with how symbolic information is retained in LLVM. ideally i could write something like `if (Size.isNotConstantButSmallerThan(16))` and decide to emit something better than a memcpy library call, but i can't tell whether the max trip count of the original loop is kept as a hint on the memcpy size, or whether it's lost entirely because the size is non-constant.
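as a source-level sketch of what "something better than a memcpy library call" could look like for a small dynamic size, the hypothetical `small_copy` below dispatches on `n` and covers sizes under 8 bytes with at most two fixed-size copies that may overlap. the 8-byte cutoff is just the unscientific threshold mentioned above, and this is not an LLVM API or an existing lowering; it assumes, as is typical for clang and gcc at -O1 and above, that a constant-size `__builtin_memcpy` of 2 or 4 bytes becomes a single load and store. fixed-size copies are used instead of a short loop because LoopIdiomRecognize could turn such a loop back into a `memcpy` call.

```c
/* hypothetical helper: copy n bytes between non-overlapping buffers, using
 * fixed-size copies for n below 8 and the library memcpy otherwise. */
void small_copy(unsigned char *restrict dst, const unsigned char *restrict src,
                unsigned long n) {
    if (n >= 8) {
        /* assumed "large enough": defer to memcpy */
        __builtin_memcpy(dst, src, n);
    } else if (n >= 4) {
        /* 4..7 bytes: bytes [0, 4) and [n - 4, n) together cover [0, n) */
        __builtin_memcpy(dst, src, 4);
        __builtin_memcpy(dst + n - 4, src + n - 4, 4);
    } else if (n >= 2) {
        /* 2..3 bytes: bytes [0, 2) and [n - 2, n) together cover [0, n) */
        __builtin_memcpy(dst, src, 2);
        __builtin_memcpy(dst + n - 2, src + n - 2, 2);
    } else if (n == 1) {
        dst[0] = src[0];
    }
    /* n == 0: nothing to copy */
}
```

the overlapping pair of copies handles every size in a range with one branch, at the cost of writing a few bytes twice.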

even then, some targets have instruction sequences that are more profitable than a `memcpy` call; x86 FSRM (fast short `rep movsb`, already handled in [x86 SelectionDAG](https://github.com/llvm/llvm-project/blob/82c6eeed08b1c8267f6e92d594c910fe57a9775e/llvm/lib/Target/X86/X86SelectionDAGInfo.cpp#L279-L288)) is the example i know of. so i'm not sure it is _always_ profitable to inline a small-but-dynamic-size memcpy.

i also couldn't figure out whether a non-constant SDValue might still carry range information that X86SelectionDAGInfo.cpp could use. did i miss a detail, or is SelectionDAG too late in the pipeline to have range information? maybe the appropriate thing here would be a flag on memcpy hinting to later passes that its maximum size is known to be "small"? (and in _that_ case, is "dynamic but with a low upper bound" something LLVM could determine in LoopIdiomRecognize when creating the memcpy in the first place?)
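one way to experiment with the range question at the source level is to give the optimizer an explicit upper bound on the length, for example with `__builtin_unreachable()` (available in clang and gcc), and then check whether the emitted code still calls `memcpy`. the function below is a hypothetical test case, not a fix; whether such a bound survives LoopIdiomRecognize and reaches SelectionDAG is exactly the open question here.

```c
/* hypothetical test case: the same bytewise copy, but the optimizer is told
 * that n is at most 7. compare the generated code with the unbounded version
 * to see whether the bound changes how the copy is lowered. */
void copy_bytes_bounded(unsigned char *restrict dst,
                        const unsigned char *restrict src, unsigned long n) {
    if (n > 7) {
        /* undefined behavior if reached: callers must guarantee n <= 7 */
        __builtin_unreachable();
    }
    for (unsigned long i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```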

i was hoping to put together a patch to propose too, but as it stands i have no idea what an appropriate change would be 😅 hopefully someone has a better idea?
</pre>