<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/64224>64224</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Unnecessary member-wise copying of trivially copyable types
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          ruschaaf
      </td>
    </tr>
</table>

<pre>
I have a small, trivially copyable struct type that contains various integer members. Under certain circumstances the optimizer performs unnecessary member-wise assignment, even at high optimization levels.

A simple example is below. A full example, with variants that *do* optimize correctly, is here: https://clang.godbolt.org/z/MEfzoa7zh

```cpp
#include <cstdint>

// A structure that is 16 bytes long, composed of many member variables.
// It is trivially copyable, and could be treated as 2 uint64_ts when copied
struct Data {
    uint32_t a;
    uint8_t b, c, d, e;
    uint32_t v;
    uint8_t w, x, y, z;
};

Data A();
Data B();

// This is the function that doesn't optimize correctly
Data unnecessary_memberwise_copy(uint16_t op)
{
    if (op == 0) {
        return A();
    }
    else if (op == 1) {
        return B();
    }
    return {};
}
```
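
The two claims in the struct comment (16 bytes, trivially copyable) can be checked at compile time. The sketch below is only an illustration: `TwoWords`, `as_words`, `from_words`, and `check_round_trip` are hypothetical names that do not appear in the original report, and it assumes a platform where `Data` has no padding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct Data {
    uint32_t a;
    uint8_t b, c, d, e;
    uint32_t v;
    uint8_t w, x, y, z;
};

// Hypothetical view of the same 16 bytes as two 64-bit words.
struct TwoWords {
    uint64_t lo, hi;
};

static_assert(std::is_trivially_copyable<Data>::value, "Data must be trivially copyable");
static_assert(sizeof(Data) == 16, "Data is expected to occupy exactly 16 bytes");
static_assert(sizeof(TwoWords) == sizeof(Data), "both views must cover the same bytes");

// Copy the object representation of Data into two 64-bit words.
inline TwoWords as_words(const Data& d) {
    TwoWords w;
    std::memcpy(&w, &d, sizeof(Data));
    return w;
}

// Copy two 64-bit words back into a Data object.
inline Data from_words(const TwoWords& w) {
    Data d;
    std::memcpy(&d, &w, sizeof(Data));
    return d;
}

// Debug check: a copy routed through two words preserves every byte of Data.
inline void check_round_trip(const Data& d) {
    const Data r = from_words(as_words(d));
    assert(std::memcmp(&r, &d, sizeof(Data)) == 0);
}
```

If both static assertions hold, a copy of `Data` needs no more than two 64-bit moves, which is the code one would hope to see for the function above.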

With -O2, -O3, and -Os the resulting IR (and final assembly) does a large amount of masking and or-ing to assemble the result registers member by member.

```asm
unnecessary_memberwise_copy(unsigned short):       # @unnecessary_memberwise_copy(unsigned short)
        push    r15
        push    r14
        push    rbx
        cmp     edi, 1
        je      .LBB0_4
        test    edi, edi
        jne     .LBB0_2
        call    A()@PLT
        jmp     .LBB0_5
.LBB0_4:                                # %if.then3
        call    B()@PLT
.LBB0_5:                                # %return
        movabs  rcx, -4294967296
        mov     r8, rax
        and     r8, rcx
        movabs  rsi, -1099511627776
        mov     r9, rax
        and     r9, rsi
        movabs  rdi, -281474976710656
        mov     r10, rax
        and     r10, rdi
        movabs  r14, -72057594037927936
        mov     rbx, rax
        and     rbx, r14
        and     rcx, rdx
        and     rsi, rdx
        and     rdi, rdx
        mov     r11, rdx
        and     r11, r14
        jmp     .LBB0_6
.LBB0_2:
        xor     edi, edi
        xor     esi, esi
        xor     ecx, ecx
        xor     edx, edx
        xor     ebx, ebx
        xor     r10d, r10d
        xor     r9d, r9d
        xor     r8d, r8d
        xor     eax, eax
        xor     r11d, r11d
.LBB0_6:                                # %return
        movabs  r14, 71776119061217280
        and     r10, r14
        or      r10, rbx
        movabs  rbx, 280375465082880
        and     r9, rbx
        movabs  r15, 1095216660480
        and     r8, r15
        or      r8, r9
        or      r8, r10
        mov     eax, eax
        or      rax, r8
        and     rdi, r14
        and     rsi, rbx
        or      rsi, rdi
        and     rcx, r15
        or      rcx, rsi
        mov     edx, edx
        or      rdx, rcx
        or      rdx, r11
        pop     rbx
        pop     r14
        pop     r15
        ret
```
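
To make the masking easier to follow, the sketch below rewrites the `movabs` immediates from the listing as hexadecimal masks and replays the sequence applied to `rax`. It assumes the x86-64 System V convention, where the first 8 bytes of `Data` come back in `rax` with `a` in the low 32 bits and `b` through `e` in bytes 4 through 7. The constant and function names are hypothetical, chosen for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Immediates from the first half of the listing (before .LBB0_6).
constexpr uint64_t kHigh32 = 0xFFFFFFFF00000000ULL;  // -4294967296
constexpr uint64_t kHigh24 = 0xFFFFFF0000000000ULL;  // -1099511627776
constexpr uint64_t kHigh16 = 0xFFFF000000000000ULL;  // -281474976710656
constexpr uint64_t kHigh8  = 0xFF00000000000000ULL;  // -72057594037927936

// Immediates from the second half (after .LBB0_6): one byte each.
constexpr uint64_t kByte6 = 0x00FF000000000000ULL;   // 71776119061217280
constexpr uint64_t kByte5 = 0x0000FF0000000000ULL;   // 280375465082880
constexpr uint64_t kByte4 = 0x000000FF00000000ULL;   // 1095216660480

// The decimal immediates and the hex masks are the same bit patterns.
static_assert(static_cast<uint64_t>(-4294967296LL) == kHigh32, "mask mismatch");
static_assert(static_cast<uint64_t>(-1099511627776LL) == kHigh24, "mask mismatch");
static_assert(static_cast<uint64_t>(-281474976710656LL) == kHigh16, "mask mismatch");
static_assert(static_cast<uint64_t>(-72057594037927936LL) == kHigh8, "mask mismatch");
static_assert(71776119061217280ULL == kByte6, "mask mismatch");
static_assert(280375465082880ULL == kByte5, "mask mismatch");
static_assert(1095216660480ULL == kByte4, "mask mismatch");

// Replays the and/or sequence the listing applies to rax.
constexpr uint64_t reassemble(uint64_t rax) {
    return (rax & 0x00000000FFFFFFFFULL)   // mov eax, eax       (member a)
         | ((rax & kHigh32) & kByte4)      // r8:  byte 4        (member b)
         | ((rax & kHigh24) & kByte5)      // r9:  byte 5        (member c)
         | ((rax & kHigh16) & kByte6)      // r10: byte 6        (member d)
         | (rax & kHigh8);                 // rbx: byte 7        (member e)
}

// The whole sequence is an identity on the 64-bit value.
static_assert(reassemble(0x0123456789ABCDEFULL) == 0x0123456789ABCDEFULL,
              "reassembly should return the input unchanged");

// Runtime variant of the same check, for arbitrary inputs.
inline void check_identity(uint64_t value) {
    assert(reassemble(value) == value);
}
```

Under these assumptions the masks isolate each member only to OR them back into the same positions, so the sequence computes the value that was already in `rax`. The same pattern is applied to `rdx` for the second half of the struct.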

This behavior seems to have first appeared in clang 7.0; prior to that, the code was optimized correctly. GCC does not show this behavior.

From what I can tell (as someone who hasn't dug into the optimization passes before), the brace initialization on the final line of the function seems to cause the SROA pass to split all loads and stores of the result value into 10 member-wise loads or stores. Those member-wise operations may or may not get recombined into 64-bit operations in later passes.

</pre>