<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href=https://github.com/llvm/llvm-project/issues/64224>64224</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Unnecessary member-wise copying of trivially copyable types
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
ruschaaf
</td>
</tr>
</table>
<pre>
I have a small, trivially copyable struct type that contains various integer members. Under certain circumstances the optimizer performs unnecessary member-wise assignment, even at high optimization levels.
A simple example is below. A full example, with variants that *do* optimize correctly, is here: https://clang.godbolt.org/z/MEfzoa7zh
```cpp
// A structure that is 16 bytes long, composed of many member variables.
// It is trivially copyable, and could be treated as 2 uint64_ts when copied
struct Data {
    uint32_t a;
    uint8_t b, c, d, e;
    uint32_t v;
    uint8_t w, x, y, z;
};

Data A();
Data B();

// This is the function that doesn't optimize correctly
Data unnecessary_memberwise_copy(uint16_t op)
{
    if (op == 0) {
        return A();
    }
    else if (op == 1) {
        return B();
    }
    return {};
}
```
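The size and copyability claims in the comments above can be checked at compile time. The following is a minimal sketch, not part of the original reproducer. It assumes the `__is_trivially_copyable` and `__builtin_memcpy` compiler builtins (available in Clang and GCC), spells the members with built-in types so that no headers are needed, and uses the hypothetical names `Words`, `copy_as_two_words`, and `from_two_words` to show the two-word copy the optimizer would ideally emit.

```cpp
// Same layout as Data above, spelled with built-in types. Assumption:
// unsigned int is 4 bytes and unsigned char is 1 byte, as on x86-64.
struct Data {
    unsigned int a;
    unsigned char b, c, d, e;
    unsigned int v;
    unsigned char w, x, y, z;
};

static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit unsigned int");
static_assert(sizeof(unsigned long long) == 8, "sketch assumes 64-bit unsigned long long");
static_assert(sizeof(Data) == 16, "Data should occupy exactly two 64-bit words");
static_assert(__is_trivially_copyable(Data), "Data should be trivially copyable");

// Hypothetical view of a Data value as two 64-bit words.
struct Words {
    unsigned long long lo, hi;
};

// Copies all 16 bytes at once instead of member by member.
Words copy_as_two_words(Data d) {
    Words out;
    __builtin_memcpy(&out, &d, sizeof(Data));
    return out;
}

// Inverse of copy_as_two_words.
Data from_two_words(Words in) {
    Data d;
    __builtin_memcpy(&d, &in, sizeof(Data));
    return d;
}
```

Because the struct has no padding, a round trip through `Words` preserves every member, which is why a plain 16-byte copy would be a valid lowering here.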
With -O2, -O3, and -Os the resulting IR (and final assembly) does a large amount of masking and or-ing to assemble the result registers member by member.
```asm
unnecessary_memberwise_copy(unsigned short): # @unnecessary_memberwise_copy(unsigned short)
        push r15
        push r14
        push rbx
        cmp edi, 1
        je .LBB0_4
        test edi, edi
        jne .LBB0_2
        call A()@PLT
        jmp .LBB0_5
.LBB0_4: # %if.then3
        call B()@PLT
.LBB0_5: # %return
        movabs rcx, -4294967296
        mov r8, rax
        and r8, rcx
        movabs rsi, -1099511627776
        mov r9, rax
        and r9, rsi
        movabs rdi, -281474976710656
        mov r10, rax
        and r10, rdi
        movabs r14, -72057594037927936
        mov rbx, rax
        and rbx, r14
        and rcx, rdx
        and rsi, rdx
        and rdi, rdx
        mov r11, rdx
        and r11, r14
        jmp .LBB0_6
.LBB0_2:
        xor edi, edi
        xor esi, esi
        xor ecx, ecx
        xor edx, edx
        xor ebx, ebx
        xor r10d, r10d
        xor r9d, r9d
        xor r8d, r8d
        xor eax, eax
        xor r11d, r11d
.LBB0_6: # %return
        movabs r14, 71776119061217280
        and r10, r14
        or r10, rbx
        movabs rbx, 280375465082880
        and r9, rbx
        movabs r15, 1095216660480
        and r8, r15
        or r8, r9
        or r8, r10
        mov eax, eax
        or rax, r8
        and rdi, r14
        and rsi, rbx
        or rsi, rdi
        and rcx, r15
        or rcx, rsi
        mov edx, edx
        or rdx, rcx
        or rdx, r11
        pop rbx
        pop r14
        pop r15
        ret
```
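The decimal constants in the listing are easier to read as bit masks. The sketch below is an illustration added for that purpose, not compiler output. It assumes little-endian byte order and the x86-64 System V convention of returning the 16-byte struct in `rax` and `rdx`; `byte_mask` and `high_mask` are hypothetical helper names. Each `and` isolates one `uint8_t` member (or the bytes above it) so that the result can be reassembled with `or`.

```cpp
// Mask selecting the single byte at byte offset `offset` of a 64-bit register.
constexpr unsigned long long byte_mask(unsigned offset) {
    return 0xFFULL << (8 * offset);
}

// Mask selecting every byte at or above byte offset `offset`.
constexpr unsigned long long high_mask(unsigned offset) {
    return ~0ULL << (8 * offset);
}

// Masks loaded in .LBB0_6: one byte-sized member each
// (b, c, d in rax; w, x, y in rdx).
static_assert(byte_mask(4) == 1095216660480ULL, "bits 32..39: b / w");
static_assert(byte_mask(5) == 280375465082880ULL, "bits 40..47: c / x");
static_assert(byte_mask(6) == 71776119061217280ULL, "bits 48..55: d / y");

// Masks loaded in .LBB0_5, which the listing prints as negative signed values.
static_assert(high_mask(4) == (unsigned long long)-4294967296LL, "bits 32..63");
static_assert(high_mask(5) == (unsigned long long)-1099511627776LL, "bits 40..63");
static_assert(high_mask(6) == (unsigned long long)-281474976710656LL, "bits 48..63");
static_assert(high_mask(7) == (unsigned long long)-72057594037927936LL, "bits 56..63: e / z");
```

Read this way, the `mov eax, eax` and `mov edx, edx` instructions keep the low 32 bits (members `a` and `v`), and the remaining masks rebuild the four byte-sized members of each half one at a time.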
This behavior seems to have first appeared in clang 7.0; prior to that the code was optimized correctly. GCC does not show this behavior.
From what I can tell (as someone who hasn't dug into the optimization passes before), the brace initialization on the final line of the function appears to cause the SROA pass to split all loads and stores of the result value into 10 member-wise loads or stores. Those member-wise operations may or may not get recombined into 64-bit operations in later passes.
</pre>