<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Clang could merge multiple variable copies into one"
href="https://bugs.llvm.org/show_bug.cgi?id=50320">50320</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Clang could merge multiple variable copies into one
</td>
</tr>
<tr>
<th>Product</th>
<td>new-bugs
</td>
</tr>
<tr>
<th>Version</th>
<td>12.0
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>new bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>stpasha@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>In the following example code:
```
struct X {
bool a, b, c, d;
int e;
X(const X& x) : a(x.a), b(x.b), c(x.c), d(x.d), e(x.e) {}
};
```
the copy constructor copies multiple primitive members from one location in
memory to another. Instead of issuing a separate load and store for each member,
it would be more efficient to copy all of them at once as a single 64-bit word
(the struct is exactly 8 bytes: four 1-byte `bool`s followed by a 4-byte `int`).
However, this is the assembly produced by Clang (with the `-O3` flag):
```
X::X(X const&) [base object constructor]:
mov al, byte ptr [rsi]
mov byte ptr [rdi], al
mov al, byte ptr [rsi + 1]
mov byte ptr [rdi + 1], al
mov al, byte ptr [rsi + 2]
mov byte ptr [rdi + 2], al
mov al, byte ptr [rsi + 3]
mov byte ptr [rdi + 3], al
mov eax, dword ptr [rsi + 4]
mov dword ptr [rdi + 4], eax
ret
```
For comparison, this is the assembly produced by GCC for the same code:
```
X::X(X const&) [base object constructor]:
mov rax, QWORD PTR [rsi]
mov QWORD PTR [rdi], rax
ret
```
The code can be viewed here: <a href="https://godbolt.org/z/cs15vr3sn">https://godbolt.org/z/cs15vr3sn</a>
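A possible source-level workaround is sketched below, assuming a typical x86-64
ABI (1-byte `bool`, 4-byte `int`): default the copy constructor instead of
spelling out the member-wise copy. A defaulted copy constructor is trivial
here, so the type is trivially copyable and the compiler may copy its whole
8-byte object representation as one block. `XDefaulted` is a hypothetical name,
and the GCC/Clang builtin `__is_trivially_copyable` is used in place of
`std::is_trivially_copyable` so the snippet needs no headers. This does not
change how the optimizer treats the hand-written constructor from the report.

```cpp
// Same layout as X from the report, but with a defaulted copy constructor.
struct XDefaulted {
    bool a, b, c, d;
    int e;
    // Declaring a copy constructor suppresses the implicit default
    // constructor, so request it explicitly.
    XDefaulted() = default;
    // Defaulted on first declaration: trivial, so XDefaulted stays
    // trivially copyable and may be copied as one block of bytes.
    XDefaulted(const XDefaulted&) = default;
};

// Holds on typical x86-64 ABIs: four 1-byte bools, then a 4-byte int at
// offset 4, with no padding.
static_assert(sizeof(XDefaulted) == 8, "expected a tightly packed 8-byte struct");
// GCC/Clang builtin trait, equivalent to std::is_trivially_copyable.
static_assert(__is_trivially_copyable(XDefaulted),
              "a defaulted copy constructor keeps the type trivially copyable");
```

Whether a particular Clang release then emits a single 8-byte move for this
type is best confirmed on Compiler Explorer; the sketch only shows that the
defaulted form hands the compiler a trivially copyable type it may copy as one
block.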
Interestingly, if the struct is not tightly packed (for example, if one of the
boolean flags is removed), then both Clang and GCC produce sub-optimal code
involving multiple copies, even though the optimizer could determine that also
copying the one byte of "ghost" (padding) space would improve performance. In
fact, the extra byte doesn't even have to be padding: if the copy constructor
skips copying or otherwise initializing one of the boolean flags, that flag's
value in the new object is indeterminate anyway, so the optimizer could still
recognize that adding the extra copy would be beneficial for both code size
and speed.</pre>
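<pre>To make the padding case concrete, here is a minimal sketch, again assuming a
typical x86-64 ABI. `Y` and `copy_whole` are hypothetical names, and the
compiler builtins `__builtin_offsetof` and `__builtin_memcpy` stand in for
`offsetof` and `std::memcpy` so the snippet needs no headers. With one flag
removed, byte 3 becomes padding, yet the object is still 8 bytes. Because `Y`
is trivially copyable, copying its whole object representation (padding byte
included) with a single `memcpy` is well defined; this is the kind of
whole-object copy the report suggests the optimizer could synthesize from a
member-wise copy constructor.

```cpp
// Hypothetical variant of X with one flag removed: bytes 0..2 hold a, b, c,
// byte 3 is padding, and e sits at offset 4 (typical x86-64 ABI assumed).
struct Y {
    bool a, b, c;
    int e;
};

static_assert(sizeof(Y) == 8, "expected one padding byte before e");
static_assert(__builtin_offsetof(Y, e) == 4, "expected e at offset 4");

// Copies all 8 bytes, including the padding ("ghost") byte, in one block.
// Well defined because Y is trivially copyable (implicit copy constructor).
inline void copy_whole(Y* dst, const Y* src) {
    __builtin_memcpy(dst, src, sizeof(Y));
}
```
</pre>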
</div>
</p>
</body>
</html>