<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Speed degradation because of inlining a register clobbering function"
href="https://bugs.llvm.org/show_bug.cgi?id=43562">43562</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Speed degradation because of inlining a register clobbering function
</td>
</tr>
<tr>
<th>Product</th>
<td>clang
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>LLVM Codegen
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedclangbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>antoshkka@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org, neeilans@live.com, richard-llvm@metafoo.co.uk
</td>
</tr></table>
<p>
<div>
<pre>Consider the example that is a simplified version of
boost::container::small_vector:
#define MAKE_INLINING_BAD 1
struct vector {
int* data_;
int* capacity_;
int* size_;
void push_back(int v) {
if (capacity_ > size_) {
*size_ = v;
++size_;
} else {
reallocate_and_push(v);
}
}
void reallocate_and_push(int v)
#if MAKE_INLINING_BAD
{
// Just some code that clobbers many registers.
// You may skip reading it
const auto old_cap = capacity_ - data_;
const auto old_size = capacity_ - size_;
const auto new_cap = old_cap * 2 + 1;
auto new_data_1 = new int[new_cap];
auto new_data = new_data_1;
for (int* old_data = data_; old_data != size_; ++old_data, ++new_data)
{
*new_data = *old_data;
}
delete[] data_;
data_ = new_data_1;
size_ = new_data_1 + old_size;
capacity_ = new_data_1 + new_cap;
*size_ = v;
++size_;
}
#else
;
#endif
};
void bad_inlining(vector& v) {
v.push_back(42);
}
With `#define MAKE_INLINING_BAD 0` the generated code is quite good:
bad_inlining(vector&): # @bad_inlining(vector&)
mov rax, qword ptr [rdi + 16]
cmp qword ptr [rdi + 8], rax
jbe .LBB0_2
mov dword ptr [rax], 42
add rax, 4
mov qword ptr [rdi + 16], rax
ret
.LBB0_2:
mov esi, 42
jmp vector::reallocate_and_push(int) # TAILCALL
However, with `#define MAKE_INLINING_BAD 1` the compiler decides to inline the
`reallocate_and_push` function that clobbers many registers. So the compiler
stores the values of those registers on the stack before doing the cmompare and
ja:
bad_inlining(vector&): # @bad_inlining(vector&)
push rbp ; don't need those pushes for the `(capacity_ > size_)` case
push r15
push r14
push r13
push r12
push rbx
push rax
mov r14, rdi
mov r15, qword ptr [rdi + 8]
mov r13, qword ptr [rdi + 16]
mov rbp, r15
sub rbp, r13
ja .LBB0_14 ; hot path that does not clobbers registers
; vector::reallocate_and_push(int) implementation
add rsp, 8
pop rbx ; don't need those pops for the `(capacity_ > size_)` case
pop r12
pop r13
pop r14
pop r15
pop rbp
ret
This greatly degrades the performance of the first branch (more than x3
degradation in real code).
The possible fix would be to place all the push/pop operations near the inlined
`reallocate_and_push`:
bad_inlining(vector&):
mov rax, qword ptr [rdi + 16]
cmp qword ptr [rdi + 8], rax
jbe .LBB0_2
mov dword ptr [rax], 42
add rax, 4
mov qword ptr [rdi + 16], rax
ret
.LBB0_2:
push rbp
push r15
push r14
; ...
; vector::reallocate_and_push(int) implementation goes here
; ...
pop r14
pop r15
pop rbp
ret
Godbolt playground: <a href="https://godbolt.org/z/zM9bR0">https://godbolt.org/z/zM9bR0</a>
Related GCC issue: <a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91981">https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91981</a></pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>