<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Missed constant propagation of aggregate return values of non-inlined static functions"
href="https://bugs.llvm.org/show_bug.cgi?id=34839">34839</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Missed constant propagation of aggregate return values of non-inlined static functions
</td>
</tr>
<tr>
<th>Product</th>
<td>new-bugs
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Keywords</th>
<td>performance
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>new bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>peter@cordes.ca
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>clang is able to propagate constant return values from non-inlined functions in
simple cases (where it's a scalar int), but not in more complex cases (a
std::pair or std::optional)
// simple case:
int globalvar;

static                          // with static, clang omits the mov $42,%eax
__attribute__((noinline))
int get_constant() {
    globalvar = 22;
    return 42;
}

    # mov   $42,%eax            # omitted with static
    movl    $22, globalvar(%rip)
    retq

int call_constant() { return 10 - get_constant(); }

    pushq   %rax
    callq   get_constant()
    movl    $-32, %eax          # whether or not we optimize out mov $42,%eax in the callee
    popq    %rcx
    retq
(related: gcc doesn't even do the simple case:
<a href="https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82432">https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82432</a>)
But a more complex case defeats us:
#include <optional>
using std_optional_int = std::optional<int>;

static
__attribute__((noinline))
auto get_std_optional_int() noexcept -> std_optional_int {
    return {42};
}

    movabsq $4294967338, %rax       # imm = 0x10000002A
    ret

int a, b;
int main() {
    a = get_std_optional_int().value();
    // b = get_pair().second;
}
// clang6.0 -std=c++17 -stdlib=libc++ -Ofast -march=skylake
// <a href="https://godbolt.org/g/jkn741">https://godbolt.org/g/jkn741</a>
main:                               # @main
    pushq   %rax
    callq   get_std_optional_int()
    movabsq $1095216660480, %rcx    # imm = 0xFF00000000
    testq   %rcx, %rax
    je      .LBB2_2
    movl    %eax, a(%rip)
    xorl    %eax, %eax
    popq    %rcx
    retq
.LBB2_2:
    callq   abort
Obviously we'd like to omit the branch, and maybe just movl $42, a(%rip),
although we might as well use the value in %eax if the function's going to put
it there. (As long as we know it's not the result of a long dependency
chain...)
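To make the two 64-bit constants in the asm concrete, here is a minimal sketch
of how the 8-byte std::optional&lt;int&gt; appears to be packed into RAX. It assumes
libc++'s layout on x86-64 SysV (int payload in the low 32 bits, the bool
"engaged" flag in the next byte). The helper names are hypothetical and only
model the register value; they don't touch a real std::optional.

```cpp
#include <cstdint>

// Mask used by the caller in main: covers the byte holding the "engaged" flag.
constexpr std::uint64_t kEngagedByteMask = 0xFF00000000ULL;

// Hypothetical model of what the callee leaves in RAX (assumed libc++ layout).
constexpr std::uint64_t pack_optional_int(std::int32_t value, bool engaged) {
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(value))
         | (static_cast<std::uint64_t>(engaged) << 32);
}

// What testq %rcx, %rax / je checks: is any bit of the flag byte set?
constexpr bool is_engaged(std::uint64_t rax) {
    return (rax & kEngagedByteMask) != 0;
}

// What movl %eax, a(%rip) stores: the low 32 bits.
constexpr std::int32_t payload(std::uint64_t rax) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(rax));
}

// The constants from the asm listings.
static_assert(pack_optional_int(42, true) == 4294967338ULL, "0x10000002A");
static_assert(kEngagedByteMask == 1095216660480ULL, "0xFF00000000");
static_assert(is_engaged(pack_optional_int(42, true)), "callee always returns engaged");
static_assert(payload(pack_optional_int(42, true)) == 42, "payload is the constant 42");
```

Since the callee always returns the same packed value, both the branch and the
loaded payload are compile-time constants in principle.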
BTW, if we do branch: bt $32, %rax / jnc would be much smaller code size, and
avoiding a 64-bit constant is good for the uop cache. This would be a win on
everything: K10/bdver/Ryzen, Jaguar/Atom/Silvermont, P6 and 64-bit-capable-P4,
and Sandybridge-family. A couple of those are slow for bt reg,reg, but not bt
reg,imm. It doesn't macro-fuse, though.
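A sketch of why testing only bit 32 should be equivalent to the byte-mask test
here, assuming the flag byte only ever holds 0 or 1 (which is what a valid bool
gives us). The function names are made up for illustration.

```cpp
#include <cstdint>

// Current codegen: movabsq $0xFF00000000, %rcx / testq %rcx, %rax / je
constexpr bool engaged_via_byte_mask(std::uint64_t rax) {
    return (rax & 0xFF00000000ULL) != 0;
}

// Suggested codegen: bt $32, %rax / jnc (tests bit 32 only)
constexpr bool engaged_via_bit32(std::uint64_t rax) {
    return ((rax >> 32) & 1) != 0;
}

// Flag byte == 1 (engaged, payload 42): both tests agree.
static_assert(engaged_via_byte_mask(0x10000002AULL) == engaged_via_bit32(0x10000002AULL),
              "agree when engaged");
// Flag byte == 0 (disengaged): both tests agree.
static_assert(engaged_via_byte_mask(0x2AULL) == engaged_via_bit32(0x2AULL),
              "agree when disengaged");
// Flag byte == 2 is not a valid bool; only here do the two tests differ.
static_assert(engaged_via_byte_mask(0x200000000ULL) != engaged_via_bit32(0x200000000ULL),
              "differ only for an invalid bool byte");
```

So the substitution relies on the bool byte being a well-formed 0 or 1.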
There's more to be gained by customizing the return-value passing convention
for static functions (or with LTO), e.g. passing the bool member in a separate
register even for 32-bit T, where x86-64 SysV says to pack it into RAX.
(Working on a separate bug report about that, and maybe about having
std::optional store its members in the opposite order.)
related: libstdc++ std::optional<int> is not trivially copyable, but libc++'s
is. <a href="https://stackoverflow.com/q/46544019/224132">https://stackoverflow.com/q/46544019/224132</a>. Another optimization would
be to return in registers even when the C++ ABI would normally return in
memory.</pre>
</div>
</p>
</body>
</html>