<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Redundant load opt. or CSE pessimizes code (x86_64-linux-gnu)"
href="https://bugs.llvm.org/show_bug.cgi?id=40268">40268</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Redundant load opt. or CSE pessimizes code (x86_64-linux-gnu)
</td>
</tr>
<tr>
<th>Product</th>
<td>clang
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>LLVM Codegen
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedclangbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>bisqwit@iki.fi
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org, neeilans@live.com, richard-llvm@metafoo.co.uk
</td>
</tr></table>
<p>
<div>
<pre>For this code (-xc -std=c99 or -xc++ -std=c++17):
struct guu { int a; int b; float c; char d; };
extern void test(struct guu);
void caller()
{
test( (struct guu){.a = 3, .b = 5, .c = 7, .d = 9} );
test( (struct guu){.a = 3, .b = 5, .c = 7, .d = 9} );
}
CSE (or some other form of redundant-load optimization) pessimizes the code.
The problem occurs at optimization levels -O1 and higher, including -Os.
If the function "caller" calls test() just once, the resulting code is (-O3
-fno-optimize-sibling-calls; stack-aligning instructions omitted for brevity):
movabs rdi, 21474836483
movabs rsi, 39743127552
call test
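For reference, the two 64-bit immediates follow from the System V x86-64
calling convention. struct guu is 16 bytes, so it is split into two
eightbytes that are passed in rdi and rsi. The second eightbyte holds a float
and a char, so it is classified INTEGER rather than SSE. The sketch below
recomputes the register values. The helper names guu_rdi, guu_rsi and
float_bits are made up for illustration. It assumes an IEEE-754 float, a
32-bit unsigned int, and zero padding bytes. The listings show zero padding,
but the ABI does not guarantee it.

```c
/* Sketch: how struct guu is packed into rdi/rsi under the System V x86-64 ABI.
   Assumptions: IEEE-754 binary32 float, 32-bit unsigned int, and padding
   bytes taken as zero (the ABI leaves their value unspecified). */
struct guu { int a; int b; float c; char d; };

/* Reinterpret the bits of a float (union type punning is allowed in C99/C11). */
static unsigned long long float_bits(float f)
{
    union { float f; unsigned int u; } pun;
    pun.f = f;
    return pun.u;
}

/* First eightbyte: a in the low 32 bits, b in the high 32 bits; goes in rdi. */
static unsigned long long guu_rdi(struct guu g)
{
    return (unsigned long long)(unsigned int)g.a
         | (unsigned long long)(unsigned int)g.b << 32;
}

/* Second eightbyte: bits of c in the low 32 bits, d in bits 32..39; goes in
   rsi. It mixes a float and a char, so it is classified INTEGER, not SSE. */
static unsigned long long guu_rsi(struct guu g)
{
    return float_bits(g.c)
         | (unsigned long long)(unsigned char)g.d << 32;
}
```

For example, {.a = 3, .b = 5, .c = 7, .d = 9} gives guu_rdi = 21474836483 and
guu_rsi = 39743127552, matching the movabs operands above.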
If "caller" calls test() twice, the code grows to twelve instructions, twice
the six that two copies of the single-call sequence would need. (Stack-aligning
instructions omitted for brevity):
push r14
push rbx
movabs rbx, 21474836483
movabs r14, 39743127552
mov rdi, rbx
mov rsi, r14
call test
mov rdi, rbx
mov rsi, r14
call test
pop rbx
pop r14
If we change caller() so that the arguments in the two calls are not
identical:
void caller()
{
test( (struct guu){.a = 3, .b = 5, .c = 7, .d = 9} );
test( (struct guu){.a = 3, .b = 6, .c = 7, .d = 10} );
}
The generated code is optimal again, as expected:
movabs rdi, 21474836483
movabs rsi, 39743127552
call test
movabs rdi, 25769803779
movabs rsi, 44038094848
call test
The problem in the first example is that the compiler sees that the same
argument is used twice, so it keeps the values in callee-saved registers in
order to reuse them for the second call. However, re-initializing the argument
registers from scratch would have been more efficient.
Admittedly, Clang, unlike GCC, does try to mitigate this problem. If I change
the code so that only 32-bit loads are needed, Clang no longer attempts to
back up the values:
void caller()
{
test( (struct guu){.a = 3, .b = 0, .c = 7, .d = 0} );
test( (struct guu){.a = 3, .b = 0, .c = 7, .d = 0} );
}
The above code generates, as expected:
mov edi, 3
mov esi, 1088421888
call test
mov edi, 3
mov esi, 1088421888
call test
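Here every constant has its upper 32 bits clear. On x86-64, writing a 32-bit
register zero-extends the result into the full 64-bit register. "mov edi, 3"
therefore sets all of rdi, and the 5-byte mov r32, imm32 form suffices in place
of the 10-byte movabs. The following is a simplified sketch of that choice.
The function names are hypothetical, and the sketch ignores the 7-byte
sign-extended mov r64, imm32 form:

```c
/* Sketch: picking the shorter of two x86-64 encodings for loading a 64-bit
   constant into rdi or rsi. Simplified: the 7-byte sign-extended
   "mov r64, imm32" form (for small negative values) is not considered. */

/* Returns 1 if the constant needs the 10-byte "movabs r64, imm64".
   Returns 0 if the 5-byte "mov r32, imm32" suffices, because writing a
   32-bit register zero-extends into the full 64-bit register. */
static int needs_movabs(unsigned long long v)
{
    return (v >> 32) != 0;
}

/* Encoded size in bytes of the chosen form when the destination is rdi or
   rsi (r8d..r15d would need an extra REX byte for the 32-bit form). */
static int const_load_bytes(unsigned long long v)
{
    return needs_movabs(v) ? 10 : 5;
}
```

needs_movabs(21474836483) is 1, while needs_movabs(3) and
needs_movabs(1088421888) are both 0.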
The problem seems to be that, for 64-bit loads, the heuristic fails to account
for the extra code required to save and restore the callee-saved registers.
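To make the suspected trade-off concrete, here is a hypothetical
instruction-count model. It is not LLVM's actual heuristic, and the names are
made up. It counts only the instructions shown in the listings above,
excluding the calls and the stack-aligning instructions:

```c
/* Hypothetical cost model, in instructions, for passing n_consts constant
   argument registers to n_calls calls. Calls and stack alignment excluded. */

/* Rematerialize: one immediate load per constant before every call. */
static int remat_cost(int n_consts, int n_calls)
{
    return n_consts * n_calls;
}

/* Keep each constant in its own callee-saved register: one push and one pop
   to preserve that register, one immediate load into it, and one
   register-to-register move into the argument register per call. */
static int callee_saved_cost(int n_consts, int n_calls)
{
    return n_consts * (2 + 1 + n_calls);
}
```

For the first example (two constants, two calls), remat_cost gives 4 and
callee_saved_cost gives 10, matching the listings. A real heuristic would also
have to weigh encoded size (a movabs is 10 bytes, a register-to-register mov
3 bytes) and register pressure.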
The problem occurs in Clang 3.5 and newer, but not in Clang 3.4 or
older.</pre>
</div>
</p>
</body>
</html>