<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Issue</th>
<td>
<a href="https://github.com/llvm/llvm-project/issues/150120">150120</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>
Missed optimization: Lowering struct materialization into cold branches
</td>
</tr>
<tr>
<th>Labels</th>
<td>
new issue
</td>
</tr>
<tr>
<th>Assignees</th>
<td>
</td>
</tr>
<tr>
<th>Reporter</th>
<td>
jeremy-rifkin
</td>
</tr>
</table>
<pre>
Clang generates suboptimal code for the following:
```cpp
struct S {
    int& x;
    int& y;
    bool check() {
        return x < y;
    }
};
[[noreturn]] [[gnu::cold]] void bar(const S& s);
void foo(int a, int b) {
    S s{a, b};
    if(s.check()) [[unlikely]] { // very unlikely and cold
        bar(s);
    }
}
```
```asm
foo(int, int):
        sub     rsp, 24
        mov     dword ptr [rsp + 4], edi
        mov     dword ptr [rsp], esi
        lea     rax, [rsp + 4]
        mov     qword ptr [rsp + 8], rax
        mov     rax, rsp
        mov     qword ptr [rsp + 16], rax
        cmp     edi, esi
        jl      .LBB0_2
        add     rsp, 24
        ret
.LBB0_2:
        lea     rdi, [rsp + 8]
        call    bar(S const&)@PLT
```
Struct `S` must be on the stack in order to call `bar()`. However, that is only needed in the unlikely case that the check succeeds and the cold branch is taken. Ideally, the codegen would be the following:
```asm
foo(int, int):
        cmp     edi, esi
        jl      .LBB0_2
        ret
.LBB0_2:
        sub     rsp, 24
        ... copy edi/esi to the stack and make struct S ...
        call    bar(S const&)@PLT
```
MSVC generates something along these lines; GCC and Clang do not: https://godbolt.org/z/4axKfoe8x
I can't simply write `if(a < b)` or delay the construction of `S` until inside the branch. My specific use case that results in code like this is [libassert](https://github.com/jeremy-rifkin/libassert), where an expression template is built from the user's condition, evaluated, and then inspected on assertion failure.
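To illustrate the pattern, here is a minimal sketch of such an expression template. All names (`decomposer`, `operand`, `less_expr`, `assertion_failed`, `SKETCH_ASSERT`) are made up for this example and this is not libassert's actual implementation; it only handles `<` and throws instead of printing diagnostics.
```cpp
#include <stdexcept>
#include <string>

// Hypothetical names throughout; not libassert's real implementation.

// Captured "lhs < rhs" expression. Holds references, like S in the report.
template <typename A, typename B>
struct less_expr {
    const A& lhs;
    const B& rhs;
    bool eval() const { return lhs < rhs; }
};

// Left operand waiting for its comparison operator.
template <typename A>
struct operand {
    const A& value;
    template <typename B>
    less_expr<A, B> operator<(const B& rhs) const { return {value, rhs}; }
};

// `decomposer{} << a < b` parses as `(decomposer{} << a) < b`,
// because << binds tighter than <.
struct decomposer {
    template <typename A>
    operand<A> operator<<(const A& value) const { return {value}; }
};

// Cold failure handler: inspects the captured operands.
// Throws here so the sketch is testable; a real handler would report and abort.
template <typename A, typename B>
[[noreturn]] [[gnu::cold]] void assertion_failed(const less_expr<A, B>& e) {
    throw std::runtime_error("assertion failed: " + std::to_string(e.lhs) +
                             " < " + std::to_string(e.rhs));
}

// Operands must be lvalues that outlive the statement, since only
// references are stored.
#define SKETCH_ASSERT(cond) \
    do { \
        auto expr = decomposer{} << cond; \
        if (!expr.eval()) [[unlikely]] { assertion_failed(expr); } \
    } while (0)

void foo(int a, int b) {
    SKETCH_ASSERT(a < b);
}
```
The expression object stores references to the operands and is passed by reference to a cold handler, so it has the same shape as `S` above.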
Even if the code is written as follows, clang still generates sub-ideal code:
```cpp
void foo(int a, int b) {
    if(a < b) [[unlikely]] { // very unlikely and cold
        S s{a, b};
        bar(s);
    }
}
```
```asm
foo(int, int):
        sub     rsp, 24
        mov     dword ptr [rsp + 4], edi
        mov     dword ptr [rsp], esi
        cmp     edi, esi
        jl      .LBB0_2
        add     rsp, 24
        ret
.LBB0_2:
        lea     rax, [rsp + 4]
        mov     qword ptr [rsp + 8], rax
        mov     rax, rsp
        mov     qword ptr [rsp + 16], rax
        lea     rdi, [rsp + 8]
        call    bar(S const&)@PLT
```
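The closest source-level approximation of the desired shape that I can think of is outlining the cold path by hand, so that the hot function only passes the values along in registers. The following is only a sketch: `foo_cold` is a hypothetical helper, `bar` is given a throwing stand-in body so the example links, and I have not verified what assembly any particular compiler emits for it. Like the rewrite above, it requires spelling out the condition by hand, so it does not help with expression templates.
```cpp
#include <stdexcept>

struct S {
    int& x;
    int& y;
    bool check() {
        return x < y;
    }
};

// Stand-in body for bar() (an assumption; the report only declares it).
[[noreturn]] [[gnu::cold]] void bar(const S&) {
    throw std::runtime_error("check failed");
}

// Hypothetical helper: receives the values by value, so S and the stack
// slots backing its references are created only inside this cold function.
[[noreturn]] [[gnu::cold]] [[gnu::noinline]] void foo_cold(int a, int b) {
    S s{a, b};
    bar(s);
}

void foo(int a, int b) {
    if(a < b) [[unlikely]] { // very unlikely and cold
        foo_cold(a, b);
    }
}
```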
This may be a tricky optimization to perform; however, given the above, I expect it would benefit a large amount of code.
</pre>