<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW " title="NEW --- - Next gen non-allocating constexpr-folding future-promise does not optimise well on clang" href="https://llvm.org/bugs/show_bug.cgi?id=23652">23652</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Next gen non-allocating constexpr-folding future-promise does not optimise well on clang
</td>
</tr>
<tr>
<th>Product</th>
<td>clang
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>C++14
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedclangbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>s_bugzilla@nedprod.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvmbugs@cs.uiuc.edu
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>While working on next-generation non-allocating constexpr-folding
future-promises for the Boost.Thread rewrite (with the hope that these become
the next STL future-promises), I have found that clang currently generates
much worse code for them than GCC does: GCC folds the simple cases below down
to a constant, while clang leaves calls, stack traffic and jump tables behind.
I have spoken with Chandler Carruth about this at C++Now, and he may chime in
here about the importance of clang matching GCC's performance on these. I am
also raising the issue with colleagues on the MSVC team, as poor old VS2015
generates about 3000 opcodes for the second example below :(.
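To make "constexpr-folding" concrete, here is a minimal sketch, assuming a
drastically simplified int-only value holder. The name int_monad is
hypothetical and this is not the Boost.Spinlock monad; it only shows the
property being relied on: the type stores its value inline, never allocates,
and every member is constexpr, so the compiler is free to evaluate it entirely
at compile time.

```cpp
// Hypothetical, heavily simplified value holder illustrating the
// "non-allocating, constexpr-folding" idea. NOT the
// boost::spinlock::lightweight_futures monad; all names are invented.
#include "cassert"  // quoted form of the standard include; falls back to the system header search

class int_monad
{
  int value_;       // stored inline: no heap allocation
  bool has_value_;  // simple state flag instead of a full variant
public:
  constexpr int_monad() noexcept : value_(0), has_value_(false) {}
  constexpr explicit int_monad(int v) noexcept : value_(v), has_value_(true) {}
  constexpr bool has_value() const noexcept { return has_value_; }
  // Returns the stored value; an empty holder yields 0 to keep the sketch
  // exception-free (a real future would throw or carry an error instead).
  constexpr int get() const noexcept { return has_value_ ? value_ : 0; }
};

// Every member is constexpr and the type is trivially destructible, so
// these checks are evaluated entirely at compile time.
static_assert(int_monad(5).get() == 5, "value is constant-folded");
static_assert(!int_monad().has_value(), "default state is empty");

// Runtime form of test1() below: with optimisation enabled one would
// expect a compiler to reduce this to "mov $0x5,%eax; retq".
int test1()
{
  int_monad m(5);
  return m.get();
}
```

The real monad has more states to track, but as long as each of them is
handled by inline, trivially destructible code, the optimiser should in
principle be able to reach the same single-mov result.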
Anyway, as a quick summary: with these next-gen future-promises, this sequence:
extern BOOST_SPINLOCK_NOINLINE int test1()
{
  using namespace boost::spinlock::lightweight_futures;
  monad&lt;int, true&gt; m(5);
  return m.get();
}
... should turn into:
0000000000000000 <_Z5test1v>:
0: b8 05 00 00 00 mov $0x5,%eax
5: c3 retq
... and indeed does under GCC, but under clang 3.6 and 3.7 turns into:
0000000000000000 <_Z5test1v>:
0: 53 push %rbx
1: 48 83 ec 20 sub $0x20,%rsp
5: c7 44 24 08 05 00 00 movl $0x5,0x8(%rsp)
c: 00
d: c7 44 24 18 01 00 00 movl $0x1,0x18(%rsp)
14: 00
15: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
1a: e8 00 00 00 00 callq 1f <_Z5test1v+0x1f>
1f: 89 c3 mov %eax,%ebx
21: 8b 44 24 18 mov 0x18(%rsp),%eax
25: ff c8 dec %eax
27: 83 f8 03 cmp $0x3,%eax
2a: 77 24 ja 50 <_Z5test1v+0x50>
2c: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8)
33: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
38: e8 00 00 00 00 callq 3d <_Z5test1v+0x3d>
3d: eb 09 jmp 48 <_Z5test1v+0x48>
3f: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp)
46: 00 00
48: c7 44 24 18 00 00 00 movl $0x0,0x18(%rsp)
4f: 00
50: 89 d8 mov %ebx,%eax
52: 48 83 c4 20 add $0x20,%rsp
56: 5b pop %rbx
57: c3 retq
58: 48 89 c3 mov %rax,%rbx
5b: 8b 44 24 18 mov 0x18(%rsp),%eax
5f: ff c8 dec %eax
61: 83 f8 03 cmp $0x3,%eax
64: 77 24 ja 8a <_Z5test1v+0x8a>
66: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8)
6d: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
72: e8 00 00 00 00 callq 77 <_Z5test1v+0x77>
77: eb 09 jmp 82 <_Z5test1v+0x82>
79: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp)
80: 00 00
82: c7 44 24 18 00 00 00 movl $0x0,0x18(%rsp)
89: 00
8a: 48 89 df mov %rbx,%rdi
8d: e8 00 00 00 00 callq 92 <_Z5test1v+0x92>
92: 66 66 66 66 66 2e 0f data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
99: 1f 84 00 00 00 00 00
This is highly unfortunate, because monad is the base class of future, and
therefore this promise-future sequence:
extern BOOST_SPINLOCK_NOINLINE int test1()
{
  using namespace boost::spinlock::lightweight_futures;
  promise&lt;int&gt; p;
  p.set_value(5);
  future&lt;int&gt; f(p.get_future());
  return f.get();
}
... which under GCC correctly turns into:
0000000000000010 <_Z5test1v>:
10: b8 05 00 00 00 mov $0x5,%eax
15: c3 retq
... under clang 3.6 and 3.7 most unfortunately turns into:
0000000000000000 <_Z5test1v>:
0: 53 push %rbx
1: 48 83 ec 50 sub $0x50,%rsp
5: c7 44 24 40 00 00 00 movl $0x0,0x40(%rsp)
c: 00
d: c6 44 24 48 00 movb $0x0,0x48(%rsp)
12: c7 44 24 2c 05 00 00 movl $0x5,0x2c(%rsp)
19: 00
1a: 48 8d 5c 24 30 lea 0x30(%rsp),%rbx
1f: 48 8d 74 24 2c lea 0x2c(%rsp),%rsi
24: 48 89 df mov %rbx,%rdi
27: e8 00 00 00 00 callq 2c <_Z5test1v+0x2c>
2c: 48 8d 3c 24 lea (%rsp),%rdi
30: 48 89 de mov %rbx,%rsi
33: e8 00 00 00 00 callq 38 <_Z5test1v+0x38>
38: 48 8d 3c 24 lea (%rsp),%rdi
3c: e8 00 00 00 00 callq 41 <_Z5test1v+0x41>
41: 89 c3 mov %eax,%ebx
43: 48 8d 3c 24 lea (%rsp),%rdi
47: e8 00 00 00 00 callq 4c <_Z5test1v+0x4c>
4c: 48 8d 7c 24 30 lea 0x30(%rsp),%rdi
51: e8 00 00 00 00 callq 56 <_Z5test1v+0x56>
56: 89 d8 mov %ebx,%eax
58: 48 83 c4 50 add $0x50,%rsp
5c: 5b pop %rbx
5d: c3 retq
5e: 48 89 c3 mov %rax,%rbx
61: eb 0c jmp 6f <_Z5test1v+0x6f>
63: 48 89 c3 mov %rax,%rbx
66: 48 8d 3c 24 lea (%rsp),%rdi
6a: e8 00 00 00 00 callq 6f <_Z5test1v+0x6f>
6f: 48 8d 7c 24 30 lea 0x30(%rsp),%rdi
74: e8 00 00 00 00 callq 79 <_Z5test1v+0x79>
79: 48 89 df mov %rbx,%rdi
7c: e8 00 00 00 00 callq 81 <_Z5test1v+0x81>
81: 66 66 66 66 66 66 2e data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
88: 0f 1f 84 00 00 00 00
8f: 00
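... which I would expect to be quite a performance penalty: a 0x50-byte stack
frame, five out-of-line calls on the normal path and exception-cleanup landing
pads, where GCC emits a single mov.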
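For comparison, here is a minimal sketch of an allocation-free promise/future
pair, assuming a deliberately restricted design: int only, single-threaded,
and set_value() must be called before get_future(). The names int_promise and
int_future are hypothetical and this is not how the Boost.Thread rewrite links
its promise and future; it only illustrates that nothing in a sequence like
the one above inherently needs heap allocation or out-of-line calls once every
step is inline.

```cpp
// Hypothetical sketch, NOT the Boost.Thread / lightweight_futures design:
// an int-only promise/future pair that never allocates. The value lives
// inline in whichever object currently owns it, and get_future() copies it
// across. Restriction: set_value() must precede get_future(), so no shared
// state or synchronisation is needed.
#include "cassert"  // quoted form of the standard include; falls back to the system header search

class int_future
{
  int value_;
  bool ready_;
public:
  constexpr int_future() noexcept : value_(0), ready_(false) {}
  constexpr explicit int_future(int v) noexcept : value_(v), ready_(true) {}
  constexpr bool is_ready() const noexcept { return ready_; }
  int get() const
  {
    assert(ready_);  // a real future would block or throw instead
    return value_;
  }
};

class int_promise
{
  int value_ = 0;
  bool set_ = false;
  bool retrieved_ = false;
public:
  void set_value(int v)
  {
    value_ = v;
    set_ = true;
  }
  // Hands the already-set value to a future by value: nothing is
  // heap-allocated and no pointers link the two objects.
  int_future get_future()
  {
    assert(set_);         // ordering restriction of this sketch
    assert(!retrieved_);  // a future may only be retrieved once
    retrieved_ = true;
    return int_future(value_);
  }
};

// Mirrors the second test1() above. Every step is inline and
// allocation-free, so an optimiser should be able to fold the whole
// function to a constant return.
int test1()
{
  int_promise p;
  p.set_value(5);
  int_future f(p.get_future());
  return f.get();
}
```

Lifting the ordering restriction means the promise and future must find each
other (and synchronise) when set_value() runs after get_future(), which is
exactly the extra machinery the optimiser then has to see through.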
I compiled with clang's -save-temps option; the resulting intermediate files
for all the unit tests, along with the command options used, can be found at:
clang 3.5:
<a href="https://drive.google.com/file/d/0B5QDPUNHLpKMcTJXd2lqZ1lKNTA/view?usp=sharing">https://drive.google.com/file/d/0B5QDPUNHLpKMcTJXd2lqZ1lKNTA/view?usp=sharing</a>
clang 3.6:
<a href="https://drive.google.com/file/d/0B5QDPUNHLpKMQ1g1SU9WbUJiWWc/view?usp=sharing">https://drive.google.com/file/d/0B5QDPUNHLpKMQ1g1SU9WbUJiWWc/view?usp=sharing</a>
clang 3.7:
<a href="https://drive.google.com/file/d/0B5QDPUNHLpKMaUNNWXhqSi1oM3c/view?usp=sharing">https://drive.google.com/file/d/0B5QDPUNHLpKMaUNNWXhqSi1oM3c/view?usp=sharing</a>
Niall</pre>
</div>
</p>
</body>
</html>