[LLVMbugs] [Bug 23652] New: Next gen non-allocating constexpr-folding future-promise does not optimise well on clang
bugzilla-daemon at llvm.org
Mon May 25 17:03:49 PDT 2015
https://llvm.org/bugs/show_bug.cgi?id=23652
Bug ID: 23652
Summary: Next gen non-allocating constexpr-folding future-promise does not optimise well on clang
Product: clang
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: C++14
Assignee: unassignedclangbugs at nondot.org
Reporter: s_bugzilla at nedprod.com
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
While working on next-generation non-allocating constexpr-folding
future-promises for the Boost.Thread rewrite (in the hope that these become
the next STL future-promises), I have found that clang currently generates
considerably worse code for them than GCC does.
I have spoken with Chandler Carruth about these at C++ Now, and he may chime in
here about the importance of clang matching GCC in performance with these. I am
also raising these with colleagues on the MSVC team, as poor old VS2015
generates about 3000 opcodes for the last example :(.
Anyway, as a quick summary, under these next-gen future-promises this sequence:
extern BOOST_SPINLOCK_NOINLINE int test1()
{
using namespace boost::spinlock::lightweight_futures;
monad<int, true> m(5);
return m.get();
}
... should turn into:
0000000000000000 <_Z5test1v>:
0: b8 05 00 00 00 mov $0x5,%eax
5: c3 retq
... and indeed does under GCC, but under clang 3.6 and 3.7 turns into:
0000000000000000 <_Z5test1v>:
0: 53 push %rbx
1: 48 83 ec 20 sub $0x20,%rsp
5: c7 44 24 08 05 00 00 movl $0x5,0x8(%rsp)
c: 00
d: c7 44 24 18 01 00 00 movl $0x1,0x18(%rsp)
14: 00
15: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
1a: e8 00 00 00 00 callq 1f <_Z5test1v+0x1f>
1f: 89 c3 mov %eax,%ebx
21: 8b 44 24 18 mov 0x18(%rsp),%eax
25: ff c8 dec %eax
27: 83 f8 03 cmp $0x3,%eax
2a: 77 24 ja 50 <_Z5test1v+0x50>
2c: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8)
33: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
38: e8 00 00 00 00 callq 3d <_Z5test1v+0x3d>
3d: eb 09 jmp 48 <_Z5test1v+0x48>
3f: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp)
46: 00 00
48: c7 44 24 18 00 00 00 movl $0x0,0x18(%rsp)
4f: 00
50: 89 d8 mov %ebx,%eax
52: 48 83 c4 20 add $0x20,%rsp
56: 5b pop %rbx
57: c3 retq
58: 48 89 c3 mov %rax,%rbx
5b: 8b 44 24 18 mov 0x18(%rsp),%eax
5f: ff c8 dec %eax
61: 83 f8 03 cmp $0x3,%eax
64: 77 24 ja 8a <_Z5test1v+0x8a>
66: ff 24 c5 00 00 00 00 jmpq *0x0(,%rax,8)
6d: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
72: e8 00 00 00 00 callq 77 <_Z5test1v+0x77>
77: eb 09 jmp 82 <_Z5test1v+0x82>
79: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp)
80: 00 00
82: c7 44 24 18 00 00 00 movl $0x0,0x18(%rsp)
89: 00
8a: 48 89 df mov %rbx,%rdi
8d: e8 00 00 00 00 callq 92 <_Z5test1v+0x92>
92: 66 66 66 66 66 2e 0f data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
99: 1f 84 00 00 00 00 00
This is highly unfortunate, because monad is the base class of future, and
therefore this promise-future sequence:
extern BOOST_SPINLOCK_NOINLINE int test1()
{
using namespace boost::spinlock::lightweight_futures;
promise<int> p;
p.set_value(5);
future<int> f(p.get_future());
return f.get();
}
... which under GCC correctly turns into:
0000000000000010 <_Z5test1v>:
10: b8 05 00 00 00 mov $0x5,%eax
15: c3 retq
... under clang 3.6 and 3.7 most unfortunately turns into:
0000000000000000 <_Z5test1v>:
0: 53 push %rbx
1: 48 83 ec 50 sub $0x50,%rsp
5: c7 44 24 40 00 00 00 movl $0x0,0x40(%rsp)
c: 00
d: c6 44 24 48 00 movb $0x0,0x48(%rsp)
12: c7 44 24 2c 05 00 00 movl $0x5,0x2c(%rsp)
19: 00
1a: 48 8d 5c 24 30 lea 0x30(%rsp),%rbx
1f: 48 8d 74 24 2c lea 0x2c(%rsp),%rsi
24: 48 89 df mov %rbx,%rdi
27: e8 00 00 00 00 callq 2c <_Z5test1v+0x2c>
2c: 48 8d 3c 24 lea (%rsp),%rdi
30: 48 89 de mov %rbx,%rsi
33: e8 00 00 00 00 callq 38 <_Z5test1v+0x38>
38: 48 8d 3c 24 lea (%rsp),%rdi
3c: e8 00 00 00 00 callq 41 <_Z5test1v+0x41>
41: 89 c3 mov %eax,%ebx
43: 48 8d 3c 24 lea (%rsp),%rdi
47: e8 00 00 00 00 callq 4c <_Z5test1v+0x4c>
4c: 48 8d 7c 24 30 lea 0x30(%rsp),%rdi
51: e8 00 00 00 00 callq 56 <_Z5test1v+0x56>
56: 89 d8 mov %ebx,%eax
58: 48 83 c4 50 add $0x50,%rsp
5c: 5b pop %rbx
5d: c3 retq
5e: 48 89 c3 mov %rax,%rbx
61: eb 0c jmp 6f <_Z5test1v+0x6f>
63: 48 89 c3 mov %rax,%rbx
66: 48 8d 3c 24 lea (%rsp),%rdi
6a: e8 00 00 00 00 callq 6f <_Z5test1v+0x6f>
6f: 48 8d 7c 24 30 lea 0x30(%rsp),%rdi
74: e8 00 00 00 00 callq 79 <_Z5test1v+0x79>
79: 48 89 df mov %rbx,%rdi
7c: e8 00 00 00 00 callq 81 <_Z5test1v+0x81>
81: 66 66 66 66 66 66 2e data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
88: 0f 1f 84 00 00 00 00
8f: 00
... which I should imagine would be quite a performance penalty.
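For anyone without the library to hand, the general shape of the first example can be sketched with a hypothetical stand-in. This is NOT the real boost::spinlock::lightweight_futures::monad: the names tiny_monad, storage_state, SKETCH_NOINLINE and sketch_test1 are invented for illustration, and the switch in the destructor is only an assumption suggested by the jump table on the state field in the clang output above. Whether this reduced form triggers the same code generation has not been checked; the dumps linked below remain the authoritative reproduction.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for illustration only; not the Boost.Spinlock code.
namespace sketch {

enum class storage_state { empty, value, exception };

template <class T>
class tiny_monad {
  using eptr = std::exception_ptr;

  // Inline storage: no heap allocation, the state tag selects the live member.
  union {
    T value_;
    eptr exception_;
  };
  storage_state state_;

 public:
  explicit tiny_monad(T v)
      : value_(std::move(v)), state_(storage_state::value) {}
  explicit tiny_monad(eptr e)
      : exception_(std::move(e)), state_(storage_state::exception) {}

  tiny_monad(const tiny_monad&) = delete;
  tiny_monad& operator=(const tiny_monad&) = delete;

  // Assumed shape: destroy whichever member is live, then mark empty.
  ~tiny_monad() {
    switch (state_) {
      case storage_state::value:
        value_.~T();
        break;
      case storage_state::exception:
        exception_.~eptr();
        break;
      case storage_state::empty:
        break;
    }
    state_ = storage_state::empty;
  }

  // Return the value, or rethrow a stored exception.
  T get() {
    if (state_ == storage_state::exception) std::rethrow_exception(exception_);
    if (state_ != storage_state::value) throw std::logic_error("no state");
    return value_;
  }
};

}  // namespace sketch

#if defined(__GNUC__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE
#endif

// Same pattern as test1() above: construct with a constant, read it back.
SKETCH_NOINLINE int sketch_test1() {
  sketch::tiny_monad<int> m(5);
  return m.get();
}
```

The point of interest is that the state is known at compile time to be "value" throughout sketch_test1(), so in principle both the checks in get() and the switch in the destructor can be folded away, leaving only the constant return.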
I ran clang with -save-temps; the resulting dumps for all the unit tests,
along with the command options used, can be found at:
clang 3.5:
https://drive.google.com/file/d/0B5QDPUNHLpKMcTJXd2lqZ1lKNTA/view?usp=sharing
clang 3.6:
https://drive.google.com/file/d/0B5QDPUNHLpKMQ1g1SU9WbUJiWWc/view?usp=sharing
clang 3.7:
https://drive.google.com/file/d/0B5QDPUNHLpKMaUNNWXhqSi1oM3c/view?usp=sharing
Niall