[LLVMbugs] [Bug 23652] New: Next gen non-allocating constexpr-folding future-promise does not optimise well on clang

Mon May 25 17:03:49 PDT 2015

https://llvm.org/bugs/show_bug.cgi?id=23652

            Bug ID: 23652
           Summary: Next gen non-allocating constexpr-folding
                    future-promise does not optimise well on clang
           Product: clang
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: C++14
          Assignee: unassignedclangbugs at nondot.org
          Reporter: s_bugzilla at nedprod.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

As part of working on next generation non-allocating constexpr-folding
future-promises for the Boost.Thread rewrite (and with the hope these become
the next STL future-promises), I have found that clang currently does not
optimise them as well as GCC does.

I have spoken with Chandler Carruth about these at C++ Now, and he may chime in
here about why it matters that clang matches GCC's performance on them. I am
also raising these with colleagues on the MSVC team, as poor old VS2015
generates about 3000 opcodes for the last example :(.
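For readers unfamiliar with the design, the following is a minimal sketch of what "non-allocating" means for a monad-like value holder. It is an illustration under stated assumptions, not the lightweight_futures code: `tiny_monad` and `monad_test1` are hypothetical names, and the sketch keeps only a value state and an exception state. The value lives inline in a union and a one-byte tag says which member is live, so nothing touches the heap. The destructor and get() both branch on that tag; when the tag is a compile-time-known constant, an optimiser can in principle fold every branch away and reduce the whole sequence to a constant return.

```cpp
#include <cassert>
#include <exception>
#include <new>
#include <stdexcept>
#include <utility>

// Hypothetical sketch, NOT the lightweight_futures::monad<T> API: the value
// (or an exception) is stored inline, so no heap allocation happens, and a
// one-byte tag records which union member is currently live.
template <class T>
class tiny_monad {
  enum class tag : unsigned char { empty, value, exception };

  union storage {
    char none;
    T value;
    std::exception_ptr error;
    storage() noexcept : none() {}
    ~storage() {}  // tiny_monad destroys the live member itself
  };

  template <class U>
  static void destroy(U& u) noexcept { u.~U(); }

  storage s_;
  tag tag_ = tag::empty;

public:
  explicit tiny_monad(T v) : tag_(tag::value) {
    ::new (static_cast<void*>(&s_.value)) T(std::move(v));
  }
  explicit tiny_monad(std::exception_ptr e) : tag_(tag::exception) {
    ::new (static_cast<void*>(&s_.error)) std::exception_ptr(std::move(e));
  }
  tiny_monad(const tiny_monad&) = delete;
  tiny_monad& operator=(const tiny_monad&) = delete;

  // Branches on the tag. When the tag is a known constant, as in
  // monad_test1() below, an optimiser can fold this switch away entirely.
  ~tiny_monad() {
    switch (tag_) {
      case tag::value: destroy(s_.value); break;
      case tag::exception: destroy(s_.error); break;
      case tag::empty: break;
    }
  }

  T get() const {
    switch (tag_) {
      case tag::value: return s_.value;
      case tag::exception: std::rethrow_exception(s_.error);
      case tag::empty: break;
    }
    throw std::logic_error("tiny_monad is empty");
  }
};

// Mirrors test1() from this report: construct with 5, read it straight back.
int monad_test1() {
  tiny_monad<int> m(5);
  return m.get();
}
```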

Anyway, as a quick summary, with these next-gen future-promises this sequence:

extern BOOST_SPINLOCK_NOINLINE int test1()
{
  using namespace boost::spinlock::lightweight_futures;
  monad<int, true> m(5);
  return m.get();
}

... should turn into:

0000000000000000 <_Z5test1v>:
   0:    b8 05 00 00 00           mov    $0x5,%eax
   5:    c3                       retq   

... and indeed does under GCC, but under clang 3.6 and 3.7 turns into:

0000000000000000 <_Z5test1v>:
   0:    53                       push   %rbx
   1:    48 83 ec 20              sub    $0x20,%rsp
   5:    c7 44 24 08 05 00 00     movl   $0x5,0x8(%rsp)
   c:    00 
   d:    c7 44 24 18 01 00 00     movl   $0x1,0x18(%rsp)
  14:    00 
  15:    48 8d 7c 24 08           lea    0x8(%rsp),%rdi
  1a:    e8 00 00 00 00           callq  1f <_Z5test1v+0x1f>
  1f:    89 c3                    mov    %eax,%ebx
  21:    8b 44 24 18              mov    0x18(%rsp),%eax
  25:    ff c8                    dec    %eax
  27:    83 f8 03                 cmp    $0x3,%eax
  2a:    77 24                    ja     50 <_Z5test1v+0x50>
  2c:    ff 24 c5 00 00 00 00     jmpq   *0x0(,%rax,8)
  33:    48 8d 7c 24 08           lea    0x8(%rsp),%rdi
  38:    e8 00 00 00 00           callq  3d <_Z5test1v+0x3d>
  3d:    eb 09                    jmp    48 <_Z5test1v+0x48>
  3f:    48 c7 44 24 08 00 00     movq   $0x0,0x8(%rsp)
  46:    00 00 
  48:    c7 44 24 18 00 00 00     movl   $0x0,0x18(%rsp)
  4f:    00 
  50:    89 d8                    mov    %ebx,%eax
  52:    48 83 c4 20              add    $0x20,%rsp
  56:    5b                       pop    %rbx
  57:    c3                       retq   
  58:    48 89 c3                 mov    %rax,%rbx
  5b:    8b 44 24 18              mov    0x18(%rsp),%eax
  5f:    ff c8                    dec    %eax
  61:    83 f8 03                 cmp    $0x3,%eax
  64:    77 24                    ja     8a <_Z5test1v+0x8a>
  66:    ff 24 c5 00 00 00 00     jmpq   *0x0(,%rax,8)
  6d:    48 8d 7c 24 08           lea    0x8(%rsp),%rdi
  72:    e8 00 00 00 00           callq  77 <_Z5test1v+0x77>
  77:    eb 09                    jmp    82 <_Z5test1v+0x82>
  79:    48 c7 44 24 08 00 00     movq   $0x0,0x8(%rsp)
  80:    00 00 
  82:    c7 44 24 18 00 00 00     movl   $0x0,0x18(%rsp)
  89:    00 
  8a:    48 89 df                 mov    %rbx,%rdi
  8d:    e8 00 00 00 00           callq  92 <_Z5test1v+0x92>
  92:    66 66 66 66 66 2e 0f     data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
  99:    1f 84 00 00 00 00 00 

This is highly unfortunate, because monad is the base class of future, and
therefore this promise-future sequence:

extern BOOST_SPINLOCK_NOINLINE int test1()
{
  using namespace boost::spinlock::lightweight_futures;
  promise<int> p;
  p.set_value(5);
  future<int> f(p.get_future());
  return f.get();
}

... which under GCC correctly turns into:

0000000000000010 <_Z5test1v>:
  10:    b8 05 00 00 00           mov    $0x5,%eax
  15:    c3                       retq   

... under clang 3.6 and 3.7 most unfortunately turns into:

0000000000000000 <_Z5test1v>:
   0:    53                       push   %rbx
   1:    48 83 ec 50              sub    $0x50,%rsp
   5:    c7 44 24 40 00 00 00     movl   $0x0,0x40(%rsp)
   c:    00 
   d:    c6 44 24 48 00           movb   $0x0,0x48(%rsp)
  12:    c7 44 24 2c 05 00 00     movl   $0x5,0x2c(%rsp)
  19:    00 
  1a:    48 8d 5c 24 30           lea    0x30(%rsp),%rbx
  1f:    48 8d 74 24 2c           lea    0x2c(%rsp),%rsi
  24:    48 89 df                 mov    %rbx,%rdi
  27:    e8 00 00 00 00           callq  2c <_Z5test1v+0x2c>
  2c:    48 8d 3c 24              lea    (%rsp),%rdi
  30:    48 89 de                 mov    %rbx,%rsi
  33:    e8 00 00 00 00           callq  38 <_Z5test1v+0x38>
  38:    48 8d 3c 24              lea    (%rsp),%rdi
  3c:    e8 00 00 00 00           callq  41 <_Z5test1v+0x41>
  41:    89 c3                    mov    %eax,%ebx
  43:    48 8d 3c 24              lea    (%rsp),%rdi
  47:    e8 00 00 00 00           callq  4c <_Z5test1v+0x4c>
  4c:    48 8d 7c 24 30           lea    0x30(%rsp),%rdi
  51:    e8 00 00 00 00           callq  56 <_Z5test1v+0x56>
  56:    89 d8                    mov    %ebx,%eax
  58:    48 83 c4 50              add    $0x50,%rsp
  5c:    5b                       pop    %rbx
  5d:    c3                       retq   
  5e:    48 89 c3                 mov    %rax,%rbx
  61:    eb 0c                    jmp    6f <_Z5test1v+0x6f>
  63:    48 89 c3                 mov    %rax,%rbx
  66:    48 8d 3c 24              lea    (%rsp),%rdi
  6a:    e8 00 00 00 00           callq  6f <_Z5test1v+0x6f>
  6f:    48 8d 7c 24 30           lea    0x30(%rsp),%rdi
  74:    e8 00 00 00 00           callq  79 <_Z5test1v+0x79>
  79:    48 89 df                 mov    %rbx,%rdi
  7c:    e8 00 00 00 00           callq  81 <_Z5test1v+0x81>
  81:    66 66 66 66 66 66 2e     data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
  88:    0f 1f 84 00 00 00 00 
  8f:    00 

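... which, with five out-of-line calls and a 0x50-byte stack frame on the
success path where GCC emits a single mov, I would imagine carries quite a
performance penalty.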
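To make the "non-allocating" promise/future handoff concrete, here is a minimal single-threaded sketch. It assumes nothing about the real lightweight_futures internals, and `tiny_promise`, `tiny_future`, `future_test1` and `future_linked` are hypothetical names. The value is handed over without any heap-allocated shared state. If set_value() runs first, the value is parked inside the promise and moved into the future by get_future(). Otherwise the two objects hold pointers to each other, and set_value() writes straight into the future, with the future's move constructor re-pointing the link. A thread-safe version would need a lock around every link update, which is omitted here. For brevity, T must also be default-constructible.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical single-threaded sketch, NOT the lightweight_futures API.
// No shared state is allocated: promise and future point at each other.
template <class T> class tiny_future;

template <class T> class tiny_promise {
  friend class tiny_future<T>;
  bool has_value_ = false;
  T value_{};                         // parked here until a future exists
  tiny_future<T>* future_ = nullptr;  // linked future, if any

public:
  tiny_promise() = default;
  tiny_promise(const tiny_promise&) = delete;
  tiny_promise& operator=(const tiny_promise&) = delete;
  ~tiny_promise();
  void set_value(T v);
  tiny_future<T> get_future();
};

template <class T> class tiny_future {
  friend class tiny_promise<T>;
  bool ready_ = false;
  T value_{};
  tiny_promise<T>* promise_ = nullptr;  // linked promise, if any
  explicit tiny_future(tiny_promise<T>* p) : promise_(p) {}

public:
  tiny_future(tiny_future&& o) noexcept
      : ready_(o.ready_), value_(std::move(o.value_)), promise_(o.promise_) {
    if (promise_) promise_->future_ = this;  // re-point the back link
    o.promise_ = nullptr;
  }
  tiny_future& operator=(tiny_future&&) = delete;  // omitted for brevity
  ~tiny_future() { if (promise_) promise_->future_ = nullptr; }

  T get() {
    if (!ready_) throw std::logic_error("no value set");
    return value_;
  }
};

template <class T> tiny_promise<T>::~tiny_promise() {
  if (future_) future_->promise_ = nullptr;  // unlink a surviving future
}

template <class T> void tiny_promise<T>::set_value(T v) {
  if (future_) {  // future already exists: write straight into it
    future_->value_ = std::move(v);
    future_->ready_ = true;
  } else {        // otherwise park the value inside the promise
    value_ = std::move(v);
    has_value_ = true;
  }
}

template <class T> tiny_future<T> tiny_promise<T>::get_future() {
  tiny_future<T> f(this);
  if (has_value_) {  // value already set: hand it over, no link needed
    f.value_ = std::move(value_);
    f.ready_ = true;
    f.promise_ = nullptr;
    has_value_ = false;
  } else {           // not set yet: link so set_value() can find the future
    future_ = &f;    // re-pointed by the move constructor if f is moved
  }
  return f;
}

// Mirrors the second test1() above: value set before the future exists.
int future_test1() {
  tiny_promise<int> p;
  p.set_value(5);
  tiny_future<int> f(p.get_future());
  return f.get();
}

// Future obtained first; set_value() then writes through the link.
int future_linked() {
  tiny_promise<int> p;
  tiny_future<int> f(p.get_future());
  p.set_value(7);
  return f.get();
}
```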

I ran clang with -save-temps, and the dumps for all the unit tests, along with
the command options used, can be found at:

clang 3.5:

https://drive.google.com/file/d/0B5QDPUNHLpKMcTJXd2lqZ1lKNTA/view?usp=sharing

clang 3.6:

https://drive.google.com/file/d/0B5QDPUNHLpKMQ1g1SU9WbUJiWWc/view?usp=sharing

clang 3.7:

https://drive.google.com/file/d/0B5QDPUNHLpKMaUNNWXhqSi1oM3c/view?usp=sharing

Niall
