<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW " title="NEW --- - Next gen non-allocating constexpr-folding future-promise does not optimise well on clang" href="https://llvm.org/bugs/show_bug.cgi?id=23652">23652</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Next gen non-allocating constexpr-folding future-promise does not optimise well on clang
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>clang
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>C++14
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedclangbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>s_bugzilla@nedprod.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvmbugs@cs.uiuc.edu
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>While working on next-generation non-allocating constexpr-folding
future-promises for the Boost.Thread rewrite (in the hope that these become
the next STL future-promises), I have found that clang currently optimises
them noticeably worse than GCC does.

I have spoken with Chandler Carruth about these at C++ Now, and he may chime in
here about the importance of clang matching GCC in performance with these. I am
also raising these with colleagues on the MSVC team, as poor old VS2015
generates about 3000 opcodes for the last example :(.

Anyway, as a quick summary, under these next-gen future-promises this sequence:

extern BOOST_SPINLOCK_NOINLINE int test1()
{
  using namespace boost::spinlock::lightweight_futures;
  monad<int, true> m(5);
  return m.get();
}

... should turn into:

0000000000000000 <_Z5test1v>:
   0:    b8 05 00 00 00           mov    $0x5,%eax
   5:    c3                       retq   

... and indeed does under GCC, but under clang 3.6 and 3.7 turns into:

0000000000000000 <_Z5test1v>:
   0:    53                       push   %rbx
   1:    48 83 ec 20              sub    $0x20,%rsp
   5:    c7 44 24 08 05 00 00     movl   $0x5,0x8(%rsp)
   c:    00 
   d:    c7 44 24 18 01 00 00     movl   $0x1,0x18(%rsp)
  14:    00 
  15:    48 8d 7c 24 08           lea    0x8(%rsp),%rdi
  1a:    e8 00 00 00 00           callq  1f <_Z5test1v+0x1f>
  1f:    89 c3                    mov    %eax,%ebx
  21:    8b 44 24 18              mov    0x18(%rsp),%eax
  25:    ff c8                    dec    %eax
  27:    83 f8 03                 cmp    $0x3,%eax
  2a:    77 24                    ja     50 <_Z5test1v+0x50>
  2c:    ff 24 c5 00 00 00 00     jmpq   *0x0(,%rax,8)
  33:    48 8d 7c 24 08           lea    0x8(%rsp),%rdi
  38:    e8 00 00 00 00           callq  3d <_Z5test1v+0x3d>
  3d:    eb 09                    jmp    48 <_Z5test1v+0x48>
  3f:    48 c7 44 24 08 00 00     movq   $0x0,0x8(%rsp)
  46:    00 00 
  48:    c7 44 24 18 00 00 00     movl   $0x0,0x18(%rsp)
  4f:    00 
  50:    89 d8                    mov    %ebx,%eax
  52:    48 83 c4 20              add    $0x20,%rsp
  56:    5b                       pop    %rbx
  57:    c3                       retq   
  58:    48 89 c3                 mov    %rax,%rbx
  5b:    8b 44 24 18              mov    0x18(%rsp),%eax
  5f:    ff c8                    dec    %eax
  61:    83 f8 03                 cmp    $0x3,%eax
  64:    77 24                    ja     8a <_Z5test1v+0x8a>
  66:    ff 24 c5 00 00 00 00     jmpq   *0x0(,%rax,8)
  6d:    48 8d 7c 24 08           lea    0x8(%rsp),%rdi
  72:    e8 00 00 00 00           callq  77 <_Z5test1v+0x77>
  77:    eb 09                    jmp    82 <_Z5test1v+0x82>
  79:    48 c7 44 24 08 00 00     movq   $0x0,0x8(%rsp)
  80:    00 00 
  82:    c7 44 24 18 00 00 00     movl   $0x0,0x18(%rsp)
  89:    00 
  8a:    48 89 df                 mov    %rbx,%rdi
  8d:    e8 00 00 00 00           callq  92 <_Z5test1v+0x92>
  92:    66 66 66 66 66 2e 0f     data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
  99:    1f 84 00 00 00 00 00 

This is highly unfortunate, because monad is the base class of future, and
therefore this promise-future sequence:

extern BOOST_SPINLOCK_NOINLINE int test1()
{
  using namespace boost::spinlock::lightweight_futures;
  promise<int> p;
  p.set_value(5);
  future<int> f(p.get_future());
  return f.get();
}

... which under GCC correctly turns into:

0000000000000010 <_Z5test1v>:
  10:    b8 05 00 00 00           mov    $0x5,%eax
  15:    c3                       retq   

... under clang 3.6 and 3.7 most unfortunately turns into:

0000000000000000 <_Z5test1v>:
   0:    53                       push   %rbx
   1:    48 83 ec 50              sub    $0x50,%rsp
   5:    c7 44 24 40 00 00 00     movl   $0x0,0x40(%rsp)
   c:    00 
   d:    c6 44 24 48 00           movb   $0x0,0x48(%rsp)
  12:    c7 44 24 2c 05 00 00     movl   $0x5,0x2c(%rsp)
  19:    00 
  1a:    48 8d 5c 24 30           lea    0x30(%rsp),%rbx
  1f:    48 8d 74 24 2c           lea    0x2c(%rsp),%rsi
  24:    48 89 df                 mov    %rbx,%rdi
  27:    e8 00 00 00 00           callq  2c <_Z5test1v+0x2c>
  2c:    48 8d 3c 24              lea    (%rsp),%rdi
  30:    48 89 de                 mov    %rbx,%rsi
  33:    e8 00 00 00 00           callq  38 <_Z5test1v+0x38>
  38:    48 8d 3c 24              lea    (%rsp),%rdi
  3c:    e8 00 00 00 00           callq  41 <_Z5test1v+0x41>
  41:    89 c3                    mov    %eax,%ebx
  43:    48 8d 3c 24              lea    (%rsp),%rdi
  47:    e8 00 00 00 00           callq  4c <_Z5test1v+0x4c>
  4c:    48 8d 7c 24 30           lea    0x30(%rsp),%rdi
  51:    e8 00 00 00 00           callq  56 <_Z5test1v+0x56>
  56:    89 d8                    mov    %ebx,%eax
  58:    48 83 c4 50              add    $0x50,%rsp
  5c:    5b                       pop    %rbx
  5d:    c3                       retq   
  5e:    48 89 c3                 mov    %rax,%rbx
  61:    eb 0c                    jmp    6f <_Z5test1v+0x6f>
  63:    48 89 c3                 mov    %rax,%rbx
  66:    48 8d 3c 24              lea    (%rsp),%rdi
  6a:    e8 00 00 00 00           callq  6f <_Z5test1v+0x6f>
  6f:    48 8d 7c 24 30           lea    0x30(%rsp),%rdi
  74:    e8 00 00 00 00           callq  79 <_Z5test1v+0x79>
  79:    48 89 df                 mov    %rbx,%rdi
  7c:    e8 00 00 00 00           callq  81 <_Z5test1v+0x81>
  81:    66 66 66 66 66 66 2e     data32 data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
  88:    0f 1f 84 00 00 00 00 
  8f:    00 

... which I should imagine would be quite a performance penalty.
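
For anyone without the library to hand, the following is a minimal sketch of
the shape of code involved. It is NOT the Boost.Spinlock implementation:
toy_monad, storage_type and toy_test1 are made-up names, and the layout (a
value slot plus a tag, with a destructor that switches on the tag) is only an
assumption suggested by the clang listing for the monad case, which stores 5,
then stores 1, and later runs dec/cmp/jmpq on the second slot.

```cpp
// Hypothetical stand-in, not the real lightweight_futures monad.
// Assumption: the real type keeps one storage slot plus a tag saying
// which alternative (if any) is currently alive.
#include <cassert>
#include <exception>
#include <stdexcept>
#include <system_error>
#include <utility>

enum class storage_type { empty, value, error, exception };

template <class T>
class toy_monad {
  // Anonymous union: only the member named by type_ is alive.
  union {
    T value_;
    std::error_code error_;
    std::exception_ptr exception_;
  };
  storage_type type_;

public:
  toy_monad() noexcept : type_(storage_type::empty) {}
  explicit toy_monad(T v)
      : value_(std::move(v)), type_(storage_type::value) {}
  toy_monad(const toy_monad &) = delete;
  toy_monad &operator=(const toy_monad &) = delete;

  // The destructor dispatches on the tag. When the tag is a known
  // constant, the optimiser can in principle delete the whole switch.
  ~toy_monad() {
    switch (type_) {
    case storage_type::empty:
      break;
    case storage_type::value:
      value_.~T();
      break;
    case storage_type::error:
      error_.~error_code();
      break;
    case storage_type::exception:
      exception_.~exception_ptr();
      break;
    }
  }

  // Returns the stored value, or throws if no value is held.
  T get() {
    if (type_ != storage_type::value)
      throw std::logic_error("toy_monad holds no value");
    return value_;
  }
};

// Same shape as test1() above, using the stand-in type.
int toy_test1() {
  toy_monad<int> m(5);
  return m.get();
}
```

Everything in toy_test1() is visible to the compiler and the tag is a
compile-time constant, so the hoped-for result is the same two-instruction
body GCC produces for the real test1(). Whether this reduced sketch triggers
the same missed optimisation in clang has not been checked.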

I asked clang to -save-temps, and the dump for all the unit tests along with
the command options used can be found at:

clang 3.5:

<a href="https://drive.google.com/file/d/0B5QDPUNHLpKMcTJXd2lqZ1lKNTA/view?usp=sharing">https://drive.google.com/file/d/0B5QDPUNHLpKMcTJXd2lqZ1lKNTA/view?usp=sharing</a>

clang 3.6:

<a href="https://drive.google.com/file/d/0B5QDPUNHLpKMQ1g1SU9WbUJiWWc/view?usp=sharing">https://drive.google.com/file/d/0B5QDPUNHLpKMQ1g1SU9WbUJiWWc/view?usp=sharing</a>

clang 3.7:

<a href="https://drive.google.com/file/d/0B5QDPUNHLpKMaUNNWXhqSi1oM3c/view?usp=sharing">https://drive.google.com/file/d/0B5QDPUNHLpKMaUNNWXhqSi1oM3c/view?usp=sharing</a>

Niall</pre>
        </div>
      </p>
    </body>
</html>