<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Code generated by clang-9 runs slower than those with previous versions (clang-8/clang-7)."

   href="https://bugs.llvm.org/show_bug.cgi?id=43724">43724</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Code generated by clang-9 runs slower than those with previous versions (clang-8/clang-7).

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>clang

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>9.0

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>C11

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedclangbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>wuxb45@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>blitzrakete@gmail.com, dgregor@apple.com, erik.pilkington@gmail.com, llvm-bugs@lists.llvm.org, richard-llvm@metafoo.co.uk

          </td>

        </tr></table>

      <p>

        <div>

        <pre>A performance regression was observed after upgrading to clang-9 on Arch Linux

x86_64/Broadwell (all packages are up to date).

The regression has been narrowed down to an rwlock implementation that uses

C11 atomic operations. Code that reproduces the regression on several of my

Broadwell servers can be found here: <a href="https://github.com/wuxb45/atomictest">https://github.com/wuxb45/atomictest</a>

The most apparent difference is that clang-9 generates "lock incl" for +1

operations and "lock xadd" for +n operations, whereas clang-8 and clang-7 both

generate "lock addl".

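For illustration only, here is a minimal sketch of the kinds of C11 atomic
updates in question. The function names are hypothetical and are not taken
from the atomictest repository; whether the result of the add is used is an
assumption about what may influence instruction selection, not a confirmed
cause. Compiling this with "clang -O2 -S" under each clang version and
comparing the output is one way to see which instruction is chosen.

```c
#include "stdatomic.h"

/* Hypothetical stand-ins for the counter updates of an rwlock.
   None of these names come from the atomictest repository. */

/* +1 operation, previous value discarded. */
void reader_enter(atomic_uint *count)
{
    atomic_fetch_add(count, 1);
}

/* +n operation, previous value discarded. */
void readers_enter(atomic_uint *count, unsigned n)
{
    atomic_fetch_add(count, n);
}

/* +n operation, previous value returned to the caller. */
unsigned readers_enter_prev(atomic_uint *count, unsigned n)
{
    return atomic_fetch_add(count, n);
}
```

All three functions use the default sequentially consistent ordering; the
real code may use explicit memory orders instead.
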
The rest of the assembly looks similar but still shows some differences in the

ordering of some if-else blocks (in just a few places). It is possible that

branch prediction is adversely affected by the changed layout. However, the

code in the above link was excerpted from a larger code base after the

regression was observed. Both versions of the code show the same behavior, and

I still believe it is not _sustainable_ to mitigate this issue by

reordering source code.

The regression has been consistently observed on Xeon E5 2697A v4 (HT off,

Broadwell) and Xeon Gold 5120 (HT on, Skylake). I have not tried it on other

CPUs.

More details are provided at the GitHub link above. I am also willing to copy

some of that text here or provide anything else that could be useful.

Appreciate it.</pre>

        </div>

      </p>


    </body>

</html>