<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - False dependency in x86 popcnt instruction unknown to llvm causes slow code"
   href="https://bugs.llvm.org/show_bug.cgi?id=34936">34936</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>False dependency in x86 popcnt instruction unknown to llvm causes slow code
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>justin.lebar@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>clang/LLVM at head seems to be affected by the bug described here:
<a href="https://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-count-variable-with-64-bit-introduces-crazy-performance/25089720#25089720">https://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-count-variable-with-64-bit-introduces-crazy-performance/25089720#25089720</a>

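The etiology established in the SO post is that, in the hardware (at least on
Intel cores such as Haswell), "popcnt dst, src" has a false dependency on dst:
the instruction does not read dst, but it still waits for the previous write to
dst to complete.  If the compiler isn't aware of this, it can make bad decisions
during register allocation, e.g. reusing a register in a way that turns that
false dependency into a loop-carried dependency chain.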

$ curl <a href="https://gist.githubusercontent.com/anonymous/31cb15567b89f461534fcb97957b5369/raw/ec4705c992f355258c292da5ba21ca0c7abaa119/-">https://gist.githubusercontent.com/anonymous/31cb15567b89f461534fcb97957b5369/raw/ec4705c992f355258c292da5ba21ca0c7abaa119/-</a> \
    | clang++ -O3 -march=haswell --std=c++11 -x c++ - -o test && ./test 1
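The gist's source is not reproduced in this report. As a rough illustration only, here is a hypothetical reduction of the kind of kernel the SO question benchmarks: summing popcounts over a buffer of 64-bit words, once with an unsigned loop counter and once with a uint64_t one. The function names popcount_sum_u32/popcount_sum_u64 and the 4x unrolling are assumptions modeled on the SO post, not the gist's actual code. __builtin_popcountll is the GCC/Clang builtin, which compiles to popcnt when targeting -march=haswell.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical kernel (not the gist's code): sum the popcounts of n words
// using a 32-bit loop counter. With -march=haswell, each
// __builtin_popcountll call compiles to a popcnt instruction.
std::uint64_t popcount_sum_u32(const std::uint64_t* buf, std::size_t n) {
  assert(n % 4 == 0);           // the 4x unrolling below assumes this
  assert(n <= 0xFFFFFFFFu);     // keeps the 32-bit counter from wrapping
  std::uint64_t count = 0;
  for (unsigned i = 0; i < n; i += 4) {
    count += __builtin_popcountll(buf[i]);
    count += __builtin_popcountll(buf[i + 1]);
    count += __builtin_popcountll(buf[i + 2]);
    count += __builtin_popcountll(buf[i + 3]);
  }
  return count;
}

// Same kernel with a 64-bit loop counter. It computes the same result; it
// corresponds to the "uint64_t" case that the report measured as slower.
std::uint64_t popcount_sum_u64(const std::uint64_t* buf, std::size_t n) {
  assert(n % 4 == 0);           // the 4x unrolling below assumes this
  std::uint64_t count = 0;
  for (std::uint64_t i = 0; i < n; i += 4) {
    count += __builtin_popcountll(buf[i]);
    count += __builtin_popcountll(buf[i + 1]);
    count += __builtin_popcountll(buf[i + 2]);
    count += __builtin_popcountll(buf[i + 3]);
  }
  return count;
}
```

Compiling a file like this with clang++ -O3 -march=haswell -S and comparing the popcnt destination registers in the two loops is one way to look for the register reuse the SO post describes.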

On a Haswell machine, I get

unsigned        41959360000     0.592057 sec    17.7107 GB/s
uint64_t        41959360000     0.823331 sec    12.7358 GB/s

which exhibits the bug: both runs produce the same checksum, but the one whose
loop induction variable is uint64_t takes about 39% longer (0.82 s vs. 0.59 s).
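A small sanity check on those numbers, sketched under the assumption that "GB/s" means 10^9 bytes per second (the Run struct and helper names are illustrative, not from the benchmark): elapsed time multiplied by throughput gives the same data volume for both runs, roughly 10.49 GB, so the two loops do identical work and the gap is purely speed, a ratio of about 1.39.

```cpp
#include <cassert>
#include <cstdio>

// One line of the benchmark output above (values copied from the report).
struct Run {
  const char* type;
  double seconds;
  double gb_per_s;  // assumed to mean 1e9 bytes per second
};

// How many times longer the slow run took than the fast one.
double slowdown(const Run& fast, const Run& slow) {
  assert(fast.seconds > 0.0);
  return slow.seconds / fast.seconds;
}

// Data volume implied by a run: elapsed time multiplied by throughput.
double implied_gb(const Run& r) { return r.seconds * r.gb_per_s; }

// Prints the implied volume of both runs and the slowdown ratio.
void print_summary() {
  const Run u32{"unsigned", 0.592057, 17.7107};
  const Run u64{"uint64_t", 0.823331, 12.7358};
  std::printf("%s: %.3f GB, %s: %.3f GB, slowdown %.2fx\n", u32.type,
              implied_gb(u32), u64.type, implied_gb(u64), slowdown(u32, u64));
}
```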

Disassembly is at
<a href="https://gist.github.com/anonymous/47496363b7a4f15ffd57038492afb3e3">https://gist.github.com/anonymous/47496363b7a4f15ffd57038492afb3e3</a> -- based on
my (nonexpert) analysis, it seems plausible that the etiology from SO applies
here.</pre>
        </div>
      </p>


    </body>
</html>