<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Significant performance regression with r291800 (&quot;Tune bypassing of slow division for Intel CPUs&quot;)"
   href="https://bugs.llvm.org/show_bug.cgi?id=35226">35226</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Significant performance regression with r291800 ("Tune bypassing of slow division for Intel CPUs")
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>atdt@google.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>rL291800 (<a href="https://reviews.llvm.org/rL291800">https://reviews.llvm.org/rL291800</a>) took an optimization for lowering
64-bit division to 32-bit and enabled it on all Intel big cores, starting with
Sandy Bridge. This change is associated with a significant regression in an
internal, search-related benchmark, when compiled for -march=haswell.

In the differential revision (<a href="https://reviews.llvm.org/D28196">https://reviews.llvm.org/D28196</a>), the reviewer
pointed out that the latency/throughput of 64-bit division falls along a range,
which suggests that this optimization may already be done in hardware.
Do we know whether this is true? Additionally, is it possible
that improvements in the latency and throughput of division and remainder
operations on recent big core Intel CPUs render this optimization obsolete?</pre>
        </div>
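        <div>
        <pre>For readers unfamiliar with the transformation, the idea of bypassing a slow
64-bit division can be sketched in C roughly as below. This is a hand-written
illustration, not the code from rL291800: the function name div_bypass is
hypothetical, the sketch covers only unsigned division, and it assumes
unsigned int is 32 bits wide. The actual change operates inside the compiler
rather than in source code.

```c
/* Hypothetical helper: divide two unsigned 64-bit values, taking a
   32-bit fast path when both operands fit in 32 bits.
   Assumes unsigned int is 32 bits wide. As with a plain division,
   b must not be zero. */
unsigned long long div_bypass(unsigned long long a, unsigned long long b)
{
    /* If no bit above bit 31 is set in either operand, the quotient
       also fits in 32 bits, so a narrower divide gives the same result. */
    if (((a | b) >> 32) == 0)
        return (unsigned int)a / (unsigned int)b;  /* 32-bit divide */

    return a / b;                                  /* full 64-bit divide */
}
```

The extra test and branch are paid on every division, so the fast path is only
a win when the 32-bit divide is sufficiently cheaper than the 64-bit one on
the target CPU, which is the trade-off this report asks about.</pre>
        </div>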
      </p>


    </body>
</html>