<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Suboptimal cmov generation"

   href="https://bugs.llvm.org/show_bug.cgi?id=48760">48760</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Suboptimal cmov generation

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>new-bugs

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>new bugs

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>henrik@gramner.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>LLVM prefers to generate cmov instructions with as many flag dependencies as

possible, which is suboptimal. It should prefer the form with the fewest flag
dependencies.

For example, the following two functions are semantically identical:

unsigned a(unsigned x) { return x >  3 ? x : 0; }

unsigned b(unsigned x) { return x >= 4 ? x : 0; }

They both generate identical code:

    xor    eax, eax

    cmp    edi, 3

    cmova  eax, edi

    ret

The better code would be:

    xor    eax, eax

    cmp    edi, 4

    cmovae eax, edi

    ret

cmovae has a dependency on CF whereas cmova has dependencies on both CF and ZF.

Many (most?) x86 CPUs execute cmov instructions with a single flag

dependency in a single µop, but split them into two µops if there are multiple

flag dependencies.</pre>
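        <pre>The equivalence behind the suggested codegen can be sketched in C. This is only
an illustration under stated assumptions: cmp_flags, cond_a, cond_ae and
rewrite_ugt_to_uge are hypothetical names, not LLVM APIs, and the flag model
covers only CF and ZF as set by an unsigned cmp.

```c
/* Flags produced by `cmp a, b` that the unsigned cmov variants read. */
struct flags { int cf; int zf; };

struct flags cmp_flags(unsigned a, unsigned b)
{
    struct flags f;
    f.cf = b > a;    /* CF: set when a - b borrows */
    f.zf = a == b;   /* ZF: set when the operands are equal */
    return f;
}

/* cmova reads both CF and ZF; cmovae reads CF only. */
int cond_a(struct flags f)  { return !(f.cf || f.zf); }
int cond_ae(struct flags f) { return !f.cf; }

/* Hypothetical helper: if `x > c` can be rewritten as `x >= c + 1`,
 * store c + 1 in *out and return 1. Otherwise return 0. */
int rewrite_ugt_to_uge(unsigned c, unsigned *out)
{
    if (c == ~0u)    /* c + 1 would wrap to 0, so the rewrite is invalid */
        return 0;
    *out = c + 1u;
    return 1;
}
```

For the constant 3 used above, rewrite_ugt_to_uge yields 4, and
cond_a(cmp_flags(x, 3)) agrees with cond_ae(cmp_flags(x, 4)) for every x, which
is why the cmp edi, 4 / cmovae pair is a valid replacement. The guard matters
only for the all-ones constant, where the incremented value would wrap.</pre>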

        </div>

      </p>


    </body>

</html>