<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Missed Optimization: CLZ Loop -> CLZ instruction"
   href="https://bugs.llvm.org/show_bug.cgi?id=48404">48404</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Missed Optimization: CLZ Loop -> CLZ instruction
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Scalar Optimizations
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>haneef503@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Both of these are equivalent (the latter uses a builtin compiler intrinsic),
but only the latter emits the platform native CLZ instruction:

C(++):

unsigned clz_a (unsigned x) {
    unsigned w = sizeof (x) * 8;
    while (x) {
        w--;
        x >>= 1;
    }

    return w;
}

unsigned clz_b (unsigned x) {
    return __builtin_clzll (x);
}

x86-64 (clang -O3 -march=skylake) Assembly:

clz_a(unsigned int):                              # @clz_a(unsigned int)
        mov     eax, 32
        test    edi, edi
        je      .LBB0_2
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        dec     eax
        shr     edi
        jne     .LBB0_1
.LBB0_2:
        ret
clz_b(unsigned int):                              # @clz_b(unsigned int)
        mov     eax, edi
        lzcnt   rax, rax
        ret

A potential issue is that the intrinsic __builtin_clz(x) comes from GCC, where
they've stated that the behavior of the intrinsic is UB iff (x == 0), likely
since the old x86 BSR instruction yields an undefined result iff (x == 0).
However, it should be straightforward for LLVM to make __builtin_clz(x) well
defined even when (x == 0) by simply emitting a CMOV with the appropriate with
in conjunction when compiling for x86 uarchs too old to have LZCNT support.
AFAICT all other platforms have well defined results for all inputs to their
CLZ instructions.</pre>
        </div>
      </p>


      <hr>
      <span>You are receiving this mail because:</span>

      <ul>
          <li>You are on the CC list for the bug.</li>
      </ul>
    </body>
</html>