<html>
    <head>
      <base href="http://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - bit-scan-forward / count-trailing-zeros loop not recognized"
   href="http://llvm.org/bugs/show_bug.cgi?id=17128">17128</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>bit-scan-forward / count-trailing-zeros loop not recognized
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Scalar Optimizations
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>kkhoo@perfwizard.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvmbugs@cs.uiuc.edu
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>$ ./clang -v
clang version 3.4 (trunk 189776)
Target: x86_64-apple-darwin11.4.2
Thread model: posix

$ cat tzcnt.c 
int tzcnt(int x) {
   int count = 0;
   int i = 0;
   while ( i<32 && (((x >> i) & 0x1) == 0)) {
      i++;
      count++;
   }
   return count;
}

$ ./clang -S -O3 -fomit-frame-pointer -march=core-avx2 -o /dev/stdout tzcnt.c 
    .section    __TEXT,__text,regular,pure_instructions
    .globl    _tzcnt
    .align    4, 0x90
_tzcnt:                                 ## @tzcnt
    .cfi_startproc
## BB#0:                                ## %entry
    xorl    %eax, %eax
    .align    4, 0x90
LBB0_1:                                 ## %land.rhs
                                        ## =>This Inner Loop Header: Depth=1
    btl    %eax, %edi
    jb    LBB0_3
## BB#2:                                ## %while.body
                                        ##   in Loop: Header=BB0_1 Depth=1
    incl    %eax
    cmpl    $32, %eax
    jl    LBB0_1
LBB0_3:                                 ## %while.end
    ret

...

On CPUs with the BMI feature, I was hoping this loop would generate the 'tzcnt'
instruction:

tzcnt %edi, %eax

On x86 CPUs without BMI, this loop could also be implemented with the 'bsf'
instruction preceded by a check for a zero input value. For a zero input, the
compiler would have to return 32 explicitly, because 'bsf' leaves its
destination undefined in that case.
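
As a rough sketch of that fallback, the source-level equivalent might look like
the function below. The name 'tzcnt_bsf' is hypothetical and not part of the
test case; it assumes a 32-bit 'int' and the GCC/Clang '__builtin_ctz' builtin,
whose result is also undefined for a zero argument, which is why the zero check
comes first.

```c
/* Hypothetical helper (not from tzcnt.c): intended to return the same
   value as the loop in tzcnt(), assuming a 32-bit int. */
int tzcnt_bsf(int x) {
    /* Zero has no set bit to find: bsf leaves its destination undefined
       and __builtin_ctz(0) is undefined, so return the operand size. */
    if (x == 0)
        return 32;
    /* GCC/Clang builtin that counts trailing zero bits of a nonzero
       unsigned value. */
    return __builtin_ctz((unsigned)x);
}
```

With BMI available the zero check would be unnecessary, since 'tzcnt' already
returns the operand size for a zero source.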

According to Intel's Volume 2 ISA reference:
"The key difference between TZCNT and BSF instruction is that TZCNT provides
operand size as output when source operand is zero while in the case of BSF
instruction, if source operand is zero, the content of destination operand are
undefined. On processors that do not support TZCNT, the instruction byte
encoding is executed as BSF."</pre>
        </div>
      </p>
    </body>
</html>