[LLVMbugs] [Bug 17128] New: bit-scan-forward / count-trailing-zeros loop not recognized

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Fri Sep 6 11:50:52 PDT 2013


http://llvm.org/bugs/show_bug.cgi?id=17128

            Bug ID: 17128
           Summary: bit-scan-forward / count-trailing-zeros loop not
                    recognized
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Scalar Optimizations
          Assignee: unassignedbugs at nondot.org
          Reporter: kkhoo at perfwizard.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

$ ./clang -v
clang version 3.4 (trunk 189776)
Target: x86_64-apple-darwin11.4.2
Thread model: posix

$ cat tzcnt.c 
int tzcnt(int x) {
   int count = 0;
   int i = 0;
   while ( i<32 && (((x >> i) & 0x1) == 0)) {
      i++;
      count++;
   }
   return count;
}

$ ./clang -S -O3 -fomit-frame-pointer -march=core-avx2 -o /dev/stdout tzcnt.c 
    .section    __TEXT,__text,regular,pure_instructions
    .globl    _tzcnt
    .align    4, 0x90
_tzcnt:                                 ## @tzcnt
    .cfi_startproc
## BB#0:                                ## %entry
    xorl    %eax, %eax
    .align    4, 0x90
LBB0_1:                                 ## %land.rhs
                                        ## =>This Inner Loop Header: Depth=1
    btl    %eax, %edi
    jb    LBB0_3
## BB#2:                                ## %while.body
                                        ##   in Loop: Header=BB0_1 Depth=1
    incl    %eax
    cmpl    $32, %eax
    jl    LBB0_1
LBB0_3:                                 ## %while.end
    ret

...
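For reference, the loop in tzcnt.c counts the trailing zero bits of x and returns 32 when x is zero. Because count and i are always incremented together, it reduces to a single counter. Below is a minimal sketch of that simplified form. The name tzcnt_loop is hypothetical. Testing the bits on an unsigned copy is an assumption made here to avoid right-shifting a negative int, which is implementation-defined in C.

```c
/* Hypothetical helper: same result as tzcnt() in tzcnt.c, with the
   redundant 'count' folded into 'i'. Bits are tested on an unsigned copy
   so that negative inputs are never right-shifted as signed values. */
int tzcnt_loop(int x) {
    unsigned ux = (unsigned)x;
    int i = 0;
    /* Stop at the lowest set bit, or after 32 bits if x == 0. */
    while (i < 32 && ((ux >> i) & 1u) == 0)
        i++;
    return i;
}
```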

On CPUs with the BMI feature, I was hoping this loop would generate the 'tzcnt'
instruction:

tzcnt %edi, %eax

On x86 CPUs without BMI, this loop could also be implemented with the 'bsf'
instruction preceded by a check for a zero input value. For a zero input, the
compiler would have to supply the result '32' itself, because BSF leaves its
destination undefined in that case.
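A minimal C sketch of the BSF-plus-zero-check lowering described above follows. It assumes a GCC- or Clang-compatible compiler. Like BSF, __builtin_ctz has an undefined result for a zero argument, so the zero case is handled explicitly. The function name is hypothetical. Whether a given compiler emits bsf or tzcnt for it depends on the target flags.

```c
/* Hypothetical helper modelling "bsf with a leading zero check".
   __builtin_ctz (GCC/Clang builtin) is undefined for 0, like BSF. */
int tzcnt_bsf_style(int x) {
    unsigned ux = (unsigned)x;
    if (ux == 0)
        return 32;              /* zero input: supply the 32 ourselves */
    return __builtin_ctz(ux);   /* index of the lowest set bit */
}
```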

According to Intel's Volume 2 ISA reference:
"The key difference between TZCNT and BSF instruction is that TZCNT provides
operand size as output when source operand is zero while in the case of BSF
instruction, if source operand is zero, the content of destination operand are
undefined. On processors that do not support TZCNT, the instruction byte
encoding is executed as BSF."
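The quoted TZCNT behaviour ("operand size as output when source operand is zero") can be modelled in portable C as below. This is a hypothetical reference helper, not Intel's definition. It assumes width is 16, 32 or 64 and returns width for a zero source, which is exactly the case where BSF would leave the destination undefined.

```c
#include <stdint.h>

/* Hypothetical reference model of TZCNT for a 'width'-bit operand
   (assumed to be 16, 32 or 64): counts trailing zero bits of the low
   'width' bits of x and returns 'width' when they are all zero. */
unsigned tzcnt_model(uint64_t x, unsigned width) {
    if (width < 64)
        x &= (UINT64_C(1) << width) - 1;  /* keep only the operand's bits */
    unsigned n = 0;
    while (n < width && ((x >> n) & 1u) == 0)
        n++;
    return n;
}
```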



More information about the llvm-bugs mailing list