[llvm-bugs] [Bug 34843] New: Suboptimal code generation for __builtin_ctz(ll)
llvm-bugs at lists.llvm.org
Thu Oct 5 02:55:50 PDT 2017
Bug ID: 34843
Summary: Suboptimal code generation for __builtin_ctz(ll)
Component: LLVM Codegen
Assignee: unassignedclangbugs at nondot.org
Reporter: gcp at sjeng.org
CC: llvm-bugs at lists.llvm.org
Right now, when no specific arch target is set, the builtin
__builtin_ctz (and its long and long long variants)
will generate a BSF instruction.
This is suboptimal for AMD machines, which can do a TZCNT much faster than they
can do a BSF. Due to the way TZCNT is encoded, it is equal to a REP BSF, so it
is in fact "backwards compatible" as long as the different behavior for a 0 is
fine. And it is, because __builtin_ctz has undefined behavior for 0 (which is
why it can use BSF in the first place).
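To make the zero-input point concrete, here is a minimal sketch in C. The helper names (ctz32, ctz32_reference, ctz32_or_width) are made up for illustration and are not part of GCC, Clang, or this report. It assumes a compiler that provides __builtin_ctz. The zero-safe variant returns the operand width for 0, which is what TZCNT produces in hardware, whereas BSF leaves the destination undefined in that case.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical wrapper around the builtin discussed above.
 * __builtin_ctz(0) is undefined, so the precondition is checked. */
static unsigned ctz32(unsigned x)
{
    assert(x != 0);
    return (unsigned)__builtin_ctz(x);
}

/* Portable reference loop, used only to cross-check the builtin. */
static unsigned ctz32_reference(unsigned x)
{
    unsigned n = 0;
    assert(x != 0);
    while ((x & 1u) == 0) {
        x >>= 1;
        ++n;
    }
    return n;
}

/* Zero-safe variant: returns the bit width of unsigned for x == 0,
 * mirroring the TZCNT result for a zero source operand. */
static unsigned ctz32_or_width(unsigned x)
{
    return x ? (unsigned)__builtin_ctz(x)
             : (unsigned)(sizeof x * CHAR_BIT);
}
```

Because callers of the plain builtin may not rely on any particular result for 0, the compiler is free to pick whichever of the two encodings is faster on the target.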
On Intel hardware, either way is equally fast, so for a generic target it makes
sense to deal with the AMD case and encode the intrinsic as REP BSF/TZCNT.
At least GCC 4.8 and later are able to do this optimization and generate a REP
BSF for their generic target. Clang fails to do so. (It does generate TZCNT
Of note in this snippet is also that newer GCC adds an XOR ESI, ESI before the
REP BSF. So there may be a false dependency issue on some CPUs.