[llvm-bugs] [Bug 34843] New: Suboptimal code generation for __builtin_ctz(ll)
via llvm-bugs
llvm-bugs at lists.llvm.org
Thu Oct 5 02:55:50 PDT 2017
https://bugs.llvm.org/show_bug.cgi?id=34843
Bug ID: 34843
Summary: Suboptimal code generation for __builtin_ctz(ll)
Product: clang
Version: 5.0
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: LLVM Codegen
Assignee: unassignedclangbugs at nondot.org
Reporter: gcp at sjeng.org
CC: llvm-bugs at lists.llvm.org
Right now, when no specific arch target is set, __builtin_ctz (and its long
and long long variants) generates a BSF instruction.
This is suboptimal for AMD machines, which can do a TZCNT much faster than they
can do a BSF. TZCNT is encoded as a REP-prefixed BSF, and CPUs without TZCNT
support ignore the prefix and execute a plain BSF, so it is in fact "backwards
compatible" as long as the different behavior for a 0 input is fine. And it is,
because __builtin_ctz has undefined behavior for 0 (which is why it can use BSF
in the first place).
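To make the zero-input contract concrete, here is a minimal sketch in C. The
names ctz32_reference and ctz32_checked are hypothetical helpers introduced only
for illustration. The zero case returns the operand width, which is what TZCNT
reports for a 0 input; a plain BSF gives no usable result there, and the builtin
itself promises nothing.

```c
#include <limits.h>

/* Reference implementation: counts trailing zero bits with a loop.
 * Defined for every input, including 0. */
static unsigned ctz32_reference(unsigned x)
{
    unsigned n = 0;
    if (x == 0)
        return (unsigned)(sizeof(x) * CHAR_BIT); /* value TZCNT reports for 0 */
    while ((x & 1u) == 0) {
        x >>= 1;
        ++n;
    }
    return n;
}

/* Hypothetical wrapper: handles 0 explicitly, because __builtin_ctz(0)
 * is undefined and may differ between a BSF and a TZCNT lowering. */
static unsigned ctz32_checked(unsigned x)
{
    return x ? (unsigned)__builtin_ctz(x)
             : (unsigned)(sizeof(x) * CHAR_BIT);
}
```

For every nonzero input the two functions agree, so a caller that never passes 0
cannot observe whether the compiler picked BSF or REP BSF/TZCNT.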
On Intel hardware, either way is equally fast, so for a generic target it makes
sense to deal with the AMD case and encode the intrinsic as REP BSF/TZCNT.
At least GCC 4.8 and later are able to do this optimization and generate a REP
BSF for their generic target. Clang fails to do so. (It does generate TZCNT
with -march=znver1)
Example snippet:
https://godbolt.org/g/eXU6xf
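In case the link is unavailable, a reproduction along these lines should show
the difference. This is a sketch and not necessarily the exact code behind the
link; the function names and the file name in the comment are made up for
illustration.

```c
/* Compile to assembly and compare the instruction chosen for each function:
 *   clang -O2 -S ctz_test.c                 (generic target)
 *   clang -O2 -march=znver1 -S ctz_test.c   (Zen target)
 *   gcc   -O2 -S ctz_test.c                 (generic target)
 * ctz_test.c is a hypothetical file name. */

/* int variant; result is undefined when x == 0 */
int tz_int(unsigned x)
{
    return __builtin_ctz(x);
}

/* long variant; result is undefined when x == 0 */
int tz_long(unsigned long x)
{
    return __builtin_ctzl(x);
}

/* long long variant; result is undefined when x == 0 */
int tz_llong(unsigned long long x)
{
    return __builtin_ctzll(x);
}
```

The functions are deliberately not static so that each one shows up as its own
symbol in the generated assembly.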
Of note in this snippet is also that newer GCC adds an XOR ESI, ESI before the
REP BSF, so there may be a false dependency issue on the destination register
in some CPUs.