[llvm-commits] PATCH: Enable direct selection of bsf and bsr instructions for cttz and ctlz with zero-undef behavior
scanon at apple.com
Thu Dec 15 07:52:29 PST 2011
Just for the record, this is in no way unique to AMD. Agner Fog's tables list BSF/BSR as 10 µops/16 cycles on Atom as well. BSF/BSR are a hazard to be avoided when the target x86 processor is unknown.
On Dec 14, 2011, at 9:13 PM, Chandler Carruth wrote:
> Gentle ping on this patch...
> On Tue, Dec 13, 2011 at 9:56 AM, Erik Olofsson <Erik.Olofsson at hansoft.se> wrote:
> Just a note, BSR/BSF might have some performance implications depending on which CPU is running the code. See http://www.realworldtech.com/beta/forums/index.cfm?action=detail&id=82507&threadid=82344&roomid=2
> It would probably be safest to just use lzcnt/tzcnt when available unless you know which CPU you are optimizing for.
> I'm well aware of the problems with BSR/BSF on AMD chips. On Intel chips, they don't have such drawbacks though, and may in some cases be preferable. Still, as you say, we need to pay attention to what chip is targeted. That's part of why this patch *doesn't* tackle that problem. =] Eventually, I hope to take advantage of these architectural details, but I'm starting simple.
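For readers following the thread, the "zero-undef" distinction in the subject line can be sketched in C. This is only an illustration, not code from the patch: the function names `cttz32_defined` and `cttz32_zero_undef` are hypothetical, and `__builtin_ctz` assumes a GCC- or Clang-compatible compiler. The relevant hardware detail is that BSF leaves its destination undefined for a zero input, while TZCNT returns the operand width, so a compiler may only select a bare BSF when the zero case is allowed to be undefined.

```c
#include <stdint.h>

/* Fully defined count-trailing-zeros: returns 32 for x == 0.
 * This matches what TZCNT computes for a 32-bit operand; on a
 * BSF-only target it needs an explicit zero check. */
static unsigned cttz32_defined(uint32_t x) {
    if (x == 0)
        return 32;
    unsigned n = 0;
    while ((x & 1u) == 0) {   /* shift until the lowest set bit reaches bit 0 */
        x >>= 1;
        n++;
    }
    return n;
}

/* Zero-undef variant: the caller promises x != 0.
 * __builtin_ctz (GCC/Clang builtin, an assumption here) has an
 * undefined result for 0, so the compiler is free to emit a single
 * BSF or TZCNT with no extra compare or branch. */
static unsigned cttz32_zero_undef(uint32_t x) {
    return (unsigned)__builtin_ctz(x);
}
```

Which instruction is actually emitted for either function depends on the compiler version and the target CPU flags, which is the tuning question the thread leaves for later work.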