[PATCH][X86] __builtin_ctz/clz sometimed defined for zero input

Mon Oct 27 07:00:26 PDT 2014

So the architecturally cleanest solution is to (a) eliminate the second parameter from the LLVM ctlz/cttz intrinsics, (b) eliminate Clang's lzcntintrin.h, (c) eliminate the isC{L,T}ZForZeroUndef predicates, (d) oh yeah make sure LLVM can optimize away the runtime zero check when appropriate on ARM, PPC, Mips, and sometimes X86.
Am I understanding all this correctly?
--paulr

From: Chandler Carruth [mailto:chandlerc at google.com]
Sent: Sunday, October 26, 2014 10:25 PM
To: David Majnemer
Cc: Robinson, Paul; cfe-commits at cs.uiuc.edu
Subject: Re: [PATCH][X86] __builtin_ctz/clz sometimed defined for zero input

Sorry for the late replies, but as the last person to work on the CLZ / CTZ stuff, I have some small opinions here. =]

On Sun, Oct 26, 2014 at 9:52 PM, David Majnemer <david.majnemer at gmail.com<mailto:david.majnemer at gmail.com>> wrote:
I think Paul's patch is fine.

I firmly disagree.

Having `__builtin_ctz` asking the target if `isCLZForZeroUndef` seems busted and this patch seems to fix that just fine.

Yes, that is busted and we should fix that aspect regardless. But changing the behavior on x86 seems actively harmful to portability.

So, let's look at the history.

I went and taught LLVM to produce optimal code in cases where undef-at-zero was acceptable and only instructions with undef-at-zero were available. In order to generate reasonable code for these scenarios, Clang and LLVM had to respect the contract specified for the GCC __builtin_clz and __builtin_ctz. That spec[1] *very* clearly states that if the input is zero the result is undefined.

1: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

However, after this Bob changed ARM to not follow this spec citing a radar only in r149086. I objected at the time, and continue to feel this is the *wrong* approach for ARM. We should not be deviating from the clear, documented, and widely followed spec here. However, I had no real basis for arguing about the behavior of ARM at the time, and I doubt that I have much say in the behavior there even now. =] That said, on x86, I am firmly opposed to deviating from the clear and unambiguous spec.

If folks want a builtin to access unambiguously defined-at-zero CLZ or CTZ behavior, we should define *new* builtins which have a clear spec here. However, I'm unpersuaded that this is better or more portable. The alternative code seems quite palatable: (x == 0 ? 32 : __builtin_clz(x))

If LLVM fails to turn this into the maximally efficient code on some architecture, then much as others have suggested, we should enhance LLVM.

-Chandler
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20141027/61b0aea6/attachment.html>