[PATCH][X86] __builtin_ctz/clz sometimed defined for zero input

Robinson, Paul Paul_Robinson at playstation.sony.com
Mon Oct 27 18:17:25 PDT 2014


Yes, the second version of my patch provides new builtins that we can define however we please, and leaves the originals alone.
--paulr

From: cfe-commits-bounces at cs.uiuc.edu [mailto:cfe-commits-bounces at cs.uiuc.edu] On Behalf Of Richard Smith
Sent: Monday, October 27, 2014 5:20 PM
To: Chandler Carruth
Cc: cfe-commits at cs.uiuc.edu
Subject: Re: [PATCH][X86] __builtin_ctz/clz sometimed defined for zero input

On Sun, Oct 26, 2014 at 10:24 PM, Chandler Carruth <chandlerc at google.com<mailto:chandlerc at google.com>> wrote:
Sorry for the late replies, but as the last person to work on the CLZ / CTZ stuff, I have some small opinions here. =]

On Sun, Oct 26, 2014 at 9:52 PM, David Majnemer <david.majnemer at gmail.com<mailto:david.majnemer at gmail.com>> wrote:
I think Paul's patch is fine.

I firmly disagree.


Having `__builtin_ctz` asking the target if `isCLZForZeroUndef` seems busted and this patch seems to fix that just fine.

Yes, that is busted and we should fix that aspect regardless. But changing the behavior on x86 seems actively harmful to portability.


So, let's look at the history.

I went and taught LLVM to produce optimal code in cases where undef-at-zero was acceptable and only instructions with undef-at-zero were available. In order to generate reasonable code for these scenarios, Clang and LLVM had to respect the contract specified for the GCC __builtin_clz and __builtin_ctz. That spec[1] *very* clearly states that if the input is zero the result is undefined.

1: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html

However, after this Bob changed ARM to not follow this spec citing a radar only in r149086. I objected at the time, and continue to feel this is the *wrong* approach for ARM. We should not be deviating from the clear, documented, and widely followed spec here. However, I had no real basis for arguing about the behavior of ARM at the time, and I doubt that I have much say in the behavior there even now. =] That said, on x86, I am firmly opposed to deviating from the clear and unambiguous spec.

If folks want a builtin to access unambiguously defined-at-zero CLZ or CTZ behavior, we should define *new* builtins which have a clear spec here. However, I'm unpersuaded that this is better or more portable. The alternative code seems quite palatable: (x == 0 ? 32 : __builtin_clz(x))

If LLVM fails to turn this into the maximally efficient code on some architecture, then much as others have suggested, we should enhance LLVM.

I agree with Chandler on all points here.

__builtin_clz and __builtin_ctz is clearly and unambiguously defined by GCC's specification for them, and we should not be providing additional definedness  guarantees. If we want a defined-for-zero-input form of these intrinsics, they should have a different name (or an additional argument, or be written as "x == 0 ? 32 : __builtin_clz(x)" or similar).

It is a major feature of Clang that we try not to lock people in by providing incompatible extensions.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20141028/7b286db0/attachment.html>


More information about the cfe-commits mailing list