[PATCH][X86] __builtin_ctz/clz sometimed defined for zero input

Richard Smith richard at metafoo.co.uk
Mon Oct 27 19:49:25 PDT 2014


On Mon, Oct 27, 2014 at 7:07 PM, Sean Silva <chisophugis at gmail.com> wrote:

> On Mon, Oct 27, 2014 at 5:20 PM, Richard Smith <richard at metafoo.co.uk>
> wrote:
>
>> On Sun, Oct 26, 2014 at 10:24 PM, Chandler Carruth <chandlerc at google.com>
>> wrote:
>>
>>> Sorry for the late replies, but as the last person to work on the CLZ /
>>> CTZ stuff, I have some small opinions here. =]
>>>
>>> On Sun, Oct 26, 2014 at 9:52 PM, David Majnemer <
>>> david.majnemer at gmail.com> wrote:
>>>
>>>> I think Paul's patch is fine.
>>>>
>>>
>>> I firmly disagree.
>>>
>>>
>>>>
>>>> Having `__builtin_ctz` asking the target if `isCLZForZeroUndef` seems
>>>> busted and this patch seems to fix that just fine.
>>>>
>>>
>>> Yes, that is busted and we should fix that aspect regardless. But
>>> changing the behavior on x86 seems actively harmful to portability.
>>>
>>>
>>> So, let's look at the history.
>>>
>>> I went and taught LLVM to produce optimal code in cases where
>>> undef-at-zero was acceptable and only instructions with undef-at-zero were
>>> available. In order to generate reasonable code for these scenarios, Clang
>>> and LLVM had to respect the contract specified for the GCC __builtin_clz
>>> and __builtin_ctz. That spec[1] *very* clearly states that if the input is
>>> zero the result is undefined.
>>>
>>> 1: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
>>>
>>> However, after this Bob changed ARM to not follow this spec citing a
>>> radar only in r149086. I objected at the time, and continue to feel this is
>>> the *wrong* approach for ARM. We should not be deviating from the clear,
>>> documented, and widely followed spec here. However, I had no real basis for
>>> arguing about the behavior of ARM at the time, and I doubt that I have much
>>> say in the behavior there even now. =] That said, on x86, I am firmly
>>> opposed to deviating from the clear and unambiguous spec.
>>>
>>> If folks want a builtin to access unambiguously defined-at-zero CLZ or
>>> CTZ behavior, we should define *new* builtins which have a clear spec here.
>>> However, I'm unpersuaded that this is better or more portable. The
>>> alternative code seems quite palatable: (x == 0 ? 32 : __builtin_clz(x))
>>>
>>> If LLVM fails to turn this into the maximally efficient code on some
>>> architecture, then much as others have suggested, we should enhance LLVM.
>>>
>>
>> I agree with Chandler on all points here.
>>
>> __builtin_clz and __builtin_ctz is clearly and unambiguously defined by
>> GCC's specification for them, and we should not be providing additional
>> definedness  guarantees. If we want a defined-for-zero-input form of these
>> intrinsics, they should have a different name (or an additional argument,
>> or be written as "x == 0 ? 32 : __builtin_clz(x)" or similar).
>>
>> It is a major feature of Clang that we try not to lock people in by
>> providing incompatible extensions.
>>
>
> Curiously, it looks like GCC is depending on definedness for 0 in its own
> headers:
> https://github.com/gcc-mirror/gcc/blob/master/gcc/config/i386/lzcntintrin.h
>

I can't find any documentation *at all* for lzcntintrin.h, so I can find no
suggestion that its __lzcnt* functions would have defined behavior on an
input of 0. And apparently MSVC doesn't actually guarantee to give the
width on a zero input for these (if you have BSR but not LZCNT it'll use
BSR instead and give you an unspecified result), so we cannot deduce that
either lzcntintrin.h nor Intrin.h should give a defined result on zero
input to these functions, unless someone has other evidence hidden away
somewhere.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20141027/f7801503/attachment.html>


More information about the cfe-commits mailing list