[PATCH][X86] __builtin_ctz/clz sometimes defined for zero input

Fri Oct 24 16:53:53 PDT 2014

On Fri, Oct 24, 2014 at 3:55 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
> What is the harm of documenting it? By defining this behavior we are
> pledging to forever support it.

Exactly. That's harm enough, isn't it?

>> This allows us to optimize
>>   unsigned foo(unsigned x) { return (x ? __builtin_clz(x) : 32); }
>> into a single LZCNT instruction on a BMI-capable X86, if we choose.

I'm no expert, but isn't this kind of peephole optimization supposed
to be handled by the LLVM (backend) side of things, not the Clang
side?

The behavior of __builtin_clz(0) is undefined, so no portable program
can use it. Sure, it might happen to work on your machine, but then
you switch to a different architecture, or even a slightly older x86,
or a year from now Intel introduces an *even faster* bit-count
instruction that yields the wrong value for 0... and suddenly your
program breaks, and you have to waste time tracking down the bug.

Therefore, portable programs WILL use

   unsigned foo(unsigned x) { return (x ? __builtin_clz(x) : 32); }

Therefore, it would be awesome if the LLVM backend for these
particular Intel x86 machines generated the most-optimized LZCNT
instruction sequence for "foo". This seems relatively easy and
shouldn't involve any changes to the Clang front-end, and *certainly*
not to the documentation.

Then you get the best of both worlds — old portable code continues to
work (but gets faster), and newly written code continues to be
portable.

Andrea wrote:
>> > My concern is that your suggested approach would force people to
>> > always guard calls to __builtin_ctz/__builtin_clz against zero.
>> > From a customer point of view, the compiler knows exactly whether ctz and
>> > clz are defined on zero. It is basically pushing the problem onto the
>> > customer by forcing them to guard all the calls to ctz/clz against
>> > zero. We've already had a number of customer queries/complaints about
>> > this and I personally don't think it is unreasonable to have ctz/clz
>> > defined on zero on our target (and other x86 targets where the
>> > behavior on zero is clearly defined).

Sounds like a good argument in favor of introducing a new builtin,
spelled something like "__builtin_lzcnt", that does what the customer
wants. It could be implemented internally as (x ? __builtin_clz(x) :
32), and lowered to a single instruction if possible by the LLVM
backend for your target.

my $.02,
–Arthur