[PATCH][X86] __builtin_ctz/clz sometimed defined for zero input

Fri Oct 24 17:46:06 PDT 2014

On Fri, Oct 24, 2014 at 4:53 PM, Arthur O'Dwyer <arthur.j.odwyer at gmail.com>
wrote:

> On Fri, Oct 24, 2014 at 3:55 PM, Sean Silva <chisophugis at gmail.com> wrote:
> >
> > What is the harm of documenting it? By defining this behavior we are
> > pledging to forever support it.
>
> Exactly. That's harm enough, isn't it?
>

It looks like clang already has precedent for doing this, and a
corresponding infrastructure for doing so, so it shouldn't be too
problematic.

>
> >> This allows us to optimize
> >>   unsigned foo(unsigned x) { return (x ? __builtin_clz(x) : 32) }
> >> into a single LZCNT instruction on a BMI-capable X86, if we choose.
>
> I'm no expert, but isn't this kind of peephole optimization supposed
> to be handled by the LLVM (backend) side of things, not the Clang
> side?
>

It's not an optimization that is being talked about here. It's a change of
semantic meaning to define a previously undefined situation. Presumably
it's a win for the user that thinks that __builtin_clz means "the best
clz-like instruction available for the target", rather than the
independently defined meaning given in the docs.

-- Sean Silva

>
> The behavior of __builtin_clz(0) is undefined, so no portable program
> can use it. Sure, it might happen to work on your machine, but then
> you switch to a different architecture, or even a slightly older x86,
> or a year from now Intel introduces an *even faster* bit-count
> instruction that yields the wrong value for 0... and suddenly your
> program breaks, and you have to waste time tracking down the bug.
>
> Therefore, portable programs WILL use
>
>    unsigned foo(unsigned x) { return (x ? __builtin_clz(x) : 32) }
>
> Therefore, it would be awesome if the LLVM backend for these
> particular Intel x86 machines would generate the most-optimized LZCNT
> instruction sequence for "foo". This seems relatively easy and
> shouldn't involve any changes to the Clang front-end, and *certainly*
> not to the documentation.
>
> Then you get the best of both worlds — old portable code continues to
> work (but gets faster), and newly written code continues to be
> portable.
>
> Andrea wrote:
> >> > My concern is that your suggested approach would force people to
> >> > always guard calls to __builtin_ctz/__builtin_clz against zero.
> >> > From a customer point of view, the compiler knows exactly if ctz and
> >> > clz is defined on zero. It is basically pushing the problem on the
> >> > customer by forcing them to guard all the calls to ctz/clz against
> >> > zero. We've already had a number of customer queries/complaints about
> >> > this and I personally don't think it is unreasonable to have ctz/clz
> >> > defined on zero on our target (and other x86 targets where the
> >> > behavior on zero is clearly defined).
>
> Sounds like a good argument in favor of introducing a new builtin,
> spelled something like "__builtin_lzcnt", that does what the customer
> wants. It could be implemented internally as (x ? __builtin_clz(x) :
> 32), and lowered to a single instruction if possible by the LLVM
> backend for your target.
>
> my $.02,
> –Arthur
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20141024/6549d5f6/attachment.html>