[PATCH][X86] __builtin_ctz/clz sometimed defined for zero input

Mon Oct 27 15:22:16 PDT 2014

On Sun, Oct 26, 2014 at 5:56 PM, Robinson, Paul <
Paul_Robinson at playstation.sony.com> wrote:

>  I don't know about the "public" contract, but Clang's own lzcntintrin.h
> uses __builtin_clz without a runtime zero check, protected by the __LZCNT__
> preprocessor define.  So Clang's own headers are assuming that
> __builtin_clz will be treated as defined for zero input on these machines.
> That makes my patch a correctness fix, not 'just' an optimization.
>

I think that the header is using __LZCNT__ more along the lines of "you
can't use this header if you don't have __LZCNT__", rather than "these
builtins won't work right without __LZCNT__". I would describe the
situation as "__lzcnt* intrinsics incorrectly implemented". Now that I
think about the situation in our internal bugzilla, our replies on the bugs
regarding behavior of __builtin_clz(0) should have just been "if you want
the LZCNT instruction, then use <lzcntintrin.h>" (and then the next bug
would have been "__lzcnt* are incorrectly implemented", which is currently
the case).

-- Sean Silva

>
>
> If you use Clang's intrinsics (either the ones from Intrin.h or the ones
> from lzcntintrin.h) you shouldn't need to add your own zero checks; but
> that's the public contract for the intrinsics, not the builtins.
>
>
>
> If we leave __builtin_clz defined as-is, you should always add your own
> zero check; on appropriate machines sometimes Clang will be able to tell
> and optimize it away.  But that's target-dependent and I don't know that it
> would be all that smart to codify that target-dependent behavior as part of
> the public contract.
>
> --paulr
>
>
>
> *From:* Sean Silva [mailto:chisophugis at gmail.com]
> *Sent:* Friday, October 24, 2014 5:46 PM
> *To:* Arthur O'Dwyer
> *Cc:* Robinson, Paul; cfe-commits at cs.uiuc.edu
> *Subject:* Re: [PATCH][X86] __builtin_ctz/clz sometimed defined for zero
> input
>
>
>
>
>
>
>
> On Fri, Oct 24, 2014 at 4:53 PM, Arthur O'Dwyer <arthur.j.odwyer at gmail.com>
> wrote:
>
> On Fri, Oct 24, 2014 at 3:55 PM, Sean Silva <chisophugis at gmail.com> wrote:
> >
> > What is the harm of documenting it? By defining this behavior we are
> > pledging to forever support it.
>
> Exactly. That's harm enough, isn't it?
>
>
>
> It looks like clang already has precedent for doing this, and a
> corresponding infrastructure for doing so, so it shouldn't be too
> problematic.
>
>
>
>
>
>
>
> >> This allows us to optimize
> >>   unsigned foo(unsigned x) { return (x ? __builtin_clz(x) : 32) }
> >> into a single LZCNT instruction on a BMI-capable X86, if we choose.
>
> I'm no expert, but isn't this kind of peephole optimization supposed
> to be handled by the LLVM (backend) side of things, not the Clang
> side?
>
>
>
> It's not an optimization that is being talked about here. It's a change of
> semantic meaning to define a previously undefined situation. Presumably
> it's a win for the user that thinks that __builtin_clz means "the best
> clz-like instruction available for the target", rather than the
> independently defined meaning given in the docs.
>
>
>
> -- Sean Silva
>
>
>
>
>
>
> The behavior of __builtin_clz(0) is undefined, so no portable program
> can use it. Sure, it might happen to work on your machine, but then
> you switch to a different architecture, or even a slightly older x86,
> or a year from now Intel introduces an *even faster* bit-count
> instruction that yields the wrong value for 0... and suddenly your
> program breaks, and you have to waste time tracking down the bug.
>
> Therefore, portable programs WILL use
>
>    unsigned foo(unsigned x) { return (x ? __builtin_clz(x) : 32) }
>
> Therefore, it would be awesome if the LLVM backend for these
> particular Intel x86 machines would generate the most-optimized LZCNT
> instruction sequence for "foo". This seems relatively easy and
> shouldn't involve any changes to the Clang front-end, and *certainly*
> not to the documentation.
>
> Then you get the best of both worlds — old portable code continues to
> work (but gets faster), and newly written code continues to be
> portable.
>
> Andrea wrote:
> >> > My concern is that your suggested approach would force people to
> >> > always guard calls to __builtin_ctz/__builtin_clz against zero.
> >> > From a customer point of view, the compiler knows exactly if ctz and
> >> > clz is defined on zero. It is basically pushing the problem on the
> >> > customer by forcing them to guard all the calls to ctz/clz against
> >> > zero. We've already had a number of customer queries/complaints about
> >> > this and I personally don't think it is unreasonable to have ctz/clz
> >> > defined on zero on our target (and other x86 targets where the
> >> > behavior on zero is clearly defined).
>
> Sounds like a good argument in favor of introducing a new builtin,
> spelled something like "__builtin_lzcnt", that does what the customer
> wants. It could be implemented internally as (x ? __builtin_clz(x) :
> 32), and lowered to a single instruction if possible by the LLVM
> backend for your target.
>
> my $.02,
> –Arthur
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20141027/961737df/attachment.html>