[PATCH][X86] __builtin_ctz/clz sometimed defined for zero input
David Majnemer
david.majnemer at gmail.com
Sun Oct 26 21:52:46 PDT 2014
I think Paul's patch is fine.
Having `__builtin_ctz` asking the target if `isCLZForZeroUndef` seems
busted and this patch seems to fix that just fine.
Paul, I think having `isCTZForZeroUndef` delegate to `isCLZForZeroUndef` is
a bit awkward. Instead, could you please duplicate the the function in the
targets? All of them, except X86, have trivial logic for their
isCLZForZeroUndef implementation.
It would LGTM with that change.
On Sun, Oct 26, 2014 at 5:56 PM, Robinson, Paul <
Paul_Robinson at playstation.sony.com> wrote:
> I don't know about the "public" contract, but Clang's own lzcntintrin.h
> uses __builtin_clz without a runtime zero check, protected by the __LZCNT__
> preprocessor define. So Clang's own headers are assuming that
> __builtin_clz will be treated as defined for zero input on these machines.
> That makes my patch a correctness fix, not 'just' an optimization.
>
>
>
> If you use Clang's intrinsics (either the ones from Intrin.h or the ones
> from lzcntintrin.h) you shouldn't need to add your own zero checks; but
> that's the public contract for the intrinsics, not the builtins.
>
>
>
> If we leave __builtin_clz defined as-is, you should always add your own
> zero check; on appropriate machines sometimes Clang will be able to tell
> and optimize it away. But that's target-dependent and I don't know that it
> would be all that smart to codify that target-dependent behavior as part of
> the public contract.
>
> --paulr
>
>
>
> *From:* Sean Silva [mailto:chisophugis at gmail.com]
> *Sent:* Friday, October 24, 2014 5:46 PM
> *To:* Arthur O'Dwyer
> *Cc:* Robinson, Paul; cfe-commits at cs.uiuc.edu
> *Subject:* Re: [PATCH][X86] __builtin_ctz/clz sometimed defined for zero
> input
>
>
>
>
>
>
>
> On Fri, Oct 24, 2014 at 4:53 PM, Arthur O'Dwyer <arthur.j.odwyer at gmail.com>
> wrote:
>
> On Fri, Oct 24, 2014 at 3:55 PM, Sean Silva <chisophugis at gmail.com> wrote:
> >
> > What is the harm of documenting it? By defining this behavior we are
> > pledging to forever support it.
>
> Exactly. That's harm enough, isn't it?
>
>
>
> It looks like clang already has precedent for doing this, and a
> corresponding infrastructure for doing so, so it shouldn't be too
> problematic.
>
>
>
>
>
>
>
> >> This allows us to optimize
> >> unsigned foo(unsigned x) { return (x ? __builtin_clz(x) : 32) }
> >> into a single LZCNT instruction on a BMI-capable X86, if we choose.
>
> I'm no expert, but isn't this kind of peephole optimization supposed
> to be handled by the LLVM (backend) side of things, not the Clang
> side?
>
>
>
> It's not an optimization that is being talked about here. It's a change of
> semantic meaning to define a previously undefined situation. Presumably
> it's a win for the user that thinks that __builtin_clz means "the best
> clz-like instruction available for the target", rather than the
> independently defined meaning given in the docs.
>
>
>
> -- Sean Silva
>
>
>
>
>
>
> The behavior of __builtin_clz(0) is undefined, so no portable program
> can use it. Sure, it might happen to work on your machine, but then
> you switch to a different architecture, or even a slightly older x86,
> or a year from now Intel introduces an *even faster* bit-count
> instruction that yields the wrong value for 0... and suddenly your
> program breaks, and you have to waste time tracking down the bug.
>
> Therefore, portable programs WILL use
>
> unsigned foo(unsigned x) { return (x ? __builtin_clz(x) : 32) }
>
> Therefore, it would be awesome if the LLVM backend for these
> particular Intel x86 machines would generate the most-optimized LZCNT
> instruction sequence for "foo". This seems relatively easy and
> shouldn't involve any changes to the Clang front-end, and *certainly*
> not to the documentation.
>
> Then you get the best of both worlds — old portable code continues to
> work (but gets faster), and newly written code continues to be
> portable.
>
> Andrea wrote:
> >> > My concern is that your suggested approach would force people to
> >> > always guard calls to __builtin_ctz/__builtin_clz against zero.
> >> > From a customer point of view, the compiler knows exactly if ctz and
> >> > clz is defined on zero. It is basically pushing the problem on the
> >> > customer by forcing them to guard all the calls to ctz/clz against
> >> > zero. We've already had a number of customer queries/complaints about
> >> > this and I personally don't think it is unreasonable to have ctz/clz
> >> > defined on zero on our target (and other x86 targets where the
> >> > behavior on zero is clearly defined).
>
> Sounds like a good argument in favor of introducing a new builtin,
> spelled something like "__builtin_lzcnt", that does what the customer
> wants. It could be implemented internally as (x ? __builtin_clz(x) :
> 32), and lowered to a single instruction if possible by the LLVM
> backend for your target.
>
> my $.02,
> –Arthur
>
>
>
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20141026/29ed1c8e/attachment.html>
More information about the cfe-commits
mailing list