<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Oct 24, 2014 at 4:53 PM, Arthur O'Dwyer <span dir="ltr"><<a href="mailto:arthur.j.odwyer@gmail.com" target="_blank">arthur.j.odwyer@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">On Fri, Oct 24, 2014 at 3:55 PM, Sean Silva <<a href="mailto:chisophugis@gmail.com">chisophugis@gmail.com</a>> wrote:<br>
><br>
> What is the harm of documenting it? By defining this behavior we are<br>
> pledging to forever support it.<br>
<br>
</span>Exactly. That's harm enough, isn't it?<br></blockquote><div><br></div><div>It looks like clang already has precedent for this, along with infrastructure for supporting it, so it shouldn't be too problematic.</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<span class=""><br>
<br>
>> This allows us to optimize<br>
>> unsigned foo(unsigned x) { return (x ? __builtin_clz(x) : 32); }<br>
>> into a single LZCNT instruction on a BMI-capable X86, if we choose.<br>
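For concreteness, here is a minimal sketch of the zero-defined semantics under discussion, written as a plain bit-by-bit loop so it is defined for every input. It counts leading zero bits and returns the full width of unsigned (32 on the usual x86 ABIs) for 0, which is what LZCNT produces. The names clz_ref and UINT_BITS are hypothetical; this is a reference model, not how clang implements __builtin_clz.

```c
#include <limits.h>

/* Bit width of unsigned int: 32 on the common x86 ABIs, but computed
   here rather than hard-coded. */
#define UINT_BITS ((unsigned)(CHAR_BIT * sizeof(unsigned)))

/* Hypothetical reference model: count leading zero bits one at a time.
   Defined for every x; returns UINT_BITS when x == 0 (LZCNT-style). */
unsigned clz_ref(unsigned x) {
    unsigned n = 0;
    unsigned mask = 1u << (UINT_BITS - 1); /* start at the most significant bit */
    while (mask != 0 && (x & mask) == 0) {
        ++n;        /* this bit is zero, keep counting */
        mask >>= 1; /* move one bit toward the least significant end */
    }
    return n;
}
```

For nonzero x this agrees with __builtin_clz(x); the two differ only at 0, where __builtin_clz is undefined.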
<br>
</span>I'm no expert, but isn't this kind of peephole optimization supposed<br>
to be handled by the LLVM (backend) side of things, not the Clang<br>
side?<br></blockquote><div><br></div><div>What's being discussed here isn't an optimization. It's a change in semantics that defines a previously undefined case. Presumably it's a win for users who think of __builtin_clz as "the best clz-like instruction available for the target", rather than as the target-independent definition given in the docs.</div><div><br></div><div><div>-- Sean Silva</div><div> </div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
The behavior of __builtin_clz(0) is undefined, so no portable program<br>
can use it. Sure, it might happen to work on your machine, but then<br>
you switch to a different architecture, or even a slightly older x86,<br>
or a year from now Intel introduces an *even faster* bit-count<br>
instruction that yields the wrong value for 0... and suddenly your<br>
program breaks, and you have to waste time tracking down the bug.<br>
<br>
Therefore, portable programs WILL use<br>
<span class=""><br>
unsigned foo(unsigned x) { return (x ? __builtin_clz(x) : 32); }<br>
<br>
</span>Therefore, it would be awesome if the LLVM backend for these<br>
particular Intel x86 machines would generate the most-optimized LZCNT<br>
instruction sequence for "foo". This seems relatively easy and<br>
shouldn't involve any changes to the Clang front-end, and *certainly*<br>
not to the documentation.<br>
<br>
Then you get the best of both worlds: old portable code continues to<br>
work (but gets faster), and newly written code continues to be<br>
portable.<br>
<span class=""><br>
Andrea wrote:<br>
>> > My concern is that your suggested approach would force people to<br>
>> > always guard calls to __builtin_ctz/__builtin_clz against zero.<br>
>> > From a customer point of view, the compiler knows exactly whether ctz and<br>
>> > clz are defined on zero. It is basically pushing the problem onto the<br>
>> > customer by forcing them to guard all the calls to ctz/clz against<br>
>> > zero. We've already had a number of customer queries/complaints about<br>
>> > this, and I personally don't think it is unreasonable to have ctz/clz<br>
>> > defined on zero on our target (and other x86 targets where the<br>
>> > behavior on zero is clearly defined).<br>
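To make the complaint concrete: under the currently documented semantics, code that may pass 0 has to wrap both builtins. A minimal sketch, assuming the GCC/Clang __builtin_clz and __builtin_ctz builtins; the helper names clz_or_width, ctz_or_width and UINT_BITS are hypothetical.

```c
#include <limits.h>

/* Bit width of unsigned int, computed rather than assumed to be 32. */
#define UINT_BITS ((unsigned)(CHAR_BIT * sizeof(unsigned)))

/* __builtin_clz(0) and __builtin_ctz(0) are undefined, so every call site
   that might see 0 needs an explicit guard like these. */
static inline unsigned clz_or_width(unsigned x) {
    return x ? (unsigned)__builtin_clz(x) : UINT_BITS;
}

static inline unsigned ctz_or_width(unsigned x) {
    return x ? (unsigned)__builtin_ctz(x) : UINT_BITS;
}
```

Both helpers return the full bit width for 0, matching what LZCNT and TZCNT return for a zero operand.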
<br>
</span>Sounds like a good argument in favor of introducing a new builtin,<br>
spelled something like "__builtin_lzcnt", that does what the customer<br>
wants. It could be implemented internally as (x ? __builtin_clz(x) :<br>
32), and lowered to a single instruction if possible by the LLVM<br>
backend for your target.<br>
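As a rough sketch of the proposed builtin, the function below models it as an ordinary inline function. The name lzcnt32 is hypothetical (identifiers starting with __builtin_ are reserved to the implementation), and the sketch assumes a 32-bit unsigned int, as on the x86 targets discussed here.

```c
#include <limits.h>
#include <stdint.h>

/* This sketch assumes a 32-bit unsigned int, as on the x86 targets discussed. */
_Static_assert(CHAR_BIT * sizeof(unsigned) == 32, "assumes 32-bit unsigned int");

/* Hypothetical stand-in for the proposed "__builtin_lzcnt": defined for all
   inputs, returns 32 for 0. A backend that recognizes this guarded-clz
   pattern could lower it to a single LZCNT on x86 CPUs that support it;
   elsewhere it costs a compare plus a branch or conditional move. */
static inline uint32_t lzcnt32(uint32_t x) {
    return x ? (uint32_t)__builtin_clz(x) : 32u;
}
```

Because the zero case is written out explicitly, this stays portable to targets without LZCNT; only the generated code differs.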
<br>
my $.02,<br>
–Arthur<br>
</blockquote></div><br></div></div>