<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Oct 24, 2014 at 8:23 AM, Robinson, Paul <span dir="ltr"><<a href="mailto:Paul_Robinson@playstation.sony.com" target="_blank">Paul_Robinson@playstation.sony.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Andrea,<br>

You are correct about the behavior of the LLVM intrinsic, however Sean is<br>

concerned about the behavior of the Clang builtin, which is imitating the<br>

GNU builtin, which says "If x is 0, the result is undefined.".<br>

<a href="https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html" target="_blank">https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html</a><br>

<br>

What I did was change the definition of the undefined behavior of the Clang<br>

builtin, which is defined to be anything we want, including something sensible.<br>

On reflection, I think it would NOT be appropriate to document this predictable<br>

behavior of the Clang builtins, because people who are writing source-code calls<br>

should follow the documented behavior and treat them as having undefined behavior<br>

for 0 input.</blockquote><div><br></div><div>What is the harm of documenting it? By defining this behavior we are pledging to forever support it. Users should feel free to use this feature, and we should not be afraid to claim we support it. E.g. an addendum "when targeting BMI-capable x86 chips that have the LZCNT instruction, this intrinsic produces the same result in the 0 case as the LZCNT instruction". It will be a nightmare for users if it "seems to work, but from the documentation it looks like the compiler isn't guaranteed to do this".</div><div><br></div><div>-- Sean Silva</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">  However we are free to *implement* the builtin however we like for<br>

0 input, which we have already done for various other architectures, and we are<br>

now doing the same for X86.<br>

<br>

This allows us to optimize<br>

  unsigned foo(unsigned x) { return (x ? __builtin_clz(x) : 32) }<br>

into a single LZCNT instruction on a BMI-capable X86, if we choose.<br>

But anybody who writes<br>

  unsigned foo(unsigned x) { return __builtin_clz(x); }<br>

is living dangerously and we should not imply otherwise.<br>

--paulr<br>

<span class="im HOEnZb"><br>

> -----Original Message-----<br>

> From: Andrea Di Biagio [mailto:<a href="mailto:andrea.dibiagio@gmail.com">andrea.dibiagio@gmail.com</a>]<br>

> Sent: Friday, October 24, 2014 5:15 AM<br>

> To: Sean Silva<br>

> Cc: Robinson, Paul; <a href="mailto:cfe-commits@cs.uiuc.edu">cfe-commits@cs.uiuc.edu</a><br>

> Subject: Re: [PATCH][X86] __builtin_ctz/clz sometimed defined for zero<br>

> input<br>

><br>

</span><div class="HOEnZb"><div class="h5">> Hi Sean,<br>

><br>

> On Fri, Oct 24, 2014 at 3:02 AM, Sean Silva <<a href="mailto:chisophugis@gmail.com">chisophugis@gmail.com</a>> wrote:<br>

> ><br>

> > If I understand correctly, this patch is trying to change the meaning of<br>

> > __builtin_ctz (et al.) under some extremely specific conditions. I don't<br>

> > think that is the right direction since it will cause surprising<br>

> undefined<br>

> > behavior bugs across platforms.<br>

><br>

> Paul's fix would only affect x86 behavior.<br>

> What it does is simply that on some x86 cpus, ctz/clz is defined on<br>

> zero (i.e. we have an instruction for it).<br>

> I don't think it can cause surprising undefined behavior across platforms.<br>

><br>

> > The intrinsic is documented to have<br>

> > undefined behavior in the 0 case (everywhere I looked, including our<br>

> > internal docs); a user that relies on the 0 case has a bug. It would be<br>

> nice<br>

> > to add a UBSan check for this undefined behavior though to help users<br>

> fix<br>

> > their code.<br>

><br>

> The intrinsic is documented to have the following behavior<br>

> (<a href="http://llvm.org/docs/LangRef.html#llvm-cttz-intrinsic" target="_blank">http://llvm.org/docs/LangRef.html#llvm-cttz-intrinsic</a>):<br>

> "If src == 0 then the result is the size in bits of the type of src if<br>

> is_zero_undef == 0 and undef otherwise."<br>

> Where 'is_zero_undef' is the second argument to the intrinsic.<br>

> So, the intrinsic is not documented to have always undefined behavior<br>

> in the zero case.<br>

> Also I don't think that relying on the 0 case is a bug. It is a very<br>

> reasonable assumption on all modern x86 architectures.<br>

><br>

> > It would be better to just ensure that we always generate optimal code<br>

> in<br>

> > the presence of a manual guard for the 0 case. For example, in the<br>

> > middle-end we could fold a manual 0 guard followed by @llvm.ctlz.*(X,<br>

> true)<br>

> > into @llvm.ctlz.*(X, false).<br>

><br>

> I agree that the codegen should be improved in that case.<br>

><br>

> My concern is that your suggested approach would force people to<br>

> always guard calls to __builtin_ctz/__builtin_clz against zero.<br>

> From a customer point of view, the compiler knows exactly if ctz and<br>

> clz is defined on zero. It is basically pushing the problem on the<br>

> customer by forcing them to guard all the calls to ctz/clz against<br>

> zero. We've already had a number of customer queries/complaints about<br>

> this and I personally don't think it is unreasonable to have ctz/clz<br>

> defined on zero on our target (and other x86 targets where the<br>

> behavior on zero is clearly defined).<br>

><br>

> -Andrea<br>

><br>

> ><br>

> > -- Sean Silva<br>

> ><br>

> > On Thu, Oct 23, 2014 at 4:40 PM, Robinson, Paul<br>

> > <<a href="mailto:Paul_Robinson@playstation.sony.com">Paul_Robinson@playstation.sony.com</a>> wrote:<br>

> >><br>

> >> In general, count-zeros instructions are undefined for a zero input<br>

> value.<br>

> >> However the X86 TZCNT and LZCNT instructions do return the bit-width on<br>

> a<br>

> >> zero input, so make Clang tell LLVM so.<br>

> >> One quirk is that these instructions aren't necessarily both defined,<br>

> so<br>

> >> also create a separate predicate so we can do the right thing for all<br>

> >> CPUs.<br>

> >> --paulr<br>

> >><br>

> >><br>

> >> _______________________________________________<br>

> >> cfe-commits mailing list<br>

> >> <a href="mailto:cfe-commits@cs.uiuc.edu">cfe-commits@cs.uiuc.edu</a><br>

> >> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits</a><br>

> >><br>

> ><br>

> ><br>

> > _______________________________________________<br>

> > cfe-commits mailing list<br>

> > <a href="mailto:cfe-commits@cs.uiuc.edu">cfe-commits@cs.uiuc.edu</a><br>

> > <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits</a><br>

> ><br>

</div></div></blockquote></div><br></div></div>