[cfe-commits] r149086 - in /cfe/trunk: include/clang/Basic/TargetInfo.h lib/Basic/Targets.cpp lib/CodeGen/CGBuiltin.cpp test/CodeGen/builtin-count-zeros.c

Benjamin Kramer benny.kra at googlemail.com
Sat Jan 28 10:24:06 PST 2012

On 27.01.2012, at 04:16, Chris Lattner wrote:

> On Jan 26, 2012, at 4:47 PM, Chandler Carruth wrote:
>> BTW, I agree about grandfathering in existing ARM code that relies on this, but still potentially exposing safe and sane entry points for future code. Maybe the Intel intrinsics provide this interface? I don't recall.
>> On Thu, Jan 26, 2012 at 3:02 PM, Bob Wilson <bob.wilson at apple.com> wrote:
>> How is it a divergence?  GCC says that the values are undefined.  This does not conflict with that.  It makes no difference on ARM codegen so the only real effect is to disable optimizations that try to take advantage of that undef.
>> It is a divergence in that Clang is providing stricter guarantees than GCC. When folks complain "Hey, I developed my code w/ Clang, and everything worked great, but when my customer built it with GCC it miscompiled and exploded. WTF?" I want to have documentation of the Clang language extension that their code depends on.
> FWIW, I agree with Bob.  Call this "implementation defined" behavior.

This opens up the question if __builtin_c[tl]z should be undef-on-zero when the target cpu supports x86's lzcnt/tzcnt instructions. GCC's __lzcnt intrinsics (abmintrin.h) compile down to __builtin_clz, but I don't know if this is on purpose or an accident since GCC doesn't optimize on __builtin_clz's behavior aggressively. Clang's bmiitrin.h intrinsics do the same, which means they are completely broken at the moment if someone depends on them being defined for zero.

I don't know what's the right thing to do here, but the behavior of those intrinsics should be fixed.

- Ben

> As a random request, Bob, can you also do this for PowerPC?  I note that LLVM's own MathExtras.h header has:
> inline unsigned CountLeadingZeros_32(uint32_t Value) {
>   unsigned Count; 
>   // PowerPC is defined for __builtin_clz(0)
> #if !defined(__ppc__) && !defined(__ppc64__)
>   if (!Value) return 32;
> #endif
>   Count = __builtin_clz(Value);
>   return Count;
> }
> How about that :)
>> As an unrelated (and orthogonal to your change) note, it would be good to make sure that LLVM folds code like:
>> int f(int x) { return x ? __builtin_clz(x) : sizeof(int)*8; }
>> on ARM to not have the dead condition. I tried to do this for a few cases, but I'm worried I might not have gotten it for all of them... Dunno when I'll get back to further hacking on this style of optimization.
> Yes, we "should" fold that on any target for IR that is using "defined at zero" semantics.  In fact, we should probably fold that (when using "undefined at zero" semantics) to set the "defined at zero" bit and have the code generator expand out the right code.
> -Chris
> _______________________________________________
> cfe-commits mailing list
> cfe-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

More information about the cfe-commits mailing list