[PATCH][X86] __builtin_ctz/clz sometimed defined for zero input
Paul_Robinson at playstation.sony.com
Thu Oct 23 23:51:54 PDT 2014
See the existing commentary for TargetInfo::isCLZForZeroUndef. ARM, AArch64, Mips, PPC all have this as defined for zero input, see overrides in Targets.cpp.
This is well-trod ground, and merely corrects a deficiency in the X86 info.
Andrea has a patch coming that does exactly the simplification you're suggesting, but driven by the flag passed in by Clang rather than a-priori knowledge about instruction behavior.
I'm not willing to take on the chore of reassigning responsibility for which component knows exactly which arcana about particular instructions. I could see adding a note to the description of the builtins.
From: Sean Silva [mailto:chisophugis at gmail.com]
Sent: Thursday, October 23, 2014 7:03 PM
To: Robinson, Paul
Cc: cfe-commits at cs.uiuc.edu
Subject: Re: [PATCH][X86] __builtin_ctz/clz sometimed defined for zero input
If I understand correctly, this patch is trying to change the meaning of __builtin_ctz (et al.) under some extremely specific conditions. I don't think that is the right direction since it will cause surprising undefined behavior bugs across platforms. The intrinsic is documented to have undefined behavior in the 0 case (everywhere I looked, including our internal docs); a user that relies on the 0 case has a bug. It would be nice to add a UBSan check for this undefined behavior though to help users fix their code.
It would be better to just ensure that we always generate optimal code in the presence of a manual guard for the 0 case. For example, in the middle-end we could fold a manual 0 guard followed by @llvm.ctlz.*(X, true) into @llvm.ctlz.*(X, false).
-- Sean Silva
On Thu, Oct 23, 2014 at 4:40 PM, Robinson, Paul <Paul_Robinson at playstation.sony.com<mailto:Paul_Robinson at playstation.sony.com>> wrote:
In general, count-zeros instructions are undefined for a zero input value.
However the X86 TZCNT and LZCNT instructions do return the bit-width on a
zero input, so make Clang tell LLVM so.
One quirk is that these instructions aren't necessarily both defined, so
also create a separate predicate so we can do the right thing for all CPUs.
cfe-commits mailing list
cfe-commits at cs.uiuc.edu<mailto:cfe-commits at cs.uiuc.edu>
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the cfe-commits