[PATCH] D26335: [ms] Reinstate https://reviews.llvm.org/D14748 after https://reviews.llvm.org/D20291

Andrea Di Biagio via cfe-commits cfe-commits at lists.llvm.org
Wed Nov 9 03:27:32 PST 2016


andreadb added inline comments.


================
Comment at: lib/Headers/x86intrin.h:49
+static __inline__ unsigned int __attribute__((__always_inline__, __nodebug__))
+__tzcnt_u32(unsigned int __X) { return __X ? __builtin_ctz(__X) : 32; }
+#ifdef __x86_64__
----------------
hans wrote:
> probinson wrote:
> > hans wrote:
> > > I'm worried about the conditional here. IIRC, ffmpeg uses TZCNT just as a faster encoding for BSF, but now we're putting a conditional in the way, so this will be slower actually. On the other hand, the alternative is weird too :-/
> > I thought there was a peephole to notice a guard like this and do the right thing? In which case having the guard is fine.
> But for non-BMI targets it won't be able to remove the guard (unless it can prove the argument is zero).
> 
> MSVC will just emit the raw 'tzcnt' instruction here, and that's what ffmpeg wants.
I understand your point. However, the semantic of tzcnt_u32 (at least for BMI) is well defined on zero input. Changing that semantic for non-BMI targets sounds a bit odd/confusing to me.

I guess ffmpeg is reliant on the fact that other compilers would either preserve the intrinsic call until instruction selection, or fold it to a meaningful value (i.e. 32 in this case) when the input is known to be zero. If we remove the zero check from tzcnt_u32 (for non BMI targets), we let the optimizer enable rules to aggressively fold calls to tzcnt_u32 to undef when the input is zero.

As a side note: I noticed that gcc provides intrinsic `__bsfd` in ia32intrin.h. Maybe we should do something similar?


https://reviews.llvm.org/D26335





More information about the cfe-commits mailing list