[PATCH] D26335: [ms] Reinstate https://reviews.llvm.org/D14748 after https://reviews.llvm.org/D20291

Hans Wennborg via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Nov 28 09:29:41 PST 2016


hans added inline comments.


================
Comment at: lib/Headers/x86intrin.h:49
+static __inline__ unsigned int __attribute__((__always_inline__, __nodebug__))
+__tzcnt_u32(unsigned int __X) { return __X ? __builtin_ctz(__X) : 32; }
+#ifdef __x86_64__
----------------
andreadb wrote:
> hans wrote:
> > probinson wrote:
> > > hans wrote:
> > > > I'm worried about the conditional here. IIRC, ffmpeg uses TZCNT just as a faster encoding for BSF, but now we're putting a conditional in the way, so this will be slower actually. On the other hand, the alternative is weird too :-/
> > > I thought there was a peephole to notice a guard like this and do the right thing? In which case having the guard is fine.
> > But for non-BMI targets it won't be able to remove the guard (unless it can prove the argument is zero).
> > 
> > MSVC will just emit the raw 'tzcnt' instruction here, and that's what ffmpeg wants.
> I understand your point. However, the semantic of tzcnt_u32 (at least for BMI) is well defined on zero input. Changing that semantic for non-BMI targets sounds a bit odd/confusing to me.
> 
> I guess ffmpeg is reliant on the fact that other compilers would either preserve the intrinsic call until instruction selection, or fold it to a meaningful value (i.e. 32 in this case) when the input is known to be zero. If we remove the zero check from tzcnt_u32 (for non BMI targets), we let the optimizer enable rules to aggressively fold calls to tzcnt_u32 to undef when the input is zero.
> 
> As a side note: I noticed that gcc provides intrinsic `__bsfd` in ia32intrin.h. Maybe we should do something similar?
Yes, ffmpeg relies on the compiler to emit the raw TZCNT instruction even for non-BMI targets and don't call it with zero inputs. They just want a faster BSF.

It's all pretty weird, but maybe Nico's original patch is the way to do this after all. At least then, the intrinsic will always do what it says in the doc.


https://reviews.llvm.org/D26335





More information about the cfe-commits mailing list