[cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility with MSVC's <intrin.h>

Nathan Froyd via cfe-dev cfe-dev at lists.llvm.org
Mon Oct 31 07:29:07 PDT 2016


Exposing _tzcnt_u32 in intrin.h seems like the easiest solution to me,
yes.  Modules may also solve this problem elegantly, but requiring
people to modify command-lines just for MSVC compat in this instance
seems like a unnecessary hurdle.

-Nathan

On Fri, Oct 28, 2016 at 6:55 PM, Hans Wennborg <hans at chromium.org> wrote:
> I haven't been following this thread very closely since the topic
> shifted towards using modules for the intrinsics headers.
>
> Is the underlying problem we're trying to solve still how to expose
> _tzcnt_u32? Isn't it enough to just make it available in intrin.h
> instead of (or addition to) bmiintrin.h? It's special in that it
> really doesn't require a BMI-enabled target to be useful, and is
> probably one of very few instructions with this property?
>
>  - Hans
>
> On Fri, Oct 28, 2016 at 3:46 PM, Nathan Froyd <nfroyd at mozilla.com> wrote:
>> I've been following the review below and its successor, D25767, but
>> I'm unable to make the proposed solution (or its equivalent) work.  If
>> I specify:
>>
>> -Xclang -fimplicit-module-maps -Xclang -fmodule-map-file=[builtin .modulemap]
>>
>> I still get an error about an unresolved __tzcnt_u32 symbol.  If I
>> remove the "-Xclang -fimplicit-module-maps" from the above, I get the
>> same error.  If I try instead:
>>
>> -Xclang -fmodules -Xclang -fmodule-map-file=[builtin .modulemap]
>>
>> I get peculiar errors about x86intrin.h and the _Builtin_intrinsics
>> module not being available and implicit use of module files being
>> disabled.  (Same goes for _Builtin_stddef_max_align_t and
>> __stddef_max_align_t.h.)  Adding "-Xclang -fimplicit-module-maps"
>> gives me the same error.
>>
>> I'm not sure what's going wrong, or if there's been some sort of
>> mistake in generating the module map.  Help?
>>
>> Thanks,
>> -Nathan
>>
>> On Thu, Oct 6, 2016 at 2:45 PM, Cohen, Elad2 <elad2.cohen at intel.com> wrote:
>>> I uploaded to review the patch that adds the option for enabling the modules
>>> feature exclusively for the intrinsic header files:
>>> https://reviews.llvm.org/D25337
>>> You can try it out by adding the –fexclusive-builtin-modules command line
>>> option. (all the different x86 *intrin.h files should be included by default
>>> since the #if guards already contain a __has_feature(modules) condition).
>>>
>>>
>>>
>>> Thanks, Elad
>>>
>>>
>>>
>>> From: Cohen, Elad2
>>> Sent: Friday, September 30, 2016 16:54
>>> To: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Reid Kleckner
>>> <rnk at google.com>; Hans Wennborg <hans at chromium.org>; Blank, Guy
>>> <guy.blank at intel.com>; Aboud, Amjad <amjad.aboud at intel.com>; Craig Topper
>>> <craig.topper at gmail.com>; Badouh, Asaf <asaf.badouh at intel.com>; Zuckerman,
>>> Michael <michael.zuckerman at intel.com>; Breger, Igor <igor.breger at intel.com>
>>> Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber <thakis at chromium.org>;
>>> cfe-dev <cfe-dev at lists.llvm.org>; Ouriel, Boaz <boaz.ouriel at intel.com>;
>>> Rackover, Zvi <zvi.rackover at intel.com>
>>>
>>>
>>> Subject: RE: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility
>>> with MSVC's <intrin.h>
>>>
>>>
>>>
>>> Hi All,
>>>
>>>
>>>
>>> As mentioned in past discussions, the introduction of AVX512 intrinsics had
>>> a severe impact on the compile time.
>>>
>>> The extreme case happens on windows where the mere inclusion of system
>>> headers like <string> causes a hit in the compile time of applications that
>>> don’t even make use of any intrinsics.
>>>
>>> http://lists.llvm.org/pipermail/cfe-dev/2016-May/048837.html
>>> http://lists.llvm.org/pipermail/cfe-dev/2016-June/049460.html
>>>
>>> During these discussions Chandler Carruth indicated that usage of clang
>>> modules is probably the best alternative to mitigate this problem:
>>>
>>> http://lists.llvm.org/pipermail/cfe-dev/2016-June/049513.html
>>>
>>>
>>>
>>> Based on this feedback, Intel has started working on this proposal. Initial
>>> Internal measurements indicated that this direction is promising.
>>>
>>> For example, the compile time of a single translation unit that includes all
>>> of the intel intrinsic header files would reduce by 90% starting from the
>>> 2nd compilation (by using cached pre-compiled modules). In general, we saw
>>> that compilation of projects that have more than one translation unit that
>>> includes x86intrin.h/immintrin.h will get big compile time reductions
>>> starting from the 1st compilation.
>>>
>>>
>>>
>>> The biggest advantage of this approach is that it scales and would help also
>>> applications that do rely on intrinsics and hence will not be able to avoid
>>> the compile time overhead.
>>>
>>>
>>>
>>> I (Intel) would like to add a Clang option that enables the modules feature
>>> exclusively for the Intel intrinsics header files, and set this option to be
>>> turned on by default. This allows clang to cache a pre-compiled (actually
>>> pre-parsed) module of all the headers included by x86intrin.h/immintrin.h,
>>> and any following compilation of #include<x86intrin.h>/<immintrin.h>
>>> directives will skip the parsing phase and save precious compile time. Since
>>> ‘clang modules’ and the regular ‘pre-processor include mechanism’ are not
>>> 100% compatible, I started by cleaning some small Intel intrinsics module
>>> bugs exposed by this work:
>>>
>>> https://reviews.llvm.org/D23871, https://reviews.llvm.org/D24825,
>>> https://reviews.llvm.org/D24752 (This one is still in review).
>>>
>>> I plan to upload the actual patch that adds the option I mentioned above
>>> somewhere next week hopefully. Once this patch is uploaded, you could
>>> experiment with it and see if it actually mitigates the problem on your end
>>> and provide us with valuable feedback on this direction.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Elad
>>>
>>>
>>>
>>>
>>>
>>> From: Demikhovsky, Elena
>>> Sent: Thursday, September 29, 2016 22:20
>>> To: Reid Kleckner <rnk at google.com>; Hans Wennborg <hans at chromium.org>;
>>> Blank, Guy <guy.blank at intel.com>; Aboud, Amjad <amjad.aboud at intel.com>;
>>> Craig Topper <craig.topper at gmail.com>; Badouh, Asaf <asaf.badouh at intel.com>;
>>> Zuckerman, Michael <michael.zuckerman at intel.com>; Breger, Igor
>>> <igor.breger at intel.com>
>>> Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber <thakis at chromium.org>;
>>> cfe-dev <cfe-dev at lists.llvm.org>; Ouriel, Boaz <boaz.ouriel at intel.com>;
>>> Cohen, Elad2 <elad2.cohen at intel.com>
>>> Subject: RE: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility
>>> with MSVC's <intrin.h>
>>>
>>>
>>>
>>> Forwarding your mail to Elad who is working now on intrinsics. Please take
>>> into account a possible delay in Elad’s response, we are in local holidays
>>> this week.
>>>
>>>
>>>
>>> -           Elena
>>>
>>>
>>>
>>> From: Reid Kleckner [mailto:rnk at google.com]
>>> Sent: Thursday, September 29, 2016 22:02
>>> To: Hans Wennborg <hans at chromium.org>; Blank, Guy <guy.blank at intel.com>;
>>> Aboud, Amjad <amjad.aboud at intel.com>; Demikhovsky, Elena
>>> <elena.demikhovsky at intel.com>; Craig Topper <craig.topper at gmail.com>;
>>> Badouh, Asaf <asaf.badouh at intel.com>; Zuckerman, Michael
>>> <michael.zuckerman at intel.com>; Breger, Igor <igor.breger at intel.com>
>>> Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber <thakis at chromium.org>;
>>> cfe-dev <cfe-dev at lists.llvm.org>
>>> Subject: Re: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility
>>> with MSVC's <intrin.h>
>>>
>>>
>>>
>>> We've been here before:
>>>
>>> https://llvm.org/bugs/show_bug.cgi?id=25544
>>>
>>> http://llvm.org/viewvc/llvm-project?view=revision&revision=253358
>>>
>>> https://bugs.chromium.org/p/chromium/issues/detail?id=556735
>>>
>>>
>>>
>>> Nico's change to avoid including bmiintrin.h if _MSC_VER is set to reduce
>>> compile time brought the problem back.
>>>
>>>
>>>
>>> Basically, Intel's immintrin.h interface is too big. It's windows.h all over
>>> again. It significantly slows down builds of simple projects that include
>>> basic STL headers like <string>. Nico was able to speed up the *overall*
>>> build time of Chromium by at least 10% (ask him for specifics) by adding all
>>> these '#if !defined(_MSC_VER)' checks to immintrin.h. However, for
>>> compatibility, I think we may need intrin.h to include most of that stuff by
>>> default.
>>>
>>>
>>>
>>> I added a bunch of Intel folks to basically ask the question: Can we please
>>> go back to the old days of mmintrin.h, xmmintrin.h, then emmintrin.h?
>>>
>>>
>>>
>>> If not, we will have to consider adding an evil configuration macro, like
>>> WIN32_LEAN_AND_MEAN for windows.h, that opts out of all this extra
>>> immintrin.h functionality to speed up compilation.
>>>
>>>
>>>
>>> On Thu, Sep 29, 2016 at 11:04 AM, Hans Wennborg <hans at chromium.org> wrote:
>>>
>>> On Thu, Sep 29, 2016 at 10:50 AM, Nathan Froyd via cfe-dev
>>> <cfe-dev at lists.llvm.org> wrote:
>>>> [While I filed this as https://llvm.org/bugs/show_bug.cgi?id=30506,
>>>> Paul Robinson suggested in the bug that it might be worth bringing to
>>>> cfe-dev for wider discussion.]
>>>>
>>>> Firefox's copy of FFmpeg includes <intrin.h> on MSVC:
>>>>
>>>>
>>>> https://dxr.mozilla.org/mozilla-central/source/media/ffvpx/libavutil/x86/intmath.h#26
>>>>
>>>> and MSVC's <intrin.h> (after including several other headers) declares
>>>> _tzcnt_u32.  clang-cl declares _tzcnt_u32 in <bmiintrin.h>, but
>>>> <bmiintrin.h> is only included from <intrin.h> if an appropriate
>>>> target CPU is detected:
>>>>
>>>>
>>>> https://github.com/llvm-mirror/clang/blob/master/lib/Headers/immintrin.h#L82
>>>>
>>>> even though _tzcnt_u32 (or rather, its underlying implementation
>>>> function, __tzcnt_u32) is explicitly declared to be available
>>>> everywhere:
>>>>
>>>>
>>>> https://github.com/llvm-mirror/clang/blob/master/lib/Headers/bmiintrin.h#L284
>>>>
>>>> The net result is that Firefox's copy of FFmpeg doesn't compile with
>>>> clang-cl because of this issue.  Upstream has the same code, so I
>>>> assume the same is true there:
>>>>
>>>> https://github.com/FFmpeg/FFmpeg/blob/master/libavutil/x86/intmath.h#L26
>>>>
>>>> AFAICT, this behavior was changed by only including <bmiitrin.h> if
>>>> __BMI__ is defined in:
>>>>
>>>> http://reviews.llvm.org/D20291
>>>>
>>>> with the desirable goal of reducing compile time.  But this change
>>>> also broke things like _tzcnt_u32 being available.
>>>>
>>>> What is the right thing to do here?  Should <bmiintrin.h> be
>>>> unconditionally included, or should something else be done?
>>>
>>> I think these intrinsics (it's really just tzcnt though?) that are
>>> useful also for non-BMI targets should probably be moved out of the
>>> BMI-specific  header.
>>>
>>> +thakis and rnk if they have any thoughts on this.
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> Intel Israel (74) Limited
>>>
>>> This e-mail and any attachments may contain confidential material for
>>> the sole use of the intended recipient(s). Any review or distribution
>>> by others is strictly prohibited. If you are not the intended
>>> recipient, please contact the sender and delete all copies.



More information about the cfe-dev mailing list