[cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility with MSVC's <intrin.h>

Cohen, Elad2 via cfe-dev cfe-dev at lists.llvm.org
Sun Oct 30 02:19:03 PDT 2016


Hi Nathan,

> I've been following the review below and its successor, D25767, but 
> I'm unable to make the proposed solution (or its equivalent) work.  If 
> I specify:
>
> -Xclang -fimplicit-module-maps -Xclang -fmodule-map-file=[builtin 
> .modulemap]
>
> I still get an error about an unresolved __tzcnt_u32 symbol.  If I 
> remove the "-Xclang -fimplicit-module-maps" from the above, I get the 
> same error.  If I try instead:
>
> -Xclang -fmodules -Xclang -fmodule-map-file=[builtin .modulemap]
>
> I get peculiar errors about x86intrin.h and the _Builtin_intrinsics 
> module not being available and implicit use of module files being 
> disabled.  (Same goes for _Builtin_stddef_max_align_t and
> __stddef_max_align_t.h.)  Adding "-Xclang -fimplicit-module-maps"
> gives me the same error.
>
> I'm not sure what's going wrong, or if there's been some sort of 
> mistake in generating the module map.  Help?

The equivalent command line using clang-cl should specify:
-Xclang -fmodules -Xclang -fmodules-cache-path=[modules cache dir] -Xclang -fmodule-map-file=[builtin .modulemap]
Let me know if this still doesn't work.

Thanks, Elad

-----Original Message-----
From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf Of Hans Wennborg
Sent: Saturday, October 29, 2016 01:56
To: Nathan Froyd <nfroyd at mozilla.com>
Cc: Cohen, Elad2 <elad2.cohen at intel.com>; Demikhovsky, Elena <elena.demikhovsky at intel.com>; Reid Kleckner <rnk at google.com>; Blank, Guy <guy.blank at intel.com>; Aboud, Amjad <amjad.aboud at intel.com>; Craig Topper <craig.topper at gmail.com>; Badouh, Asaf <asaf.badouh at intel.com>; Zuckerman, Michael <michael.zuckerman at intel.com>; Breger, Igor <igor.breger at intel.com>; Nico Weber <thakis at chromium.org>; cfe-dev <cfe-dev at lists.llvm.org>; Ouriel, Boaz <boaz.ouriel at intel.com>; Rackover, Zvi <zvi.rackover at intel.com>
Subject: Re: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility with MSVC's <intrin.h>

I haven't been following this thread very closely since the topic shifted towards using modules for the intrinsics headers.

Is the underlying problem we're trying to solve still how to expose _tzcnt_u32? Isn't it enough to just make it available in intrin.h instead of (or addition to) bmiintrin.h? It's special in that it really doesn't require a BMI-enabled target to be useful, and is probably one of very few instructions with this property?

 - Hans

On Fri, Oct 28, 2016 at 3:46 PM, Nathan Froyd <nfroyd at mozilla.com> wrote:
> I've been following the review below and its successor, D25767, but 
> I'm unable to make the proposed solution (or its equivalent) work.  If 
> I specify:
>
> -Xclang -fimplicit-module-maps -Xclang -fmodule-map-file=[builtin 
> .modulemap]
>
> I still get an error about an unresolved __tzcnt_u32 symbol.  If I 
> remove the "-Xclang -fimplicit-module-maps" from the above, I get the 
> same error.  If I try instead:
>
> -Xclang -fmodules -Xclang -fmodule-map-file=[builtin .modulemap]
>
> I get peculiar errors about x86intrin.h and the _Builtin_intrinsics 
> module not being available and implicit use of module files being 
> disabled.  (Same goes for _Builtin_stddef_max_align_t and
> __stddef_max_align_t.h.)  Adding "-Xclang -fimplicit-module-maps"
> gives me the same error.
>
> I'm not sure what's going wrong, or if there's been some sort of 
> mistake in generating the module map.  Help?
>
> Thanks,
> -Nathan
>
> On Thu, Oct 6, 2016 at 2:45 PM, Cohen, Elad2 <elad2.cohen at intel.com> wrote:
>> I uploaded to review the patch that adds the option for enabling the 
>> modules feature exclusively for the intrinsic header files:
>> https://reviews.llvm.org/D25337
>> You can try it out by adding the –fexclusive-builtin-modules command 
>> line option. (all the different x86 *intrin.h files should be 
>> included by default since the #if guards already contain a __has_feature(modules) condition).
>>
>>
>>
>> Thanks, Elad
>>
>>
>>
>> From: Cohen, Elad2
>> Sent: Friday, September 30, 2016 16:54
>> To: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Reid Kleckner 
>> <rnk at google.com>; Hans Wennborg <hans at chromium.org>; Blank, Guy 
>> <guy.blank at intel.com>; Aboud, Amjad <amjad.aboud at intel.com>; Craig 
>> Topper <craig.topper at gmail.com>; Badouh, Asaf 
>> <asaf.badouh at intel.com>; Zuckerman, Michael 
>> <michael.zuckerman at intel.com>; Breger, Igor <igor.breger at intel.com>
>> Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber 
>> <thakis at chromium.org>; cfe-dev <cfe-dev at lists.llvm.org>; Ouriel, Boaz 
>> <boaz.ouriel at intel.com>; Rackover, Zvi <zvi.rackover at intel.com>
>>
>>
>> Subject: RE: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and 
>> compatibility with MSVC's <intrin.h>
>>
>>
>>
>> Hi All,
>>
>>
>>
>> As mentioned in past discussions, the introduction of AVX512 
>> intrinsics had a severe impact on the compile time.
>>
>> The extreme case happens on windows where the mere inclusion of 
>> system headers like <string> causes a hit in the compile time of 
>> applications that don’t even make use of any intrinsics.
>>
>> http://lists.llvm.org/pipermail/cfe-dev/2016-May/048837.html
>> http://lists.llvm.org/pipermail/cfe-dev/2016-June/049460.html
>>
>> During these discussions Chandler Carruth indicated that usage of 
>> clang modules is probably the best alternative to mitigate this problem:
>>
>> http://lists.llvm.org/pipermail/cfe-dev/2016-June/049513.html
>>
>>
>>
>> Based on this feedback, Intel has started working on this proposal. 
>> Initial Internal measurements indicated that this direction is promising.
>>
>> For example, the compile time of a single translation unit that 
>> includes all of the intel intrinsic header files would reduce by 90% 
>> starting from the 2nd compilation (by using cached pre-compiled 
>> modules). In general, we saw that compilation of projects that have 
>> more than one translation unit that includes x86intrin.h/immintrin.h 
>> will get big compile time reductions starting from the 1st compilation.
>>
>>
>>
>> The biggest advantage of this approach is that it scales and would 
>> help also applications that do rely on intrinsics and hence will not 
>> be able to avoid the compile time overhead.
>>
>>
>>
>> I (Intel) would like to add a Clang option that enables the modules 
>> feature exclusively for the Intel intrinsics header files, and set 
>> this option to be turned on by default. This allows clang to cache a 
>> pre-compiled (actually
>> pre-parsed) module of all the headers included by 
>> x86intrin.h/immintrin.h, and any following compilation of 
>> #include<x86intrin.h>/<immintrin.h>
>> directives will skip the parsing phase and save precious compile 
>> time. Since ‘clang modules’ and the regular ‘pre-processor include 
>> mechanism’ are not 100% compatible, I started by cleaning some small 
>> Intel intrinsics module bugs exposed by this work:
>>
>> https://reviews.llvm.org/D23871, https://reviews.llvm.org/D24825,
>> https://reviews.llvm.org/D24752 (This one is still in review).
>>
>> I plan to upload the actual patch that adds the option I mentioned 
>> above somewhere next week hopefully. Once this patch is uploaded, you 
>> could experiment with it and see if it actually mitigates the problem 
>> on your end and provide us with valuable feedback on this direction.
>>
>>
>>
>> Thanks,
>>
>> Elad
>>
>>
>>
>>
>>
>> From: Demikhovsky, Elena
>> Sent: Thursday, September 29, 2016 22:20
>> To: Reid Kleckner <rnk at google.com>; Hans Wennborg 
>> <hans at chromium.org>; Blank, Guy <guy.blank at intel.com>; Aboud, Amjad 
>> <amjad.aboud at intel.com>; Craig Topper <craig.topper at gmail.com>; 
>> Badouh, Asaf <asaf.badouh at intel.com>; Zuckerman, Michael 
>> <michael.zuckerman at intel.com>; Breger, Igor <igor.breger at intel.com>
>> Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber 
>> <thakis at chromium.org>; cfe-dev <cfe-dev at lists.llvm.org>; Ouriel, Boaz 
>> <boaz.ouriel at intel.com>; Cohen, Elad2 <elad2.cohen at intel.com>
>> Subject: RE: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and 
>> compatibility with MSVC's <intrin.h>
>>
>>
>>
>> Forwarding your mail to Elad who is working now on intrinsics. Please 
>> take into account a possible delay in Elad’s response, we are in 
>> local holidays this week.
>>
>>
>>
>> -           Elena
>>
>>
>>
>> From: Reid Kleckner [mailto:rnk at google.com]
>> Sent: Thursday, September 29, 2016 22:02
>> To: Hans Wennborg <hans at chromium.org>; Blank, Guy 
>> <guy.blank at intel.com>; Aboud, Amjad <amjad.aboud at intel.com>; 
>> Demikhovsky, Elena <elena.demikhovsky at intel.com>; Craig Topper 
>> <craig.topper at gmail.com>; Badouh, Asaf <asaf.badouh at intel.com>; 
>> Zuckerman, Michael <michael.zuckerman at intel.com>; Breger, Igor 
>> <igor.breger at intel.com>
>> Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber 
>> <thakis at chromium.org>; cfe-dev <cfe-dev at lists.llvm.org>
>> Subject: Re: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and 
>> compatibility with MSVC's <intrin.h>
>>
>>
>>
>> We've been here before:
>>
>> https://llvm.org/bugs/show_bug.cgi?id=25544
>>
>> http://llvm.org/viewvc/llvm-project?view=revision&revision=253358
>>
>> https://bugs.chromium.org/p/chromium/issues/detail?id=556735
>>
>>
>>
>> Nico's change to avoid including bmiintrin.h if _MSC_VER is set to 
>> reduce compile time brought the problem back.
>>
>>
>>
>> Basically, Intel's immintrin.h interface is too big. It's windows.h 
>> all over again. It significantly slows down builds of simple projects 
>> that include basic STL headers like <string>. Nico was able to speed 
>> up the *overall* build time of Chromium by at least 10% (ask him for 
>> specifics) by adding all these '#if !defined(_MSC_VER)' checks to 
>> immintrin.h. However, for compatibility, I think we may need intrin.h 
>> to include most of that stuff by default.
>>
>>
>>
>> I added a bunch of Intel folks to basically ask the question: Can we 
>> please go back to the old days of mmintrin.h, xmmintrin.h, then emmintrin.h?
>>
>>
>>
>> If not, we will have to consider adding an evil configuration macro, 
>> like WIN32_LEAN_AND_MEAN for windows.h, that opts out of all this 
>> extra immintrin.h functionality to speed up compilation.
>>
>>
>>
>> On Thu, Sep 29, 2016 at 11:04 AM, Hans Wennborg <hans at chromium.org> wrote:
>>
>> On Thu, Sep 29, 2016 at 10:50 AM, Nathan Froyd via cfe-dev 
>> <cfe-dev at lists.llvm.org> wrote:
>>> [While I filed this as https://llvm.org/bugs/show_bug.cgi?id=30506,
>>> Paul Robinson suggested in the bug that it might be worth bringing 
>>> to cfe-dev for wider discussion.]
>>>
>>> Firefox's copy of FFmpeg includes <intrin.h> on MSVC:
>>>
>>>
>>> https://dxr.mozilla.org/mozilla-central/source/media/ffvpx/libavutil
>>> /x86/intmath.h#26
>>>
>>> and MSVC's <intrin.h> (after including several other headers) 
>>> declares _tzcnt_u32.  clang-cl declares _tzcnt_u32 in <bmiintrin.h>, 
>>> but <bmiintrin.h> is only included from <intrin.h> if an appropriate 
>>> target CPU is detected:
>>>
>>>
>>> https://github.com/llvm-mirror/clang/blob/master/lib/Headers/immintr
>>> in.h#L82
>>>
>>> even though _tzcnt_u32 (or rather, its underlying implementation 
>>> function, __tzcnt_u32) is explicitly declared to be available
>>> everywhere:
>>>
>>>
>>> https://github.com/llvm-mirror/clang/blob/master/lib/Headers/bmiintr
>>> in.h#L284
>>>
>>> The net result is that Firefox's copy of FFmpeg doesn't compile with 
>>> clang-cl because of this issue.  Upstream has the same code, so I 
>>> assume the same is true there:
>>>
>>> https://github.com/FFmpeg/FFmpeg/blob/master/libavutil/x86/intmath.h
>>> #L26
>>>
>>> AFAICT, this behavior was changed by only including <bmiitrin.h> if 
>>> __BMI__ is defined in:
>>>
>>> http://reviews.llvm.org/D20291
>>>
>>> with the desirable goal of reducing compile time.  But this change 
>>> also broke things like _tzcnt_u32 being available.
>>>
>>> What is the right thing to do here?  Should <bmiintrin.h> be 
>>> unconditionally included, or should something else be done?
>>
>> I think these intrinsics (it's really just tzcnt though?) that are 
>> useful also for non-BMI targets should probably be moved out of the 
>> BMI-specific  header.
>>
>> +thakis and rnk if they have any thoughts on this.
>>
>>
>>
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>>
>> This e-mail and any attachments may contain confidential material for 
>> the sole use of the intended recipient(s). Any review or distribution 
>> by others is strictly prohibited. If you are not the intended 
>> recipient, please contact the sender and delete all copies.
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


More information about the cfe-dev mailing list