[cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility with MSVC's <intrin.h>
Cohen, Elad2 via cfe-dev
cfe-dev at lists.llvm.org
Mon Oct 31 08:02:51 PDT 2016
>This does indeed work, thanks for the pointer. But why? The documentation for -fmodules-cache-path indicates that "Clang will select a system-appropriate default" if this option is not provided.
>So explicitly providing it shouldn't make any difference, right? Or did the system-appropriate default perhaps have stale information in it?
If I'm not mistaken, the modules flags are not exposed to clang-cl driver and only available on the regular clang driver.
When you use clang (and not clang-cl), passing "-fmodules" to the driver will indeed get clang to select a system-appropriate default if "- fmodules-cache-path " is not provided. (You can try "clang -### -fmodules input.c" and see the commands the driver generates).
The clang-cl driver on the other hand, doesn't know what "-fmodules" is. So you are left with doing the Driver's job on your own in this case, choosing the proper options and passing them directly to 'clang cc1' using -Xclang. So in your example, simply no cache dir was provided to 'clang cc1'.
Thanks, Elad
-----Original Message-----
From: Nathan Froyd [mailto:nfroyd at mozilla.com]
Sent: Monday, October 31, 2016 16:38
To: Cohen, Elad2 <elad2.cohen at intel.com>
Cc: Hans Wennborg <hans at chromium.org>; Demikhovsky, Elena <elena.demikhovsky at intel.com>; Reid Kleckner <rnk at google.com>; Blank, Guy <guy.blank at intel.com>; Aboud, Amjad <amjad.aboud at intel.com>; Craig Topper <craig.topper at gmail.com>; Badouh, Asaf <asaf.badouh at intel.com>; Zuckerman, Michael <michael.zuckerman at intel.com>; Breger, Igor <igor.breger at intel.com>; Nico Weber <thakis at chromium.org>; cfe-dev <cfe-dev at lists.llvm.org>; Ouriel, Boaz <boaz.ouriel at intel.com>; Rackover, Zvi <zvi.rackover at intel.com>
Subject: Re: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility with MSVC's <intrin.h>
This does indeed work, thanks for the pointer. But why? The documentation for -fmodules-cache-path indicates that "Clang will select a system-appropriate default" if this option is not provided.
So explicitly providing it shouldn't make any difference, right? Or did the system-appropriate default perhaps have stale information in it?
-Nathan
On Sun, Oct 30, 2016 at 5:19 AM, Cohen, Elad2 <elad2.cohen at intel.com> wrote:
> Hi Nathan,
>
>> I've been following the review below and its successor, D25767, but
>> I'm unable to make the proposed solution (or its equivalent) work.
>> If I specify:
>>
>> -Xclang -fimplicit-module-maps -Xclang -fmodule-map-file=[builtin
>> .modulemap]
>>
>> I still get an error about an unresolved __tzcnt_u32 symbol. If I
>> remove the "-Xclang -fimplicit-module-maps" from the above, I get the
>> same error. If I try instead:
>>
>> -Xclang -fmodules -Xclang -fmodule-map-file=[builtin .modulemap]
>>
>> I get peculiar errors about x86intrin.h and the _Builtin_intrinsics
>> module not being available and implicit use of module files being
>> disabled. (Same goes for _Builtin_stddef_max_align_t and
>> __stddef_max_align_t.h.) Adding "-Xclang -fimplicit-module-maps"
>> gives me the same error.
>>
>> I'm not sure what's going wrong, or if there's been some sort of
>> mistake in generating the module map. Help?
>
> The equivalent command line using clang-cl should specify:
> -Xclang -fmodules -Xclang -fmodules-cache-path=[modules cache dir]
> -Xclang -fmodule-map-file=[builtin .modulemap] Let me know if this still doesn't work.
>
> Thanks, Elad
>
> -----Original Message-----
> From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf Of
> Hans Wennborg
> Sent: Saturday, October 29, 2016 01:56
> To: Nathan Froyd <nfroyd at mozilla.com>
> Cc: Cohen, Elad2 <elad2.cohen at intel.com>; Demikhovsky, Elena
> <elena.demikhovsky at intel.com>; Reid Kleckner <rnk at google.com>; Blank,
> Guy <guy.blank at intel.com>; Aboud, Amjad <amjad.aboud at intel.com>; Craig
> Topper <craig.topper at gmail.com>; Badouh, Asaf <asaf.badouh at intel.com>;
> Zuckerman, Michael <michael.zuckerman at intel.com>; Breger, Igor
> <igor.breger at intel.com>; Nico Weber <thakis at chromium.org>; cfe-dev
> <cfe-dev at lists.llvm.org>; Ouriel, Boaz <boaz.ouriel at intel.com>;
> Rackover, Zvi <zvi.rackover at intel.com>
> Subject: Re: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and
> compatibility with MSVC's <intrin.h>
>
> I haven't been following this thread very closely since the topic shifted towards using modules for the intrinsics headers.
>
> Is the underlying problem we're trying to solve still how to expose _tzcnt_u32? Isn't it enough to just make it available in intrin.h instead of (or addition to) bmiintrin.h? It's special in that it really doesn't require a BMI-enabled target to be useful, and is probably one of very few instructions with this property?
>
> - Hans
>
> On Fri, Oct 28, 2016 at 3:46 PM, Nathan Froyd <nfroyd at mozilla.com> wrote:
>> I've been following the review below and its successor, D25767, but
>> I'm unable to make the proposed solution (or its equivalent) work.
>> If I specify:
>>
>> -Xclang -fimplicit-module-maps -Xclang -fmodule-map-file=[builtin
>> .modulemap]
>>
>> I still get an error about an unresolved __tzcnt_u32 symbol. If I
>> remove the "-Xclang -fimplicit-module-maps" from the above, I get the
>> same error. If I try instead:
>>
>> -Xclang -fmodules -Xclang -fmodule-map-file=[builtin .modulemap]
>>
>> I get peculiar errors about x86intrin.h and the _Builtin_intrinsics
>> module not being available and implicit use of module files being
>> disabled. (Same goes for _Builtin_stddef_max_align_t and
>> __stddef_max_align_t.h.) Adding "-Xclang -fimplicit-module-maps"
>> gives me the same error.
>>
>> I'm not sure what's going wrong, or if there's been some sort of
>> mistake in generating the module map. Help?
>>
>> Thanks,
>> -Nathan
>>
>> On Thu, Oct 6, 2016 at 2:45 PM, Cohen, Elad2 <elad2.cohen at intel.com> wrote:
>>> I uploaded to review the patch that adds the option for enabling the
>>> modules feature exclusively for the intrinsic header files:
>>> https://reviews.llvm.org/D25337
>>> You can try it out by adding the –fexclusive-builtin-modules command
>>> line option. (all the different x86 *intrin.h files should be
>>> included by default since the #if guards already contain a __has_feature(modules) condition).
>>>
>>>
>>>
>>> Thanks, Elad
>>>
>>>
>>>
>>> From: Cohen, Elad2
>>> Sent: Friday, September 30, 2016 16:54
>>> To: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Reid Kleckner
>>> <rnk at google.com>; Hans Wennborg <hans at chromium.org>; Blank, Guy
>>> <guy.blank at intel.com>; Aboud, Amjad <amjad.aboud at intel.com>; Craig
>>> Topper <craig.topper at gmail.com>; Badouh, Asaf
>>> <asaf.badouh at intel.com>; Zuckerman, Michael
>>> <michael.zuckerman at intel.com>; Breger, Igor <igor.breger at intel.com>
>>> Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber
>>> <thakis at chromium.org>; cfe-dev <cfe-dev at lists.llvm.org>; Ouriel,
>>> Boaz <boaz.ouriel at intel.com>; Rackover, Zvi <zvi.rackover at intel.com>
>>>
>>>
>>> Subject: RE: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and
>>> compatibility with MSVC's <intrin.h>
>>>
>>>
>>>
>>> Hi All,
>>>
>>>
>>>
>>> As mentioned in past discussions, the introduction of AVX512
>>> intrinsics had a severe impact on the compile time.
>>>
>>> The extreme case happens on windows where the mere inclusion of
>>> system headers like <string> causes a hit in the compile time of
>>> applications that don’t even make use of any intrinsics.
>>>
>>> http://lists.llvm.org/pipermail/cfe-dev/2016-May/048837.html
>>> http://lists.llvm.org/pipermail/cfe-dev/2016-June/049460.html
>>>
>>> During these discussions Chandler Carruth indicated that usage of
>>> clang modules is probably the best alternative to mitigate this problem:
>>>
>>> http://lists.llvm.org/pipermail/cfe-dev/2016-June/049513.html
>>>
>>>
>>>
>>> Based on this feedback, Intel has started working on this proposal.
>>> Initial Internal measurements indicated that this direction is promising.
>>>
>>> For example, the compile time of a single translation unit that
>>> includes all of the intel intrinsic header files would reduce by 90%
>>> starting from the 2nd compilation (by using cached pre-compiled
>>> modules). In general, we saw that compilation of projects that have
>>> more than one translation unit that includes x86intrin.h/immintrin.h
>>> will get big compile time reductions starting from the 1st compilation.
>>>
>>>
>>>
>>> The biggest advantage of this approach is that it scales and would
>>> help also applications that do rely on intrinsics and hence will not
>>> be able to avoid the compile time overhead.
>>>
>>>
>>>
>>> I (Intel) would like to add a Clang option that enables the modules
>>> feature exclusively for the Intel intrinsics header files, and set
>>> this option to be turned on by default. This allows clang to cache a
>>> pre-compiled (actually
>>> pre-parsed) module of all the headers included by
>>> x86intrin.h/immintrin.h, and any following compilation of
>>> #include<x86intrin.h>/<immintrin.h>
>>> directives will skip the parsing phase and save precious compile
>>> time. Since ‘clang modules’ and the regular ‘pre-processor include
>>> mechanism’ are not 100% compatible, I started by cleaning some small
>>> Intel intrinsics module bugs exposed by this work:
>>>
>>> https://reviews.llvm.org/D23871, https://reviews.llvm.org/D24825,
>>> https://reviews.llvm.org/D24752 (This one is still in review).
>>>
>>> I plan to upload the actual patch that adds the option I mentioned
>>> above somewhere next week hopefully. Once this patch is uploaded,
>>> you could experiment with it and see if it actually mitigates the
>>> problem on your end and provide us with valuable feedback on this direction.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Elad
>>>
>>>
>>>
>>>
>>>
>>> From: Demikhovsky, Elena
>>> Sent: Thursday, September 29, 2016 22:20
>>> To: Reid Kleckner <rnk at google.com>; Hans Wennborg
>>> <hans at chromium.org>; Blank, Guy <guy.blank at intel.com>; Aboud, Amjad
>>> <amjad.aboud at intel.com>; Craig Topper <craig.topper at gmail.com>;
>>> Badouh, Asaf <asaf.badouh at intel.com>; Zuckerman, Michael
>>> <michael.zuckerman at intel.com>; Breger, Igor <igor.breger at intel.com>
>>> Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber
>>> <thakis at chromium.org>; cfe-dev <cfe-dev at lists.llvm.org>; Ouriel,
>>> Boaz <boaz.ouriel at intel.com>; Cohen, Elad2 <elad2.cohen at intel.com>
>>> Subject: RE: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and
>>> compatibility with MSVC's <intrin.h>
>>>
>>>
>>>
>>> Forwarding your mail to Elad who is working now on intrinsics.
>>> Please take into account a possible delay in Elad’s response, we are
>>> in local holidays this week.
>>>
>>>
>>>
>>> - Elena
>>>
>>>
>>>
>>> From: Reid Kleckner [mailto:rnk at google.com]
>>> Sent: Thursday, September 29, 2016 22:02
>>> To: Hans Wennborg <hans at chromium.org>; Blank, Guy
>>> <guy.blank at intel.com>; Aboud, Amjad <amjad.aboud at intel.com>;
>>> Demikhovsky, Elena <elena.demikhovsky at intel.com>; Craig Topper
>>> <craig.topper at gmail.com>; Badouh, Asaf <asaf.badouh at intel.com>;
>>> Zuckerman, Michael <michael.zuckerman at intel.com>; Breger, Igor
>>> <igor.breger at intel.com>
>>> Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber
>>> <thakis at chromium.org>; cfe-dev <cfe-dev at lists.llvm.org>
>>> Subject: Re: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and
>>> compatibility with MSVC's <intrin.h>
>>>
>>>
>>>
>>> We've been here before:
>>>
>>> https://llvm.org/bugs/show_bug.cgi?id=25544
>>>
>>> http://llvm.org/viewvc/llvm-project?view=revision&revision=253358
>>>
>>> https://bugs.chromium.org/p/chromium/issues/detail?id=556735
>>>
>>>
>>>
>>> Nico's change to avoid including bmiintrin.h if _MSC_VER is set to
>>> reduce compile time brought the problem back.
>>>
>>>
>>>
>>> Basically, Intel's immintrin.h interface is too big. It's windows.h
>>> all over again. It significantly slows down builds of simple
>>> projects that include basic STL headers like <string>. Nico was able
>>> to speed up the *overall* build time of Chromium by at least 10%
>>> (ask him for
>>> specifics) by adding all these '#if !defined(_MSC_VER)' checks to
>>> immintrin.h. However, for compatibility, I think we may need
>>> intrin.h to include most of that stuff by default.
>>>
>>>
>>>
>>> I added a bunch of Intel folks to basically ask the question: Can we
>>> please go back to the old days of mmintrin.h, xmmintrin.h, then emmintrin.h?
>>>
>>>
>>>
>>> If not, we will have to consider adding an evil configuration macro,
>>> like WIN32_LEAN_AND_MEAN for windows.h, that opts out of all this
>>> extra immintrin.h functionality to speed up compilation.
>>>
>>>
>>>
>>> On Thu, Sep 29, 2016 at 11:04 AM, Hans Wennborg <hans at chromium.org> wrote:
>>>
>>> On Thu, Sep 29, 2016 at 10:50 AM, Nathan Froyd via cfe-dev
>>> <cfe-dev at lists.llvm.org> wrote:
>>>> [While I filed this as https://llvm.org/bugs/show_bug.cgi?id=30506,
>>>> Paul Robinson suggested in the bug that it might be worth bringing
>>>> to cfe-dev for wider discussion.]
>>>>
>>>> Firefox's copy of FFmpeg includes <intrin.h> on MSVC:
>>>>
>>>>
>>>> https://dxr.mozilla.org/mozilla-central/source/media/ffvpx/libavuti
>>>> l
>>>> /x86/intmath.h#26
>>>>
>>>> and MSVC's <intrin.h> (after including several other headers)
>>>> declares _tzcnt_u32. clang-cl declares _tzcnt_u32 in
>>>> <bmiintrin.h>, but <bmiintrin.h> is only included from <intrin.h>
>>>> if an appropriate target CPU is detected:
>>>>
>>>>
>>>> https://github.com/llvm-mirror/clang/blob/master/lib/Headers/immint
>>>> r
>>>> in.h#L82
>>>>
>>>> even though _tzcnt_u32 (or rather, its underlying implementation
>>>> function, __tzcnt_u32) is explicitly declared to be available
>>>> everywhere:
>>>>
>>>>
>>>> https://github.com/llvm-mirror/clang/blob/master/lib/Headers/bmiint
>>>> r
>>>> in.h#L284
>>>>
>>>> The net result is that Firefox's copy of FFmpeg doesn't compile
>>>> with clang-cl because of this issue. Upstream has the same code,
>>>> so I assume the same is true there:
>>>>
>>>> https://github.com/FFmpeg/FFmpeg/blob/master/libavutil/x86/intmath.
>>>> h
>>>> #L26
>>>>
>>>> AFAICT, this behavior was changed by only including <bmiitrin.h> if
>>>> __BMI__ is defined in:
>>>>
>>>> http://reviews.llvm.org/D20291
>>>>
>>>> with the desirable goal of reducing compile time. But this change
>>>> also broke things like _tzcnt_u32 being available.
>>>>
>>>> What is the right thing to do here? Should <bmiintrin.h> be
>>>> unconditionally included, or should something else be done?
>>>
>>> I think these intrinsics (it's really just tzcnt though?) that are
>>> useful also for non-BMI targets should probably be moved out of the
>>> BMI-specific header.
>>>
>>> +thakis and rnk if they have any thoughts on this.
>>>
>>>
>>>
>>> --------------------------------------------------------------------
>>> -
>>> Intel Israel (74) Limited
>>>
>>> This e-mail and any attachments may contain confidential material
>>> for the sole use of the intended recipient(s). Any review or
>>> distribution by others is strictly prohibited. If you are not the
>>> intended recipient, please contact the sender and delete all copies.
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
---------------------------------------------------------------------
Intel Israel (74) Limited
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
More information about the cfe-dev
mailing list