[cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility with MSVC's <intrin.h>
Nathan Froyd via cfe-dev
cfe-dev at lists.llvm.org
Mon Oct 31 07:38:23 PDT 2016
This does indeed work, thanks for the pointer. But why? The
documentation for -fmodules-cache-path indicates that "Clang will
select a system-appropriate default" if this option is not provided.
So explicitly providing it shouldn't make any difference, right? Or
did the system-appropriate default perhaps have stale information in
it?
-Nathan
On Sun, Oct 30, 2016 at 5:19 AM, Cohen, Elad2 <elad2.cohen at intel.com> wrote:
> Hi Nathan,
>
>> I've been following the review below and its successor, D25767, but
>> I'm unable to make the proposed solution (or its equivalent) work. If
>> I specify:
>>
>> -Xclang -fimplicit-module-maps -Xclang -fmodule-map-file=[builtin
>> .modulemap]
>>
>> I still get an error about an unresolved __tzcnt_u32 symbol. If I
>> remove the "-Xclang -fimplicit-module-maps" from the above, I get the
>> same error. If I try instead:
>>
>> -Xclang -fmodules -Xclang -fmodule-map-file=[builtin .modulemap]
>>
>> I get peculiar errors about x86intrin.h and the _Builtin_intrinsics
>> module not being available and implicit use of module files being
>> disabled. (Same goes for _Builtin_stddef_max_align_t and
>> __stddef_max_align_t.h.) Adding "-Xclang -fimplicit-module-maps"
>> gives me the same error.
>>
>> I'm not sure what's going wrong, or if there's been some sort of
>> mistake in generating the module map. Help?
>
> The equivalent command line using clang-cl should specify:
> -Xclang -fmodules -Xclang -fmodules-cache-path=[modules cache dir] -Xclang -fmodule-map-file=[builtin .modulemap]
> Let me know if this still doesn't work.
>
> Thanks, Elad
>
> -----Original Message-----
> From: hwennborg at google.com [mailto:hwennborg at google.com] On Behalf Of Hans Wennborg
> Sent: Saturday, October 29, 2016 01:56
> To: Nathan Froyd <nfroyd at mozilla.com>
> Cc: Cohen, Elad2 <elad2.cohen at intel.com>; Demikhovsky, Elena <elena.demikhovsky at intel.com>; Reid Kleckner <rnk at google.com>; Blank, Guy <guy.blank at intel.com>; Aboud, Amjad <amjad.aboud at intel.com>; Craig Topper <craig.topper at gmail.com>; Badouh, Asaf <asaf.badouh at intel.com>; Zuckerman, Michael <michael.zuckerman at intel.com>; Breger, Igor <igor.breger at intel.com>; Nico Weber <thakis at chromium.org>; cfe-dev <cfe-dev at lists.llvm.org>; Ouriel, Boaz <boaz.ouriel at intel.com>; Rackover, Zvi <zvi.rackover at intel.com>
> Subject: Re: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility with MSVC's <intrin.h>
>
> I haven't been following this thread very closely since the topic shifted towards using modules for the intrinsics headers.
>
> Is the underlying problem we're trying to solve still how to expose _tzcnt_u32? Isn't it enough to just make it available in intrin.h instead of (or addition to) bmiintrin.h? It's special in that it really doesn't require a BMI-enabled target to be useful, and is probably one of very few instructions with this property?
>
> - Hans
>
> On Fri, Oct 28, 2016 at 3:46 PM, Nathan Froyd <nfroyd at mozilla.com> wrote:
>> I've been following the review below and its successor, D25767, but
>> I'm unable to make the proposed solution (or its equivalent) work. If
>> I specify:
>>
>> -Xclang -fimplicit-module-maps -Xclang -fmodule-map-file=[builtin
>> .modulemap]
>>
>> I still get an error about an unresolved __tzcnt_u32 symbol. If I
>> remove the "-Xclang -fimplicit-module-maps" from the above, I get the
>> same error. If I try instead:
>>
>> -Xclang -fmodules -Xclang -fmodule-map-file=[builtin .modulemap]
>>
>> I get peculiar errors about x86intrin.h and the _Builtin_intrinsics
>> module not being available and implicit use of module files being
>> disabled. (Same goes for _Builtin_stddef_max_align_t and
>> __stddef_max_align_t.h.) Adding "-Xclang -fimplicit-module-maps"
>> gives me the same error.
>>
>> I'm not sure what's going wrong, or if there's been some sort of
>> mistake in generating the module map. Help?
>>
>> Thanks,
>> -Nathan
>>
>> On Thu, Oct 6, 2016 at 2:45 PM, Cohen, Elad2 <elad2.cohen at intel.com> wrote:
>>> I uploaded to review the patch that adds the option for enabling the
>>> modules feature exclusively for the intrinsic header files:
>>> https://reviews.llvm.org/D25337
>>> You can try it out by adding the –fexclusive-builtin-modules command
>>> line option. (all the different x86 *intrin.h files should be
>>> included by default since the #if guards already contain a __has_feature(modules) condition).
>>>
>>>
>>>
>>> Thanks, Elad
>>>
>>>
>>>
>>> From: Cohen, Elad2
>>> Sent: Friday, September 30, 2016 16:54
>>> To: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Reid Kleckner
>>> <rnk at google.com>; Hans Wennborg <hans at chromium.org>; Blank, Guy
>>> <guy.blank at intel.com>; Aboud, Amjad <amjad.aboud at intel.com>; Craig
>>> Topper <craig.topper at gmail.com>; Badouh, Asaf
>>> <asaf.badouh at intel.com>; Zuckerman, Michael
>>> <michael.zuckerman at intel.com>; Breger, Igor <igor.breger at intel.com>
>>> Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber
>>> <thakis at chromium.org>; cfe-dev <cfe-dev at lists.llvm.org>; Ouriel, Boaz
>>> <boaz.ouriel at intel.com>; Rackover, Zvi <zvi.rackover at intel.com>
>>>
>>>
>>> Subject: RE: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and
>>> compatibility with MSVC's <intrin.h>
>>>
>>>
>>>
>>> Hi All,
>>>
>>>
>>>
>>> As mentioned in past discussions, the introduction of AVX512
>>> intrinsics had a severe impact on the compile time.
>>>
>>> The extreme case happens on windows where the mere inclusion of
>>> system headers like <string> causes a hit in the compile time of
>>> applications that don’t even make use of any intrinsics.
>>>
>>> http://lists.llvm.org/pipermail/cfe-dev/2016-May/048837.html
>>> http://lists.llvm.org/pipermail/cfe-dev/2016-June/049460.html
>>>
>>> During these discussions Chandler Carruth indicated that usage of
>>> clang modules is probably the best alternative to mitigate this problem:
>>>
>>> http://lists.llvm.org/pipermail/cfe-dev/2016-June/049513.html
>>>
>>>
>>>
>>> Based on this feedback, Intel has started working on this proposal.
>>> Initial Internal measurements indicated that this direction is promising.
>>>
>>> For example, the compile time of a single translation unit that
>>> includes all of the intel intrinsic header files would reduce by 90%
>>> starting from the 2nd compilation (by using cached pre-compiled
>>> modules). In general, we saw that compilation of projects that have
>>> more than one translation unit that includes x86intrin.h/immintrin.h
>>> will get big compile time reductions starting from the 1st compilation.
>>>
>>>
>>>
>>> The biggest advantage of this approach is that it scales and would
>>> help also applications that do rely on intrinsics and hence will not
>>> be able to avoid the compile time overhead.
>>>
>>>
>>>
>>> I (Intel) would like to add a Clang option that enables the modules
>>> feature exclusively for the Intel intrinsics header files, and set
>>> this option to be turned on by default. This allows clang to cache a
>>> pre-compiled (actually
>>> pre-parsed) module of all the headers included by
>>> x86intrin.h/immintrin.h, and any following compilation of
>>> #include<x86intrin.h>/<immintrin.h>
>>> directives will skip the parsing phase and save precious compile
>>> time. Since ‘clang modules’ and the regular ‘pre-processor include
>>> mechanism’ are not 100% compatible, I started by cleaning some small
>>> Intel intrinsics module bugs exposed by this work:
>>>
>>> https://reviews.llvm.org/D23871, https://reviews.llvm.org/D24825,
>>> https://reviews.llvm.org/D24752 (This one is still in review).
>>>
>>> I plan to upload the actual patch that adds the option I mentioned
>>> above somewhere next week hopefully. Once this patch is uploaded, you
>>> could experiment with it and see if it actually mitigates the problem
>>> on your end and provide us with valuable feedback on this direction.
>>>
>>>
>>>
>>> Thanks,
>>>
>>> Elad
>>>
>>>
>>>
>>>
>>>
>>> From: Demikhovsky, Elena
>>> Sent: Thursday, September 29, 2016 22:20
>>> To: Reid Kleckner <rnk at google.com>; Hans Wennborg
>>> <hans at chromium.org>; Blank, Guy <guy.blank at intel.com>; Aboud, Amjad
>>> <amjad.aboud at intel.com>; Craig Topper <craig.topper at gmail.com>;
>>> Badouh, Asaf <asaf.badouh at intel.com>; Zuckerman, Michael
>>> <michael.zuckerman at intel.com>; Breger, Igor <igor.breger at intel.com>
>>> Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber
>>> <thakis at chromium.org>; cfe-dev <cfe-dev at lists.llvm.org>; Ouriel, Boaz
>>> <boaz.ouriel at intel.com>; Cohen, Elad2 <elad2.cohen at intel.com>
>>> Subject: RE: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and
>>> compatibility with MSVC's <intrin.h>
>>>
>>>
>>>
>>> Forwarding your mail to Elad who is working now on intrinsics. Please
>>> take into account a possible delay in Elad’s response, we are in
>>> local holidays this week.
>>>
>>>
>>>
>>> - Elena
>>>
>>>
>>>
>>> From: Reid Kleckner [mailto:rnk at google.com]
>>> Sent: Thursday, September 29, 2016 22:02
>>> To: Hans Wennborg <hans at chromium.org>; Blank, Guy
>>> <guy.blank at intel.com>; Aboud, Amjad <amjad.aboud at intel.com>;
>>> Demikhovsky, Elena <elena.demikhovsky at intel.com>; Craig Topper
>>> <craig.topper at gmail.com>; Badouh, Asaf <asaf.badouh at intel.com>;
>>> Zuckerman, Michael <michael.zuckerman at intel.com>; Breger, Igor
>>> <igor.breger at intel.com>
>>> Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber
>>> <thakis at chromium.org>; cfe-dev <cfe-dev at lists.llvm.org>
>>> Subject: Re: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and
>>> compatibility with MSVC's <intrin.h>
>>>
>>>
>>>
>>> We've been here before:
>>>
>>> https://llvm.org/bugs/show_bug.cgi?id=25544
>>>
>>> http://llvm.org/viewvc/llvm-project?view=revision&revision=253358
>>>
>>> https://bugs.chromium.org/p/chromium/issues/detail?id=556735
>>>
>>>
>>>
>>> Nico's change to avoid including bmiintrin.h if _MSC_VER is set to
>>> reduce compile time brought the problem back.
>>>
>>>
>>>
>>> Basically, Intel's immintrin.h interface is too big. It's windows.h
>>> all over again. It significantly slows down builds of simple projects
>>> that include basic STL headers like <string>. Nico was able to speed
>>> up the *overall* build time of Chromium by at least 10% (ask him for
>>> specifics) by adding all these '#if !defined(_MSC_VER)' checks to
>>> immintrin.h. However, for compatibility, I think we may need intrin.h
>>> to include most of that stuff by default.
>>>
>>>
>>>
>>> I added a bunch of Intel folks to basically ask the question: Can we
>>> please go back to the old days of mmintrin.h, xmmintrin.h, then emmintrin.h?
>>>
>>>
>>>
>>> If not, we will have to consider adding an evil configuration macro,
>>> like WIN32_LEAN_AND_MEAN for windows.h, that opts out of all this
>>> extra immintrin.h functionality to speed up compilation.
>>>
>>>
>>>
>>> On Thu, Sep 29, 2016 at 11:04 AM, Hans Wennborg <hans at chromium.org> wrote:
>>>
>>> On Thu, Sep 29, 2016 at 10:50 AM, Nathan Froyd via cfe-dev
>>> <cfe-dev at lists.llvm.org> wrote:
>>>> [While I filed this as https://llvm.org/bugs/show_bug.cgi?id=30506,
>>>> Paul Robinson suggested in the bug that it might be worth bringing
>>>> to cfe-dev for wider discussion.]
>>>>
>>>> Firefox's copy of FFmpeg includes <intrin.h> on MSVC:
>>>>
>>>>
>>>> https://dxr.mozilla.org/mozilla-central/source/media/ffvpx/libavutil
>>>> /x86/intmath.h#26
>>>>
>>>> and MSVC's <intrin.h> (after including several other headers)
>>>> declares _tzcnt_u32. clang-cl declares _tzcnt_u32 in <bmiintrin.h>,
>>>> but <bmiintrin.h> is only included from <intrin.h> if an appropriate
>>>> target CPU is detected:
>>>>
>>>>
>>>> https://github.com/llvm-mirror/clang/blob/master/lib/Headers/immintr
>>>> in.h#L82
>>>>
>>>> even though _tzcnt_u32 (or rather, its underlying implementation
>>>> function, __tzcnt_u32) is explicitly declared to be available
>>>> everywhere:
>>>>
>>>>
>>>> https://github.com/llvm-mirror/clang/blob/master/lib/Headers/bmiintr
>>>> in.h#L284
>>>>
>>>> The net result is that Firefox's copy of FFmpeg doesn't compile with
>>>> clang-cl because of this issue. Upstream has the same code, so I
>>>> assume the same is true there:
>>>>
>>>> https://github.com/FFmpeg/FFmpeg/blob/master/libavutil/x86/intmath.h
>>>> #L26
>>>>
>>>> AFAICT, this behavior was changed by only including <bmiitrin.h> if
>>>> __BMI__ is defined in:
>>>>
>>>> http://reviews.llvm.org/D20291
>>>>
>>>> with the desirable goal of reducing compile time. But this change
>>>> also broke things like _tzcnt_u32 being available.
>>>>
>>>> What is the right thing to do here? Should <bmiintrin.h> be
>>>> unconditionally included, or should something else be done?
>>>
>>> I think these intrinsics (it's really just tzcnt though?) that are
>>> useful also for non-BMI targets should probably be moved out of the
>>> BMI-specific header.
>>>
>>> +thakis and rnk if they have any thoughts on this.
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> Intel Israel (74) Limited
>>>
>>> This e-mail and any attachments may contain confidential material for
>>> the sole use of the intended recipient(s). Any review or distribution
>>> by others is strictly prohibited. If you are not the intended
>>> recipient, please contact the sender and delete all copies.
> ---------------------------------------------------------------------
> Intel Israel (74) Limited
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
More information about the cfe-dev
mailing list