[cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility with MSVC's <intrin.h>

Hans Wennborg via cfe-dev cfe-dev at lists.llvm.org
Fri Oct 28 15:55:45 PDT 2016


I haven't been following this thread very closely since the topic
shifted towards using modules for the intrinsics headers.

Is the underlying problem we're trying to solve still how to expose
_tzcnt_u32? Isn't it enough to just make it available in intrin.h
instead of (or addition to) bmiintrin.h? It's special in that it
really doesn't require a BMI-enabled target to be useful, and is
probably one of very few instructions with this property?

 - Hans

On Fri, Oct 28, 2016 at 3:46 PM, Nathan Froyd <nfroyd at mozilla.com> wrote:
> I've been following the review below and its successor, D25767, but
> I'm unable to make the proposed solution (or its equivalent) work.  If
> I specify:
>
> -Xclang -fimplicit-module-maps -Xclang -fmodule-map-file=[builtin .modulemap]
>
> I still get an error about an unresolved __tzcnt_u32 symbol.  If I
> remove the "-Xclang -fimplicit-module-maps" from the above, I get the
> same error.  If I try instead:
>
> -Xclang -fmodules -Xclang -fmodule-map-file=[builtin .modulemap]
>
> I get peculiar errors about x86intrin.h and the _Builtin_intrinsics
> module not being available and implicit use of module files being
> disabled.  (Same goes for _Builtin_stddef_max_align_t and
> __stddef_max_align_t.h.)  Adding "-Xclang -fimplicit-module-maps"
> gives me the same error.
>
> I'm not sure what's going wrong, or if there's been some sort of
> mistake in generating the module map.  Help?
>
> Thanks,
> -Nathan
>
> On Thu, Oct 6, 2016 at 2:45 PM, Cohen, Elad2 <elad2.cohen at intel.com> wrote:
>> I uploaded to review the patch that adds the option for enabling the modules
>> feature exclusively for the intrinsic header files:
>> https://reviews.llvm.org/D25337
>> You can try it out by adding the –fexclusive-builtin-modules command line
>> option. (all the different x86 *intrin.h files should be included by default
>> since the #if guards already contain a __has_feature(modules) condition).
>>
>>
>>
>> Thanks, Elad
>>
>>
>>
>> From: Cohen, Elad2
>> Sent: Friday, September 30, 2016 16:54
>> To: Demikhovsky, Elena <elena.demikhovsky at intel.com>; Reid Kleckner
>> <rnk at google.com>; Hans Wennborg <hans at chromium.org>; Blank, Guy
>> <guy.blank at intel.com>; Aboud, Amjad <amjad.aboud at intel.com>; Craig Topper
>> <craig.topper at gmail.com>; Badouh, Asaf <asaf.badouh at intel.com>; Zuckerman,
>> Michael <michael.zuckerman at intel.com>; Breger, Igor <igor.breger at intel.com>
>> Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber <thakis at chromium.org>;
>> cfe-dev <cfe-dev at lists.llvm.org>; Ouriel, Boaz <boaz.ouriel at intel.com>;
>> Rackover, Zvi <zvi.rackover at intel.com>
>>
>>
>> Subject: RE: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility
>> with MSVC's <intrin.h>
>>
>>
>>
>> Hi All,
>>
>>
>>
>> As mentioned in past discussions, the introduction of AVX512 intrinsics had
>> a severe impact on the compile time.
>>
>> The extreme case happens on windows where the mere inclusion of system
>> headers like <string> causes a hit in the compile time of applications that
>> don’t even make use of any intrinsics.
>>
>> http://lists.llvm.org/pipermail/cfe-dev/2016-May/048837.html
>> http://lists.llvm.org/pipermail/cfe-dev/2016-June/049460.html
>>
>> During these discussions Chandler Carruth indicated that usage of clang
>> modules is probably the best alternative to mitigate this problem:
>>
>> http://lists.llvm.org/pipermail/cfe-dev/2016-June/049513.html
>>
>>
>>
>> Based on this feedback, Intel has started working on this proposal. Initial
>> Internal measurements indicated that this direction is promising.
>>
>> For example, the compile time of a single translation unit that includes all
>> of the intel intrinsic header files would reduce by 90% starting from the
>> 2nd compilation (by using cached pre-compiled modules). In general, we saw
>> that compilation of projects that have more than one translation unit that
>> includes x86intrin.h/immintrin.h will get big compile time reductions
>> starting from the 1st compilation.
>>
>>
>>
>> The biggest advantage of this approach is that it scales and would help also
>> applications that do rely on intrinsics and hence will not be able to avoid
>> the compile time overhead.
>>
>>
>>
>> I (Intel) would like to add a Clang option that enables the modules feature
>> exclusively for the Intel intrinsics header files, and set this option to be
>> turned on by default. This allows clang to cache a pre-compiled (actually
>> pre-parsed) module of all the headers included by x86intrin.h/immintrin.h,
>> and any following compilation of #include<x86intrin.h>/<immintrin.h>
>> directives will skip the parsing phase and save precious compile time. Since
>> ‘clang modules’ and the regular ‘pre-processor include mechanism’ are not
>> 100% compatible, I started by cleaning some small Intel intrinsics module
>> bugs exposed by this work:
>>
>> https://reviews.llvm.org/D23871, https://reviews.llvm.org/D24825,
>> https://reviews.llvm.org/D24752 (This one is still in review).
>>
>> I plan to upload the actual patch that adds the option I mentioned above
>> somewhere next week hopefully. Once this patch is uploaded, you could
>> experiment with it and see if it actually mitigates the problem on your end
>> and provide us with valuable feedback on this direction.
>>
>>
>>
>> Thanks,
>>
>> Elad
>>
>>
>>
>>
>>
>> From: Demikhovsky, Elena
>> Sent: Thursday, September 29, 2016 22:20
>> To: Reid Kleckner <rnk at google.com>; Hans Wennborg <hans at chromium.org>;
>> Blank, Guy <guy.blank at intel.com>; Aboud, Amjad <amjad.aboud at intel.com>;
>> Craig Topper <craig.topper at gmail.com>; Badouh, Asaf <asaf.badouh at intel.com>;
>> Zuckerman, Michael <michael.zuckerman at intel.com>; Breger, Igor
>> <igor.breger at intel.com>
>> Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber <thakis at chromium.org>;
>> cfe-dev <cfe-dev at lists.llvm.org>; Ouriel, Boaz <boaz.ouriel at intel.com>;
>> Cohen, Elad2 <elad2.cohen at intel.com>
>> Subject: RE: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility
>> with MSVC's <intrin.h>
>>
>>
>>
>> Forwarding your mail to Elad who is working now on intrinsics. Please take
>> into account a possible delay in Elad’s response, we are in local holidays
>> this week.
>>
>>
>>
>> -           Elena
>>
>>
>>
>> From: Reid Kleckner [mailto:rnk at google.com]
>> Sent: Thursday, September 29, 2016 22:02
>> To: Hans Wennborg <hans at chromium.org>; Blank, Guy <guy.blank at intel.com>;
>> Aboud, Amjad <amjad.aboud at intel.com>; Demikhovsky, Elena
>> <elena.demikhovsky at intel.com>; Craig Topper <craig.topper at gmail.com>;
>> Badouh, Asaf <asaf.badouh at intel.com>; Zuckerman, Michael
>> <michael.zuckerman at intel.com>; Breger, Igor <igor.breger at intel.com>
>> Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber <thakis at chromium.org>;
>> cfe-dev <cfe-dev at lists.llvm.org>
>> Subject: Re: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility
>> with MSVC's <intrin.h>
>>
>>
>>
>> We've been here before:
>>
>> https://llvm.org/bugs/show_bug.cgi?id=25544
>>
>> http://llvm.org/viewvc/llvm-project?view=revision&revision=253358
>>
>> https://bugs.chromium.org/p/chromium/issues/detail?id=556735
>>
>>
>>
>> Nico's change to avoid including bmiintrin.h if _MSC_VER is set to reduce
>> compile time brought the problem back.
>>
>>
>>
>> Basically, Intel's immintrin.h interface is too big. It's windows.h all over
>> again. It significantly slows down builds of simple projects that include
>> basic STL headers like <string>. Nico was able to speed up the *overall*
>> build time of Chromium by at least 10% (ask him for specifics) by adding all
>> these '#if !defined(_MSC_VER)' checks to immintrin.h. However, for
>> compatibility, I think we may need intrin.h to include most of that stuff by
>> default.
>>
>>
>>
>> I added a bunch of Intel folks to basically ask the question: Can we please
>> go back to the old days of mmintrin.h, xmmintrin.h, then emmintrin.h?
>>
>>
>>
>> If not, we will have to consider adding an evil configuration macro, like
>> WIN32_LEAN_AND_MEAN for windows.h, that opts out of all this extra
>> immintrin.h functionality to speed up compilation.
>>
>>
>>
>> On Thu, Sep 29, 2016 at 11:04 AM, Hans Wennborg <hans at chromium.org> wrote:
>>
>> On Thu, Sep 29, 2016 at 10:50 AM, Nathan Froyd via cfe-dev
>> <cfe-dev at lists.llvm.org> wrote:
>>> [While I filed this as https://llvm.org/bugs/show_bug.cgi?id=30506,
>>> Paul Robinson suggested in the bug that it might be worth bringing to
>>> cfe-dev for wider discussion.]
>>>
>>> Firefox's copy of FFmpeg includes <intrin.h> on MSVC:
>>>
>>>
>>> https://dxr.mozilla.org/mozilla-central/source/media/ffvpx/libavutil/x86/intmath.h#26
>>>
>>> and MSVC's <intrin.h> (after including several other headers) declares
>>> _tzcnt_u32.  clang-cl declares _tzcnt_u32 in <bmiintrin.h>, but
>>> <bmiintrin.h> is only included from <intrin.h> if an appropriate
>>> target CPU is detected:
>>>
>>>
>>> https://github.com/llvm-mirror/clang/blob/master/lib/Headers/immintrin.h#L82
>>>
>>> even though _tzcnt_u32 (or rather, its underlying implementation
>>> function, __tzcnt_u32) is explicitly declared to be available
>>> everywhere:
>>>
>>>
>>> https://github.com/llvm-mirror/clang/blob/master/lib/Headers/bmiintrin.h#L284
>>>
>>> The net result is that Firefox's copy of FFmpeg doesn't compile with
>>> clang-cl because of this issue.  Upstream has the same code, so I
>>> assume the same is true there:
>>>
>>> https://github.com/FFmpeg/FFmpeg/blob/master/libavutil/x86/intmath.h#L26
>>>
>>> AFAICT, this behavior was changed by only including <bmiitrin.h> if
>>> __BMI__ is defined in:
>>>
>>> http://reviews.llvm.org/D20291
>>>
>>> with the desirable goal of reducing compile time.  But this change
>>> also broke things like _tzcnt_u32 being available.
>>>
>>> What is the right thing to do here?  Should <bmiintrin.h> be
>>> unconditionally included, or should something else be done?
>>
>> I think these intrinsics (it's really just tzcnt though?) that are
>> useful also for non-BMI targets should probably be moved out of the
>> BMI-specific  header.
>>
>> +thakis and rnk if they have any thoughts on this.
>>
>>
>>
>> ---------------------------------------------------------------------
>> Intel Israel (74) Limited
>>
>> This e-mail and any attachments may contain confidential material for
>> the sole use of the intended recipient(s). Any review or distribution
>> by others is strictly prohibited. If you are not the intended
>> recipient, please contact the sender and delete all copies.



More information about the cfe-dev mailing list