[cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility with MSVC's <intrin.h>
Cohen, Elad2 via cfe-dev
cfe-dev at lists.llvm.org
Fri Sep 30 06:53:32 PDT 2016
As mentioned in past discussions, the introduction of AVX512 intrinsics had a severe impact on the compile time.
The extreme case happens on windows where the mere inclusion of system headers like <string> causes a hit in the compile time of applications that don’t even make use of any intrinsics.
During these discussions Chandler Carruth indicated that usage of clang modules is probably the best alternative to mitigate this problem:
Based on this feedback, Intel has started working on this proposal. Initial Internal measurements indicated that this direction is promising.
For example, the compile time of a single translation unit that includes all of the intel intrinsic header files would reduce by 90% starting from the 2nd compilation (by using cached pre-compiled modules). In general, we saw that compilation of projects that have more than one translation unit that includes x86intrin.h/immintrin.h will get big compile time reductions starting from the 1st compilation.
The biggest advantage of this approach is that it scales and would help also applications that do rely on intrinsics and hence will not be able to avoid the compile time overhead.
I (Intel) would like to add a Clang option that enables the modules feature exclusively for the Intel intrinsics header files, and set this option to be turned on by default. This allows clang to cache a pre-compiled (actually pre-parsed) module of all the headers included by x86intrin.h/immintrin.h, and any following compilation of #include<x86intrin.h>/<immintrin.h> directives will skip the parsing phase and save precious compile time. Since ‘clang modules’ and the regular ‘pre-processor include mechanism’ are not 100% compatible, I started by cleaning some small Intel intrinsics module bugs exposed by this work:
https://reviews.llvm.org/D23871, https://reviews.llvm.org/D24825, https://reviews.llvm.org/D24752 (This one is still in review).
I plan to upload the actual patch that adds the option I mentioned above somewhere next week hopefully. Once this patch is uploaded, you could experiment with it and see if it actually mitigates the problem on your end and provide us with valuable feedback on this direction.
From: Demikhovsky, Elena
Sent: Thursday, September 29, 2016 22:20
To: Reid Kleckner <rnk at google.com>; Hans Wennborg <hans at chromium.org>; Blank, Guy <guy.blank at intel.com>; Aboud, Amjad <amjad.aboud at intel.com>; Craig Topper <craig.topper at gmail.com>; Badouh, Asaf <asaf.badouh at intel.com>; Zuckerman, Michael <michael.zuckerman at intel.com>; Breger, Igor <igor.breger at intel.com>
Cc: Nathan Froyd <nfroyd at mozilla.com>; Nico Weber <thakis at chromium.org>; cfe-dev <cfe-dev at lists.llvm.org>; Ouriel, Boaz <boaz.ouriel at intel.com>; Cohen, Elad2 <elad2.cohen at intel.com>
Subject: RE: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility with MSVC's <intrin.h>
Forwarding your mail to Elad who is working now on intrinsics. Please take into account a possible delay in Elad’s response, we are in local holidays this week.
From: Reid Kleckner [mailto:rnk at google.com]
Sent: Thursday, September 29, 2016 22:02
To: Hans Wennborg <hans at chromium.org<mailto:hans at chromium.org>>; Blank, Guy <guy.blank at intel.com<mailto:guy.blank at intel.com>>; Aboud, Amjad <amjad.aboud at intel.com<mailto:amjad.aboud at intel.com>>; Demikhovsky, Elena <elena.demikhovsky at intel.com<mailto:elena.demikhovsky at intel.com>>; Craig Topper <craig.topper at gmail.com<mailto:craig.topper at gmail.com>>; Badouh, Asaf <asaf.badouh at intel.com<mailto:asaf.badouh at intel.com>>; Zuckerman, Michael <michael.zuckerman at intel.com<mailto:michael.zuckerman at intel.com>>; Breger, Igor <igor.breger at intel.com<mailto:igor.breger at intel.com>>
Cc: Nathan Froyd <nfroyd at mozilla.com<mailto:nfroyd at mozilla.com>>; Nico Weber <thakis at chromium.org<mailto:thakis at chromium.org>>; cfe-dev <cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>>
Subject: Re: [cfe-dev] clang-cl's <intrin.h>, _tzcnt_u32, and compatibility with MSVC's <intrin.h>
We've been here before:
Nico's change to avoid including bmiintrin.h if _MSC_VER is set to reduce compile time brought the problem back.
Basically, Intel's immintrin.h interface is too big. It's windows.h all over again. It significantly slows down builds of simple projects that include basic STL headers like <string>. Nico was able to speed up the *overall* build time of Chromium by at least 10% (ask him for specifics) by adding all these '#if !defined(_MSC_VER)' checks to immintrin.h. However, for compatibility, I think we may need intrin.h to include most of that stuff by default.
I added a bunch of Intel folks to basically ask the question: Can we please go back to the old days of mmintrin.h, xmmintrin.h, then emmintrin.h?
If not, we will have to consider adding an evil configuration macro, like WIN32_LEAN_AND_MEAN for windows.h, that opts out of all this extra immintrin.h functionality to speed up compilation.
On Thu, Sep 29, 2016 at 11:04 AM, Hans Wennborg <hans at chromium.org<mailto:hans at chromium.org>> wrote:
On Thu, Sep 29, 2016 at 10:50 AM, Nathan Froyd via cfe-dev
<cfe-dev at lists.llvm.org<mailto:cfe-dev at lists.llvm.org>> wrote:
> [While I filed this as https://llvm.org/bugs/show_bug.cgi?id=30506,
> Paul Robinson suggested in the bug that it might be worth bringing to
> cfe-dev for wider discussion.]
> Firefox's copy of FFmpeg includes <intrin.h> on MSVC:
> and MSVC's <intrin.h> (after including several other headers) declares
> _tzcnt_u32. clang-cl declares _tzcnt_u32 in <bmiintrin.h>, but
> <bmiintrin.h> is only included from <intrin.h> if an appropriate
> target CPU is detected:
> even though _tzcnt_u32 (or rather, its underlying implementation
> function, __tzcnt_u32) is explicitly declared to be available
> The net result is that Firefox's copy of FFmpeg doesn't compile with
> clang-cl because of this issue. Upstream has the same code, so I
> assume the same is true there:
> AFAICT, this behavior was changed by only including <bmiitrin.h> if
> __BMI__ is defined in:
> with the desirable goal of reducing compile time. But this change
> also broke things like _tzcnt_u32 being available.
> What is the right thing to do here? Should <bmiintrin.h> be
> unconditionally included, or should something else be done?
I think these intrinsics (it's really just tzcnt though?) that are
useful also for non-BMI targets should probably be moved out of the
+thakis and rnk if they have any thoughts on this.
Intel Israel (74) Limited
This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the cfe-dev