<div dir="ltr"><div>We've been here before:</div><div><a href="https://llvm.org/bugs/show_bug.cgi?id=25544">https://llvm.org/bugs/show_bug.cgi?id=25544</a><br></div><div><a href="http://llvm.org/viewvc/llvm-project?view=revision&revision=253358">http://llvm.org/viewvc/llvm-project?view=revision&revision=253358</a><br></div><div><a href="https://bugs.chromium.org/p/chromium/issues/detail?id=556735">https://bugs.chromium.org/p/chromium/issues/detail?id=556735</a><br></div><div><br></div><div>Nico's change to avoid including bmiintrin.h if _MSC_VER is set to reduce compile time brought the problem back.</div><div><br></div>Basically, Intel's immintrin.h interface is too big. It's windows.h all over again. It significantly slows down builds of simple projects that include basic STL headers like <string>. Nico was able to speed up the *overall* build time of Chromium by at least 10% (ask him for specifics) by adding all these '#if !defined(_MSC_VER)' checks to immintrin.h. However, for compatibility, I think we may need intrin.h to include most of that stuff by default.<div><br></div><div>I added a bunch of Intel folks to basically ask the question: Can we please go back to the old days of mmintrin.h, xmmintrin.h, then emmintrin.h?</div><div><br></div><div>If not, we will have to consider adding an evil configuration macro, like WIN32_LEAN_AND_MEAN for windows.h, that opts out of all this extra immintrin.h functionality to speed up compilation.</div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Sep 29, 2016 at 11:04 AM, Hans Wennborg <span dir="ltr"><<a href="mailto:hans@chromium.org" target="_blank">hans@chromium.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-m_5062786893483266701HOEnZb"><div class="gmail-m_5062786893483266701h5">On Thu, Sep 29, 2016 at 10:50 AM, Nathan Froyd via cfe-dev<br>
<<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>> wrote:<br>
> [While I filed this as <a href="https://llvm.org/bugs/show_bug.cgi?id=30506" rel="noreferrer" target="_blank">https://llvm.org/bugs/show_bug<wbr>.cgi?id=30506</a>,<br>
> Paul Robinson suggested in the bug that it might be worth bringing to<br>
> cfe-dev for wider discussion.]<br>
><br>
> Firefox's copy of FFmpeg includes <intrin.h> on MSVC:<br>
><br>
> <a href="https://dxr.mozilla.org/mozilla-central/source/media/ffvpx/libavutil/x86/intmath.h#26" rel="noreferrer" target="_blank">https://dxr.mozilla.org/mozill<wbr>a-central/source/media/ffvpx/<wbr>libavutil/x86/intmath.h#26</a><br>
><br>
> and MSVC's <intrin.h> (after including several other headers) declares<br>
> _tzcnt_u32.  clang-cl declares _tzcnt_u32 in <bmiintrin.h>, but<br>
> <bmiintrin.h> is only included from <intrin.h> if an appropriate<br>
> target CPU is detected:<br>
><br>
> <a href="https://github.com/llvm-mirror/clang/blob/master/lib/Headers/immintrin.h#L82" rel="noreferrer" target="_blank">https://github.com/llvm-mirror<wbr>/clang/blob/master/lib/Headers<wbr>/immintrin.h#L82</a><br>
><br>
> even though _tzcnt_u32 (or rather, its underlying implementation<br>
> function, __tzcnt_u32) is explicitly declared to be available<br>
> everywhere:<br>
><br>
> <a href="https://github.com/llvm-mirror/clang/blob/master/lib/Headers/bmiintrin.h#L284" rel="noreferrer" target="_blank">https://github.com/llvm-mirror<wbr>/clang/blob/master/lib/Headers<wbr>/bmiintrin.h#L284</a><br>
><br>
> The net result is that Firefox's copy of FFmpeg doesn't compile with<br>
> clang-cl because of this issue.  Upstream has the same code, so I<br>
> assume the same is true there:<br>
><br>
> <a href="https://github.com/FFmpeg/FFmpeg/blob/master/libavutil/x86/intmath.h#L26" rel="noreferrer" target="_blank">https://github.com/FFmpeg/FFmp<wbr>eg/blob/master/libavutil/x86/<wbr>intmath.h#L26</a><br>
><br>
> AFAICT, this behavior was changed by only including <bmiitrin.h> if<br>
> __BMI__ is defined in:<br>
><br>
> <a href="http://reviews.llvm.org/D20291" rel="noreferrer" target="_blank">http://reviews.llvm.org/D20291</a><br>
><br>
> with the desirable goal of reducing compile time.  But this change<br>
> also broke things like _tzcnt_u32 being available.<br>
><br>
> What is the right thing to do here?  Should <bmiintrin.h> be<br>
> unconditionally included, or should something else be done?<br>
<br>
</div></div>I think these intrinsics (it's really just tzcnt though?) that are<br>
useful also for non-BMI targets should probably be moved out of the<br>
BMI-specific  header.<br>
<br>
+thakis and rnk if they have any thoughts on this.<br>
</blockquote></div><br></div></div>