[cfe-dev] The intrinsics headers (especially avx512) are too big. What to do about it?

Reid Kleckner via cfe-dev cfe-dev at lists.llvm.org
Thu May 12 16:10:06 PDT 2016


I think our approach to the mmintrin headers doesn't scale. We're
creating the windows.h of Intel intrinsics in immintrin.h.

When they were first created, a large percentage of the intrinsics
were mapping from hyper-specific instruction names to generic vector
math operations like this:

static __inline__ __m128 __DEFAULT_FN_ATTRS
_mm_add_ps(__m128 __a, __m128 __b) { return __a + __b; }

This made a lot of sense at the time, because we could just write some
C and not worry about teaching clang and LLVM about every Intel
intrinsic under the sun.

From looking at the avx512 headers, it seems this is no longer the
case. Now we are mostly mapping from _mm_* intrinsics to
__builtin_ia32_* functions.

If this continues to be the case going forward, then I think we should
make the _mm_* intrinsics into compiler builtins like the
__builtin_ia32_* functions. That would also avoid the need for those
ugly forwarding macros for intrinsics whose arguments must be
compile-time constants.

The _mm_* builtins should only be available if the user includes
<immintrin.h>. We can replace the contents of that file with a pragma
that just says "enable all Intel intrinsics".

On Thu, May 12, 2016 at 9:16 AM, Nico Weber via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
> Hi,
>
> on Windows, C++ system headers like e.g. <string> end up pulling in
> intrin.h. clang's intrinsic headers are very large.
>
> If you take a cc file containing just `#include <string>` and run that
> through the preprocessor with `cl /P test.cc` and `clang-cl /P test.cc`, the
> test.I file generated by clang-cl is 1.7MB while the one created by cl.exe
> is 0.7MB. This is solely due to clang's intrin.h expanding to way more
> stuff.
>
> The biggest offenders are avx512vlintrin.h, avx512fintrin.h, and
> avx512vlbwintrin.h, which add up to 657kB already. Before r239883, we only
> included avx headers if __AVX512F__ etc. were defined. This is currently never
> the case in practice. Later (r243394 r243402 r243406 and more), the avx
> headers got much bigger.
>
> Parsing all this code takes time -- removing the avx512 includes from
> immintrin.h locally makes compiling a file containing just the <string>
> header 0.25s faster (!), and building all of v8 gets 6% faster, just from
> not including the avx512 headers.
>
> What can we do about this? Since avx512 is new, maybe those headers could be
> left out of immintrin.h? Or we could re-introduce
>
>   #if !__has_feature(modules) && defined(__AVX512BW__)
>
> include guards in immintrin.h. This would give us a speed win immediately
> without drawbacks as far as I can see, but in a few years when people start
> compiling with /arch:avx512 that'd go away again. (Then again, by then,
> modules are hopefully commonly available. cl.exe doesn't have an
> /arch:avx512 switch yet, so this is probably several years away from
> happening.)
>
> Comments? Is it feasible to require that people who want to use avx512
> include a new header instead of immintrin.h? Otherwise, does anyone have a
> better idea than reintroducing the #ifdefs, augmented with the module check?
>
> Thanks,
> Nico
>
