[cfe-dev] The intrinsics headers (especially avx512) are too big. What to do about it?

Thu May 12 16:13:51 PDT 2016

----- Original Message -----
> From: "Reid Kleckner via cfe-dev" <cfe-dev at lists.llvm.org>
> To: "Nico Weber" <thakis at chromium.org>, "David Majnemer" <majnemer at google.com>
> Cc: "Elena Demikhovsky" <elena.demikhovsky at intel.com>, "cfe-dev" <cfe-dev at lists.llvm.org>, "asaf badouh"
> <asaf.badouh at intel.com>, "Michael zuckerman" <Michael.zuckerman at intel.com>
> Sent: Thursday, May 12, 2016 6:10:06 PM
> Subject: Re: [cfe-dev] The intrinsics headers (especially avx512) are too big. What to do about it?
> 
> I think our approach to the mmintrin headers doesn't scale. We're
> creating the windows.h of intel intrinsics in immintrin.h.
> 
> When they were first created, a large percentage of the intrinsics
> were mapping from hyper-specific instruction names to generic vector
> math operations like this:
> 
> static __inline__ __m128 __DEFAULT_FN_ATTRS
> _mm_add_ps(__m128 __a, __m128 __b) { return __a + __b; }
> 
> This made a lot of sense at the time, because we could just write some
> C and not worry about teaching clang and LLVM about every Intel
> intrinsic under the sun.
> 
> From looking at the avx512 headers, it seems this is no longer the
> case. Now we are mostly mapping from an _mm_* intrinsic to a
> __builtin_ia32_* function.
> 
> If this continues to be the case going forward,

Indeed. It is not clear to me, however, that this situation is desirable. We had a general policy that our intrinsics headers should generate generic IR whenever possible, and if we've strayed from that, we should discuss that first.
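
To illustrate what "generic IR" means here, a minimal sketch, assuming the GCC/Clang vector extensions are available. The names v4f, add_ps_generic and self_check are made up for illustration and are not what the headers actually use. The point is only that an operator on a vector type gives the optimizer a plain vector add (in Clang, an fadd on <4 x float>) rather than an opaque target-specific builtin call:

```c
#include <assert.h>

/* Hypothetical stand-in for __m128: four floats in one 16-byte vector.
   Relies on the GCC/Clang vector_size extension (an assumption). */
typedef float v4f __attribute__((vector_size(16)));

/* Generic form: an ordinary C operator applied to vector types.
   The compiler sees a target-independent vector add that it can fold,
   reassociate, or combine like any other arithmetic. */
static inline v4f add_ps_generic(v4f a, v4f b) { return a + b; }

/* Small self-check: each lane is added independently. */
static void self_check(void) {
    v4f a = {1.0f, 2.0f, 3.0f, 4.0f};
    v4f b = {10.0f, 20.0f, 30.0f, 40.0f};
    v4f c = add_ps_generic(a, b);
    assert(c[0] == 11.0f);
    assert(c[1] == 22.0f);
    assert(c[2] == 33.0f);
    assert(c[3] == 44.0f);
}
```

A header written the other way would forward to a __builtin_ia32_* function instead, which the optimizer generally has to treat as a black box unless it is taught about that specific builtin.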

 -Hal

> then I think we should make the _mm_* intrinsics into compiler builtins
> like the __builtin_ia32_* functions. That would also avoid the need for
> those ugly forwarding macros for intrinsics that take arguments that must
> be constant.
> 
> The _mm_* builtins should only be available if the user includes
> <immintrin.h>. We can replace the contents of that file with a pragma
> that just says "enable all intel intrinsics".
> 
> On Thu, May 12, 2016 at 9:16 AM, Nico Weber via cfe-dev
> <cfe-dev at lists.llvm.org> wrote:
> > Hi,
> >
> > on Windows, C++ system headers like e.g. <string> end up pulling in
> > intrin.h. clang's intrinsic headers are very large.
> >
> > If you take a cc file containing just `#include <string>` and run that
> > through the preprocessor with `cl /P test.cc` and `clang-cl /P test.cc`,
> > the test.I file generated by clang-cl is 1.7MB while the one created by
> > cl.exe is 0.7MB. This is solely due to clang's intrin.h expanding to way
> > more stuff.
> >
> > The biggest offenders are avx512vlintrin.h, avx512fintrin.h, and
> > avx512vlbwintrin.h, which add up to 657kB already. Before r239883, we
> > only included avx headers if __AVX512F__ etc. was defined. This is
> > currently never the case in practice. Later (r243394, r243402, r243406
> > and more), the avx headers got much bigger.
> >
> > Parsing all this code takes time -- removing the avx512 includes from
> > immintrin.h locally makes compiling a file containing just the <string>
> > header 0.25s faster (!), and building all of v8 gets 6% faster, just
> > from not including the avx512 headers.
> >
> > What can we do about this? Since avx512 is new, maybe its headers could
> > be left out of immintrin.h? Or we could re-introduce
> >
> >   #if !__has_feature(modules) && defined(__AVX512BW__)
> >
> > include guards in immintrin.h. This would give us a speed win immediately
> > without drawbacks as far as I can see, but in a few years when people
> > start compiling with /arch:avx512 that'd go away again. (Then again, by
> > then, modules are hopefully commonly available. cl.exe doesn't have an
> > /arch:avx512 switch yet, so this is probably several years away from
> > happening.)
> >
> > Comments? Is it feasible to require that people who want to use avx512
> > include a new header instead of immintrin.h? Else, does anyone have a
> > better idea other than reintroducing the #ifdefs, augmented with the
> > module check?
> >
> > Thanks,
> > Nico
> >
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory