[cfe-dev] The intrinsics headers (especially avx512) are too big. What to do about it?

Tue Jun 14 05:50:16 PDT 2016

On Tue, May 17, 2016 at 3:49 PM, Demikhovsky, Elena <
elena.demikhovsky at intel.com> wrote:

>    >Indeed. It is not clear to me, however, that this situation is
>    >desirable. We had a general policy that our intrinsics headers should
>    >generate generic IR whenever possible, and if we've strayed from that,
>    >we should discuss that first.
>
> Let's take a look at this intrinsic:
>
> static __inline__ __m512i __DEFAULT_FN_ATTRS
> _mm512_mask_add_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
> {
>   return (__m512i) __builtin_ia32_paddq512_mask ((__v8di) __A,
>              (__v8di) __B,
>              (__v8di) __W,
>              (__mmask8) __U);
> }
>
> The IR that should be generated:
> %C = add <8 x i64> %B, %A
> %res = select <8 x i1> %mask, <8 x i64> %C, <8 x i64> %W
>
> If we parse __builtin_ia32_paddq512_mask in CGBuiltin.cpp and generate IR
> there, will it help?
>
> (Please do not consider my question as a general Intel solution. I just
> want to understand the problem.)
>
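
For reference, here is a rough sketch of what the "generic" form of that
masked add could look like at the source level, assuming the GCC/Clang
vector extensions. The names v8di_sketch and mask_add_epi64_sketch are
made up for illustration; this is not the code in the actual header, and I
haven't checked what IR the optimizer ends up producing for it.

```c
#include <stdint.h>

/* Hypothetical 512-bit vector of eight 64-bit integers, declared with the
   GCC/Clang vector_size attribute (an assumption, not the header's type). */
typedef int64_t v8di_sketch __attribute__((vector_size(64)));

/* Lane-wise a + b where the corresponding bit of u is set, otherwise the
   lane is taken from w. Written with plain operators, no target builtin. */
static inline v8di_sketch mask_add_epi64_sketch(v8di_sketch w, uint8_t u,
                                                v8di_sketch a, v8di_sketch b)
{
  v8di_sketch sum = a + b;   /* element-wise add over all eight lanes */
  v8di_sketch out = w;       /* start from the pass-through operand */
  for (int i = 0; i < 8; ++i) {
    if ((u >> i) & 1)        /* bit i of the mask selects lane i */
      out[i] = sum[i];
  }
  return out;
}
```

The point is only that the operation can be spelled as an add plus a
per-lane select, which is the shape of the IR quoted above.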

The bit I care most about is that adding `#include <intrin.h>` shouldn't
add megabytes of stuff to my translation unit.

Have you discussed making immintrin.h more modular? It looks like many more
avx512 builtins keep landing, making this problem bigger and bigger. It'd
be good if I only had to pay for this if I explicitly included an avx512.h,
and even then it'd be nice if that wasn't one huge header, but several
smaller ones, so I only have to pay compile time for the bits I need.
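
To make the idea concrete, here is a minimal sketch of an umbrella header
that only pulls in a feature's declarations when that feature is enabled
for the translation unit. The SKETCH_* macros and the sketch_*.h file
names are hypothetical; __AVX2__ and __AVX512F__ are the usual predefined
feature macros. The includes are left as comments so the snippet compiles
on its own.

```c
/* sketch_intrin.h: hypothetical umbrella header, for illustration only. */

#if defined(__AVX2__)
/* #include "sketch_avx2.h"      hypothetical sub-header */
#define SKETCH_PULLS_AVX2 1
#else
#define SKETCH_PULLS_AVX2 0
#endif

#if defined(__AVX512F__)
/* #include "sketch_avx512f.h"   hypothetical sub-header, parsed only here */
#define SKETCH_PULLS_AVX512F 1
#else
#define SKETCH_PULLS_AVX512F 0
#endif

/* Number of optional sub-headers this translation unit would have parsed. */
static inline int sketch_subheaders_pulled(void)
{
  return SKETCH_PULLS_AVX2 + SKETCH_PULLS_AVX512F;
}
```

A translation unit built without any AVX-512 flags would then skip the
avx512 declarations entirely instead of parsing them and throwing them away.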