[cfe-dev] The intrinsics headers (especially avx512) are too big. What to do about it?

Demikhovsky, Elena via cfe-dev cfe-dev at lists.llvm.org
Tue Jun 14 06:42:57 PDT 2016


We are still trying to find a suitable solution.
Keeping only the declarations in the header files would save compile time.
In that case the implementations would be hidden inside clang.
Can somebody help me estimate the impact and complexity of this approach?

Thank you.

- Elena

From: thakis at google.com [mailto:thakis at google.com] On Behalf Of Nico Weber
Sent: Tuesday, June 14, 2016 15:50
To: Demikhovsky, Elena <elena.demikhovsky at intel.com>
Cc: Hal Finkel <hfinkel at anl.gov>; Reid Kleckner <rnk at google.com>; cfe-dev <cfe-dev at lists.llvm.org>; Badouh, Asaf <asaf.badouh at intel.com>; Zuckerman, Michael <michael.zuckerman at intel.com>; David Majnemer <majnemer at google.com>; Chandler Carruth (chandlerc at google.com) <chandlerc at google.com>
Subject: Re: [cfe-dev] The intrinsics headers (especially avx512) are too big. What to do about it?

On Tue, May 17, 2016 at 3:49 PM, Demikhovsky, Elena <elena.demikhovsky at intel.com> wrote:
   >Indeed. It is not clear to me, however, that this situation is desirable. We
   >had a general policy that our intrinsics headers should generate generic IR
   >whenever possible, and if we've strayed from that, we should discuss that
   >first.

Let's take a look at this intrinsic:

static __inline__ __m512i __DEFAULT_FN_ATTRS
_mm512_mask_add_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
{
  return (__m512i) __builtin_ia32_paddq512_mask ((__v8di) __A,
             (__v8di) __B,
             (__v8di) __W,
             (__mmask8) __U);
}

The IR that should be generated:
%C = add <8 x i64> %B, %A
%res = select <8 x i1> %mask, <8 x i64> %C, <8 x i64> %W
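
For comparison, the add-then-select pattern above can be written in the header without a target-specific builtin. This is only a minimal sketch, assuming the GCC/Clang `vector_size` extension; the names `v8di_sketch` and `sketch_mask_add_epi64` are hypothetical and are not taken from clang's headers. Whether the optimizer turns the per-lane loop into a single vector select depends on the compiler and target.

```c
#include <stdint.h>

/* Hypothetical 8 x i64 vector type, built on the GCC/Clang vector_size
   extension (8 lanes * 8 bytes = 64 bytes). Not the type used by clang. */
typedef long long v8di_sketch __attribute__((vector_size(64)));

/* Hypothetical stand-in for _mm512_mask_add_epi64: lane i of the result is
   a[i] + b[i] when bit i of u is set, otherwise it is passed through from w. */
static inline v8di_sketch
sketch_mask_add_epi64(v8di_sketch w, uint8_t u, v8di_sketch a, v8di_sketch b)
{
  v8di_sketch sum = a + b;  /* generic element-wise vector add */
  v8di_sketch res = w;      /* start from the pass-through operand */
  for (int i = 0; i < 8; ++i) {
    if ((u >> i) & 1)
      res[i] = sum[i];      /* take the sum only in selected lanes */
  }
  return res;
}
```

With a mask of 0x05, for example, only lanes 0 and 2 take the sum and every other lane keeps its value from `w`.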

If we parse __builtin_ia32_paddq512_mask in CGBuiltin.cpp and generate IR there, will it help?

(Please do not consider my question as a general Intel solution. I just want to understand the problem.)

The bit I care most about is that adding `#include <intrin.h>` shouldn't add megabytes of stuff to my translation unit.

Have you discussed making immintrin.h more modular? It looks like many more avx512 builtins keep landing, making this problem bigger and bigger. It'd be good if I only had to pay for this if I explicitly included an avx512.h, and even then it'd be nice if that wasn't one huge header, but several smaller ones, so I only have to pay compile time for the bits I need.
