<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Tue, May 17, 2016 at 3:49 PM, Demikhovsky, Elena <span dir="ltr"><<a href="mailto:elena.demikhovsky@intel.com" target="_blank">elena.demikhovsky@intel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">   >Indeed. It is not clear to me, however, that this situation is desirable. We<br>

   >had a general policy that our intrinsics headers should generate generic IR<br>

   >whenever possible, and if we've strayed from that, we should discuss that<br>

   >first.<br>

<br>

</span>Let's take a look at this intrinsic:<br>

<br>

static __inline__ __m512i __DEFAULT_FN_ATTRS<br>

_mm512_mask_add_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)<br>

{<br>

  return (__m512i) __builtin_ia32_paddq512_mask ((__v8di) __A,<br>

             (__v8di) __B,<br>

             (__v8di) __W,<br>

             (__mmask8) __U);<br>

}<br>

<br>

The IR that should be generated:<br>

%C = add <8 x i64> %B, %A<br>

%res = select <8 x i1> %mask, <8 x i64> %C, <8 x i64> %W<br>
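
For illustration, the add-then-select pattern above can be sketched in plain C with the GCC/Clang vector extension instead of a target-specific builtin. This is only a sketch: the names v8di_sketch and mask_add_epi64_sketch are hypothetical, not taken from any header, and the lanes are assumed to be eight 64-bit integers, as the epi64 suffix of the intrinsic suggests.<br>

```c
#include <stdint.h>

/* Hypothetical vector type: eight 64-bit integer lanes (64 bytes total),
   declared with the GCC/Clang vector_size extension. */
typedef int64_t v8di_sketch __attribute__((vector_size(64)));

/* Hypothetical masked add: lane i of the result is a[i] + b[i] when bit i
   of mask is set, and w[i] (the pass-through operand) otherwise. */
static inline v8di_sketch mask_add_epi64_sketch(v8di_sketch w, uint8_t mask,
                                                v8di_sketch a, v8di_sketch b)
{
    v8di_sketch sum = a + b; /* generic element-wise vector add */
    v8di_sketch res;
    for (int i = 0; i < 8; ++i) {
        /* per-lane select driven by the mask bit */
        res[i] = ((mask >> i) & 1) ? sum[i] : w[i];
    }
    return res;
}
```

Because nothing here names a target builtin, the compiler sees an ordinary vector add followed by a per-lane select and is free to pick a lowering; whether that lowering is a single masked instruction is up to the backend.<br>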

<br>

If we parse __builtin_ia32_paddq512_mask in CGBuiltin.cpp and generate IR there, will it help?<br>

<br>

(Please do not consider my question as a general Intel solution. I just want to understand the problem.)<br></blockquote><div><br>The bit I care most about is that adding `#include <intrin.h>` shouldn't add megabytes of stuff to my translation unit.</div><div><br></div><div>Have you discussed making immintrin.h more modular? It looks like many more avx512 builtins keep landing, making this problem bigger and bigger. It'd be good if I only had to pay for this if I explicitly included an avx512.h, and even then it'd be nice if that wasn't one huge header, but several smaller ones, so I only have to pay compile time for the bits I need.</div></div><br></div></div>