[cfe-dev] The intrinsics headers (especially avx512) are too big. What to do about it?

Eric Christopher via cfe-dev cfe-dev at lists.llvm.org
Mon May 16 11:30:48 PDT 2016


On Mon, May 16, 2016 at 10:02 AM Nico Weber <thakis at chromium.org> wrote:

> On Mon, May 16, 2016 at 12:48 PM, Eric Christopher <echristo at gmail.com>
> wrote:
>
>>
>>
>> On Mon, May 16, 2016 at 9:43 AM Nico Weber via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> > Sorry if this is a stupid question, but do the windows intrinsic
>>> > headers actually contain the same contents as clang's?
>>>
>>> As far as I can tell (from looking at
>>> https://msdn.microsoft.com/en-us/library/hh977023.aspx and comparing to
>>> clang's headers), yes. MSVC doesn't have the avx512 intrinsics yet, but
>>> that's probably only because they're new.
>>>
>>> I had hoped that I could not include all of x86intrin.h in intrin.h,
>>> but that page says "The intrin.h header includes both immintrin.h and
>>> ammintrin.h for simplicity."
>>>
>>> > This old discussion may cover some of this as well?
>>>
>>> Ah thanks, yes, sounds like there are reasons for not putting the
>>> includes back behind ifdefs. The thread doesn't really mention the reasons,
>>> and since clang doesn't implement full multiversioning yet I'm unable to
>>> guess at the reasons -- but it sounds like people don't want to re-add the
>>> arch ifdefs. Ok, I'll send a patch to add them back under `#ifdef _MSC_VER` only --
>>> there should be no drawback to that, and it stops the bleeding in the case
>>> where it's worst (with Microsoft headers).
>>>
>>>
>> It implements enough multiversioning to make it worthwhile to have them.
>> I don't know what you're confused about here.
>>
>
> I didn't mean to question this point, I just don't understand it. Can you
> give an example where it's useful? I'm sure there is one, I just can't
> think of one.
>

Sure, the programmer has to write their own dispatch, but it lets you
include versions of the same function optimized for different targets in
the same file.

The equivalent on the Linux side of things is:

void my_avx_function(void) __attribute__((__target__("avx")));
void my_nonavx_function(void);

...

if (__builtin_cpu_supports("avx"))
   my_avx_function();
else
   my_nonavx_function();

and you can keep both implementations in the same file without having to
worry about things like differing command line options sending everything
haywire.
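
For concreteness, a fuller sketch of that pattern might look like the
following. The function names (sum_avx, sum_generic, sum_floats) are made up
for illustration, and it assumes GCC or clang; the AVX path is only compiled
on x86, and every other target always takes the generic version.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical AVX version: the target attribute enables AVX for this one
   function only, independent of the -m flags on the command line. */
__attribute__((__target__("avx")))
static float sum_avx(const float *v, size_t n) {
  __m256 acc = _mm256_setzero_ps();
  size_t i = 0;
  for (; i + 8 <= n; i += 8)            /* eight floats per iteration */
    acc = _mm256_add_ps(acc, _mm256_loadu_ps(v + i));
  float lanes[8];
  _mm256_storeu_ps(lanes, acc);
  float total = 0.0f;
  for (int k = 0; k < 8; ++k)           /* horizontal sum of the lanes */
    total += lanes[k];
  for (; i < n; ++i)                    /* leftover elements */
    total += v[i];
  return total;
}
#endif

/* Hypothetical baseline version, built with the default target flags. */
static float sum_generic(const float *v, size_t n) {
  float total = 0.0f;
  for (size_t i = 0; i < n; ++i)
    total += v[i];
  return total;
}

/* Hand-written dispatch: check the CPU at run time and pick a version. */
float sum_floats(const float *v, size_t n) {
  assert(v != NULL || n == 0);
#if defined(__x86_64__) || defined(__i386__)
  if (__builtin_cpu_supports("avx"))
    return sum_avx(v, n);
#endif
  return sum_generic(v, n);
}
```

Callers only ever see sum_floats(); the whole file is built with one set of
flags, and only sum_avx() is allowed to use AVX instructions.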


>
>
>> Why do we want to make this platform specific? If you wanted to match
>> MSVC I guess you could just turn them off for windows as an alternate
>> solution?
>>
>
> Yes, that's what I meant with the _MSC_VER check.
>

I meant just turn off the avx512 headers, not sure if that's what you
meant. We should probably look at the lexing and parsing code to see what
can be sped up here.

-eric


>
>
>>
>> -eric
>>
>>
>>> Going forward, we'll have to teach clang more about at least some
>>> intrinsics for `#pragma intrin` (PR19898), which might end up helping for
>>> this too.
>>>
>>> I also reached out to STL at Microsoft, he said he'll try to look into
>>> including an "intrin0.h" header in the next major version of MSVC which
>>> would only declare a small set of intrinsics instead of all of them (no
>>> promises, of course).
>>>
>>> People working on avx512, I'd be curious to hear your perspective on
>>> this, as well as your reply to Chandler's points.
>>>
>>> Thanks,
>>> Nico
>>>
>>> On Sat, May 14, 2016 at 2:04 AM, Chandler Carruth via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>>>> A couple of points:
>>>>
>>>> 1) Definitely agree with Hal that these intrinsics really shouldn't be
>>>> mapping to builtins. This is something I'm pretty frustrated about in the
>>>> direction of AVX-512 support in Clang and LLVM. We really need generic
>>>> vector IR to lower cleanly into these instructions.
>>>>
>>>> 2) Reid, you specifically advocated for not having the set of
>>>> intrinsics available based on particular feature sets. ;] But I agree there
>>>> seems to be a scalability problem here.
>>>>
>>>> 3) I think a lot of the scalability problem is that very basic,
>>>> non-vector code patterns require Intrin.h on Windows and pull in *ALL* the
>>>> vector intrinsics. =/ It'd be really good to try to fix *that*.
>>>>
>>>> 4) AVX-512 has made this *incredibly* worse than any previous ISA
>>>> extension. It used to be we had the product of (operation * operand-type)
>>>> intrinsics. This is already pretty bad. Now we have (operation *
>>>> operand-type * 4) because we have 4 masking variants. So it seems Intel has
>>>> just made a really unfortunate API choice by forcing every permutation of
>>>> these things to get a different name and thus a different intrinsic in a
>>>> header file. =/ And sadly that too is probably too late to walk back.
>>>>
>>>>
>>>> I wonder if we could at least initially address this by providing very
>>>> limited "builtin" modules for truly builtin headers that don't touch any
>>>> system headers, and actually *always* use the modules approach for these
>>>> headers, right out of the box.
>>>>
>>>> On Fri, May 13, 2016 at 7:32 PM C Bergström <cfe-dev at lists.llvm.org>
>>>> wrote:
>>>>
>>>>> This old discussion may cover some of this as well? I also thought I
>>>>> remembered something more recent about this.
>>>>>
>>>>> http://clang-developers.42468.n3.nabble.com/PROPOSAL-Reintroduce-guards-for-Intel-intrinsic-headers-td4046979.html
>>>>>
>>>>> On Sat, May 14, 2016 at 8:59 AM, Sean Silva via cfe-dev
>>>>> <cfe-dev at lists.llvm.org> wrote:
>>>>> > Sorry if this is a stupid question, but do the windows intrinsic headers
>>>>> > actually contain the same contents as clang's? (e.g. maybe the windows
>>>>> > ones don't cover all the ISAs that clang's do).
>>>>> >
>>>>> > -- Sean Silva
>>>>> >
>>>>> > On Thu, May 12, 2016 at 9:16 AM, Nico Weber via cfe-dev
>>>>> > <cfe-dev at lists.llvm.org> wrote:
>>>>> >>
>>>>> >> Hi,
>>>>> >>
>>>>> >> on Windows, C++ system headers like e.g. <string> end up pulling in
>>>>> >> intrin.h. clang's intrinsic headers are very large.
>>>>> >>
>>>>> >> If you take a cc file containing just `#include <string>` and run that
>>>>> >> through the preprocessor with `cl /P test.cc` and `clang-cl /P test.cc`,
>>>>> >> the test.I file generated by clang-cl is 1.7MB while the one created by
>>>>> >> cl.exe is 0.7MB. This is solely due to clang's intrin.h expanding to way
>>>>> >> more stuff.
>>>>> >>
>>>>> >> The biggest offenders are avx512vlintrin.h, avx512fintrin.h,
>>>>> >> avx512vlbwintrin.h which add up to 657kB already. Before r239883, we only
>>>>> >> included avx headers if __AVX512F__ etc was defined. This is currently
>>>>> >> never the case in practice. Later (r243394 r243402 r243406 and more), the
>>>>> >> avx headers got much bigger.
>>>>> >>
>>>>> >> Parsing all this code takes time -- removing the avx512 includes from
>>>>> >> immintrin.h locally makes compiling a file containing just the <string>
>>>>> >> header 0.25s faster (!), and building all of v8 gets 6% faster, just from
>>>>> >> not including the avx512 headers.
>>>>> >>
>>>>> >> What can we do about this? Since avx512 is new, maybe they could be not
>>>>> >> part of immintrin.h? Or we could re-introduce
>>>>> >>
>>>>> >>   #if !__has_feature(modules) && defined(__AVX512BW__)
>>>>> >>
>>>>> >> include guards in immintrin.h. This would give us a speed win immediately
>>>>> >> without drawbacks as far as I can see, but in a few years when people
>>>>> >> start compiling with /arch:avx512 that'd go away again. (Then again, by
>>>>> >> then, modules are hopefully commonly available. cl.exe doesn't have an
>>>>> >> /arch:avx512 switch yet, so this is probably several years away from
>>>>> >> happening.)
>>>>> >>
>>>>> >> Comments? Is it feasible to require that people who want to use avx512
>>>>> >> include a new header instead of immintrin.h? Else, does anyone have a
>>>>> >> better idea other than reintroducing the #ifdefs, augmented with the
>>>>> >> module check?
>>>>> >>
>>>>> >> Thanks,
>>>>> >> Nico
>>>>> >>
>>>>> >> _______________________________________________
>>>>> >> cfe-dev mailing list
>>>>> >> cfe-dev at lists.llvm.org
>>>>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>> >>
>>>>> >
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>>
>>>
>>