[cfe-dev] The intrinsics headers (especially avx512) are too big. What to do about it?

Nico Weber via cfe-dev cfe-dev at lists.llvm.org
Mon May 16 11:49:21 PDT 2016


On Mon, May 16, 2016 at 2:30 PM, Eric Christopher <echristo at gmail.com>
wrote:

>
>
> On Mon, May 16, 2016 at 10:02 AM Nico Weber <thakis at chromium.org> wrote:
>
>> On Mon, May 16, 2016 at 12:48 PM, Eric Christopher <echristo at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Mon, May 16, 2016 at 9:43 AM Nico Weber via cfe-dev <
>>> cfe-dev at lists.llvm.org> wrote:
>>>
>>>> > Sorry if this is a stupid question, but do the windows intrinsic
>>>> headers actually contain the same contents as clang's?
>>>>
>>>> As far as I can tell (from looking at
>>>> https://msdn.microsoft.com/en-us/library/hh977023.aspx and comparing
>>>> to clang's headers), yes. MSVC doesn't have the avx512 intrinsics yet, but
>>>> that's probably only because they're new.
>>>>
>>>> I had hoped that I could not include all of x86intrin.h in intrin.h,
>>>> but that page says "The intrin.h header includes both immintrin.h and
>>>> ammintrin.h for simplicity."
>>>>
>>>> > This old discussion may cover some of this as well?
>>>>
>>>> Ah thanks, yes, sounds like there are reasons for not putting the
>>>> includes back behind ifdefs. The thread doesn't really mention the reasons,
>>>> and since clang doesn't implement full multiversioning yet I'm unable to
>>>> guess at the reasons -- but it sounds like people don't want to re-add the
>>>> arch ifdefs. Ok, I'll send a patch to add them back ifdef _MSC_VER only --
>>>> there should be no drawback to that, and it stops the bleeding in the case
>>>> where it's worst (with Microsoft headers).
>>>>
>>>>
>>> It implements enough multiversioning to make it worthwhile to have them,
>>> I don't know what you're confused about here.
>>>
>>
>> I didn't mean to question this point, I just don't understand it. Can you
>> give an example where it's useful? I'm sure there is one, I just can't
>> think of one.
>>
>
> Sure, the programmer has to write their own dispatch, but it'll allow you
> to include variously target optimized versions of the same function in the
> same file.
>
> The equivalent linux side of things is:
>
> void my_avx_function() __attribute__((__target__("avx")));
> void my_nonavx_function();
>
> ...
>
> if (__builtin_cpu_supports("avx"))
>    my_avx_function();
> else
>    my_nonavx_function();
>
> and you can keep both implementations in the same file and don't have to
> worry about things like command line options being different and causing
> all sorts of haywire.
>

Ah, thanks, I didn't know that worked :-) (I played with it a bit and found
PR27779)
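
For reference, a slightly fuller sketch of the hand-written dispatch described
above. The function names (sum_avx, sum_baseline, sum_dispatch) and the x86
guard are hypothetical and only there for illustration; the target attribute
and __builtin_cpu_supports are the only parts taken from the example. It
assumes a GCC-compatible compiler:

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
/* Built with AVX enabled for this one function only, whatever -m flags
   the rest of the file is compiled with. (Hypothetical example function.) */
__attribute__((__target__("avx")))
static float sum_avx(const float *v, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}
#endif

/* Baseline version, built with the file's default target features. */
static float sum_baseline(const float *v, size_t n) {
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += v[i];
    return total;
}

/* The programmer-written dispatch: choose an implementation at run time. */
float sum_dispatch(const float *v, size_t n) {
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx"))
        return sum_avx(v, n);
#endif
    return sum_baseline(v, n);
}
```

Callers only ever see sum_dispatch, so both variants can live in one
translation unit built with a single set of command-line options.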


>
>
>>
>>
>>> Why do we want to make this platform specific? If you wanted to match
>>> MSVC I guess you could just turn them off for windows as an alternate
>>> solution?
>>>
>>
>> Yes, that's what I meant with the _MSC_VER check.
>>
>
> I meant just turn off the avx512 headers, not sure if that's what you
> meant. We should probably look at the lexing and parsing code to see what
> can be sped up here.
>

I turned it off for all headers for now in r269675. I agree that we should
look at lexing and parsing speed, and also consider things like
tablegen'ing intrinsics. Until then, making most compiles faster seems like
a better tradeoff on Windows.
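
A rough sketch of the kind of guard this implies. This is an assumption about
the general shape, not the text of r269675: the SKETCH_ macro and the helper
function are hypothetical, and the real header would put an #include where
the comment is:

```c
/* Fallback for compilers that do not provide __has_feature. */
#ifndef __has_feature
#define __has_feature(x) 0
#endif

/* Assumed guard: skip the big header only when targeting the Microsoft
   environment, modules are off, and the feature is not enabled. */
#if !defined(_MSC_VER) || __has_feature(modules) || defined(__AVX512F__)
/* In immintrin.h, #include <avx512fintrin.h> would sit here. */
#define SKETCH_AVX512F_VISIBLE 1
#else
#define SKETCH_AVX512F_VISIBLE 0
#endif

/* Reports whether the guarded declarations would be visible in this build. */
int sketch_avx512f_visible(void) { return SKETCH_AVX512F_VISIBLE; }
```

With a guard of this shape nothing changes for non-Windows targets, while
clang-cl builds without AVX-512 enabled skip parsing the header.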

As said above, I'm curious to hear from the people working on avx512 :-)


>
> -eric
>
>
>>
>>
>>>
>>> -eric
>>>
>>>
>>>> Going forward, we'll have to teach clang more about at least some
>>>> intrinsics for `#pragma intrin` (PR19898), which might end up helping for
>>>> this too.
>>>>
>>>> I also reached out to STL at Microsoft, he said he'll try to look into
>>>> including an "intrin0.h" header in the next major version of MSVC which
>>>> would only declare a small set of intrinsics instead of all of them (no
>>>> promises, of course).
>>>>
>>>> People working on avx512, I'd be curious to hear your perspective on
>>>> this, as well as your reply to Chandler's points.
>>>>
>>>> Thanks,
>>>> Nico
>>>>
>>>> On Sat, May 14, 2016 at 2:04 AM, Chandler Carruth via cfe-dev <
>>>> cfe-dev at lists.llvm.org> wrote:
>>>>
>>>>> A couple of points:
>>>>>
>>>>> 1) Definitely agree with Hal that these intrinsics really shouldn't be
>>>>> mapping to builtins. This is something I'm pretty frustrated about the
>>>>> direction of AVX-512 support in Clang and LLVM. We really need generic
>>>>> vector IR to lower cleanly into these instructions.
>>>>>
>>>>> 2) Reid, you specifically advocated for not having the set of
>>>>> intrinsics available based on particular feature sets. ;] But I agree there
>>>>> seems to be a scalability problem here.
>>>>>
>>>>> 3) I think a lot of the scalability problem is that very basic,
>>>>> non-vector code patterns, require Intrin.h on Windows and pull in *ALL* the
>>>>> vector intrinsics. =/ It'd be really good to try to fix *that*.
>>>>>
>>>>> 4) AVX-512 has made this *incredibly* worse than any previous ISA
>>>>> extension. It used to be we had the product of (operation * operand-type)
>>>>> intrinsics. This is already pretty bad. Now we have (operation *
>>>>> operand-type * 4) because we have 4 masking variants. So it seems Intel has
>>>>> just made a really unfortunate API choice by forcing every permutation of
>>>>> these things to get a different name and thus a different intrinsic in a
>>>>> header file. =/ And sadly that too is probably too late to walk back.
>>>>>
>>>>>
>>>>> I wonder if we could at least initially address this by providing very
>>>>> limited "builtin" modules for truly builtin headers that don't touch any
>>>>> system headers, and actually *always* use the modules approach for these
>>>>> headers, right out of the box.
>>>>>
>>>>> On Fri, May 13, 2016 at 7:32 PM C Bergström <cfe-dev at lists.llvm.org>
>>>>> wrote:
>>>>>
>>>>>> This old discussion may cover some of this as well? I also thought I
>>>>>> remember something more recent around this..
>>>>>>
>>>>>> http://clang-developers.42468.n3.nabble.com/PROPOSAL-Reintroduce-guards-for-Intel-intrinsic-headers-td4046979.html
>>>>>>
>>>>>> On Sat, May 14, 2016 at 8:59 AM, Sean Silva via cfe-dev
>>>>>> <cfe-dev at lists.llvm.org> wrote:
>>>>>> > Sorry if this is a stupid question, but do the windows intrinsic
>>>>>> > headers actually contain the same contents as clang's? (e.g. maybe the
>>>>>> > windows ones don't cover all the ISA's that clang's do).
>>>>>> >
>>>>>> > -- Sean Silva
>>>>>> >
>>>>>> > On Thu, May 12, 2016 at 9:16 AM, Nico Weber via cfe-dev
>>>>>> > <cfe-dev at lists.llvm.org> wrote:
>>>>>> >>
>>>>>> >> Hi,
>>>>>> >>
>>>>>> >> on Windows, C++ system headers like e.g. <string> end up pulling in
>>>>>> >> intrin.h. clang's intrinsic headers are very large.
>>>>>> >>
>>>>>> >> If you take a cc file containing just `#include <string>` and run that
>>>>>> >> through the preprocessor with `cl /P test.cc` and `clang-cl /P test.cc`,
>>>>>> >> the test.I file generated by clang-cl is 1.7MB while the one created by
>>>>>> >> cl.exe is 0.7MB. This is solely due to clang's intrin.h expanding to way
>>>>>> >> more stuff.
>>>>>> >>
>>>>>> >> The biggest offenders are avx512vlintrin.h, avx512fintrin.h,
>>>>>> >> avx512vlbwintrin.h which add up to 657kB already. Before r239883, we only
>>>>>> >> included avx headers if __AVX512F__ etc was defined. This is currently
>>>>>> >> never the case in practice. Later (r243394 r243402 r243406 and more), the
>>>>>> >> avx headers got much bigger.
>>>>>> >>
>>>>>> >> Parsing all this code takes time -- removing the avx512 includes from
>>>>>> >> immintrin.h locally makes compiling a file containing just the <string>
>>>>>> >> header 0.25s faster (!), and building all of v8 gets 6% faster, just from
>>>>>> >> not including the avx512 headers.
>>>>>> >>
>>>>>> >> What can we do about this? Since avx512 is new, maybe they could be not
>>>>>> >> part of immintrin.h? Or we could re-introduce
>>>>>> >>
>>>>>> >>   #if !__has_feature(modules) && defined(__AVX512BW__)
>>>>>> >>
>>>>>> >> include guards in immintrin.h. This would give us a speed win immediately
>>>>>> >> without drawbacks as far as I can see, but in a few years when people
>>>>>> >> start compiling with /arch:avx512 that'd go away again. (Then again, by
>>>>>> >> then, modules are hopefully commonly available. cl.exe doesn't have an
>>>>>> >> /arch:avx512 switch yet, so this is probably several years away from
>>>>>> >> happening.)
>>>>>> >>
>>>>>> >> Comments? Is it feasible to require that people who want to use avx512
>>>>>> >> include a new header instead of immintrin.h? Else, does anyone have a
>>>>>> >> better idea other than reintroducing the #ifdefs, augmented with the
>>>>>> >> module check?
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> Nico
>>>>>> >>
>>>>>> >> _______________________________________________
>>>>>> >> cfe-dev mailing list
>>>>>> >> cfe-dev at lists.llvm.org
>>>>>> >> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>> >>
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>