[cfe-commits] [PATCH] Fixing include order for Windows

Sun Feb 19 19:02:39 PST 2012

On Sun, Feb 19, 2012 at 8:48 PM, Aaron Ballman <aaron at aaronballman.com> wrote:
> On Sun, Feb 19, 2012 at 7:41 PM, Chandler Carruth <chandlerc at google.com> wrote:
>> On Sun, Feb 19, 2012 at 5:35 PM, Aaron Ballman <aaron at aaronballman.com>
>> wrote:
>>>
>>> On Sun, Feb 19, 2012 at 5:18 PM, Chandler Carruth <chandlerc at google.com>
>>> wrote:
>>> > On Sun, Feb 19, 2012 at 3:09 PM, Aaron Ballman <aaron at aaronballman.com>
>>> > wrote:
>>> >>
>>> >> On Sun, Feb 19, 2012 at 4:45 PM, Eli Friedman <eli.friedman at gmail.com>
>>> >> wrote:
>>> >> > On Sun, Feb 19, 2012 at 10:28 AM, Aaron Ballman
>>> >> > <aaron at aaronballman.com>
>>> >> > wrote:
>>> >> >> On Windows, the default search paths for includes would put the
>>> >> >> CMake-found headers (<build dir>\lib\clang\3.1\include) before the
>>> >> >> OS
>>> >> >> headers.  For the most part, this is fine.  However, there are some
>>> >> >> header files that share names with the system headers, which are
>>> >> >> incorrect.  For instance, immintrin.h is completely different from
>>> >> >> the
>>> >> >> SDK include.  This causes miscompilations in surprising places
>>> >> >> because
>>> >> >> a system header will attempt to include one of these duplicates, and
>>> >> >> find the clang header instead of the other system header.
>>> >> >>
>>> >> >> A simple test case of: #include <map> when building from MSVC
>>> >> >> demonstrates the issue.
>>> >> >>
>>> >> >> A user can work around this behavior by specifying -nobuiltininc on
>>> >> >> the command line, but that seems rather obtuse for such normal use
>>> >> >> cases.
>>> >> >>
>>> >> >> I've fixed this by changing the order that the headers are included
>>> >> >> from the toolchain.  The SDK headers get added *before* the clang
>>> >> >> headers, and this resolves the issue.
>>> >> >>
>>> >> >> This patch does cause one regression test to fail
>>> >> >> (Modules\compiler_builtins.m).  This is because the platform SDK
>>> >> >> intrinsics are differently structured from the ones that come with
>>> >> >> clang.  The OS APIs expect the SDK intrinsics, but our tests seem to
>>> >> >> require our own intrinsics.  I'm open to suggestions.
>>> >> >
>>> >> > If on Windows, we specifically need the OS version of immintrin.h, we
>>> >> > should modify our version to forward to it; completely ignoring the
>>> >> > builtin headers is a bad idea (for example,
>>> >> > http://llvm.org/bugs/show_bug.cgi?id=11918 is a miscompile caused by
>>> >> > pulling in the wrong version of stdarg.h).
>>> >>
>>> >> The patch doesn't completely ignore the builtin headers though -- it
>>> >> simply prefers the OS headers over the builtin headers.
>>> >
>>> >
>>> > That doesn't work in the general case though -- look at all the places
>>> > in
>>> > our builtin headers where we specifically forward *to* the system
>>> > headers.
>>> > If the builtin doesn't come first, lots of things go wrong here.
>>> >
>>> > We should make our builtins always come first, and always be correct
>>> > when
>>> > they come first.
>>> >
>>> >> That being
>>> >> said, I am not against forwarding our version.  What would be a good
>>> >> approach to it?
>>> >
>>> >
>>> > Look at the other builtin headers that use '__has_include_next' and
>>> > 'include_next' to fold together a Clang builtin header and a system
>>> > builtin
>>> > header.
>>> >
>>> > I suspect you will have to essentially tack on the system 'immintrin.h'
>>> > header to the end of the builtin, as we do expect #include <immintrin.h>
>>> > to
>>> > pull in the Clang builtin bits... Essentially, this doesn't seem like it
>>> > should be either-or, it should be both.
>>>
>>> So how should I handle the duplicate declarations?  For instance:
>>>
>>> // mmintrin.h from MSVC
>>> typedef union __declspec(intrin_type) _CRT_ALIGN(8) __m64
>>> {
>>>    unsigned __int64    m64_u64;
>>>    float               m64_f32[2];
>>>    __int8              m64_i8[8];
>>>    __int16             m64_i16[4];
>>>    __int32             m64_i32[2];
>>>    __int64             m64_i64;
>>>    unsigned __int8     m64_u8[8];
>>>    unsigned __int16    m64_u16[4];
>>>    unsigned __int32    m64_u32[2];
>>> } __m64;
>>>
>>> void  _m_empty(void);
>>> #define _mm_empty         _m_empty
>>>
>>> // mmintrin.h from clang
>>> typedef long long __m64 __attribute__((__vector_size__(8)));
>>>
>>> static __inline__ void __attribute__((__always_inline__, __nodebug__))
>>> _mm_empty(void)
>>> {
>>>    __builtin_ia32_emms();
>>> }
>>> #define _m_empty _mm_empty
>>>
>>> It seems to me that trying to keep *both* headers is going to be
>>> rather challenging given that both make the same typedefs in
>>> incompatible ways, the declarations/macros are opposite one another,
>>> etc.
>>
>>
>> Hold on -- the MS one is trying to provide the same intel intrinsics? Now I
>> think that the original search order and header file is correct -- we don't
>> want the MS version of this header as it is likely to use MS extensions
>> Clang doesn't support (or for which Clang has better models). Maybe what we
>> really need is to understand what the clang immintrin.h *lacks* in order for
>> MS code to use it?
>
> Yes, MS has the same intrinsics, but with differently-structured
> header files.  The MS immintrin.h has all of the AVX intrinsics and
> includes only wmmintrin.h, which includes nmmintrin.h for SSE4.2, etc
> The clang immintrin.h checks various defines (__AVX__, __AES__, etc)
> before including all of the files.
>
> One thing which does get us further in parsing is to force define the
> includes from clang's intrinsics:
>
> #define __AVX__
> #define __AES__
> #define __SSE3__
> #define __SSSE3__
> #define __SSE4_1__
> #define __SSE4_2__
>
> There are still compile errors, however, because of the way
> Microsoft's intrin.h header file works.  It has some macro hacks to
> basically erase some declarations.  For instance:
>
> #define __MACHINE(X)          X;
> #define __MACHINEX86X_IA64    __MACHINE
> ...
> #define __MACHINEZ(X)         /* NOTHING */
> ...
> #if !(defined(_M_IX86) || defined(_M_IA64))
> #undef __MACHINEX86X_IA64
> #define __MACHINEX86X_IA64    __MACHINEZ
> #endif
> ...
> __MACHINEX86X_IA64(__m64 _m_pshufw(__m64,int))
>
> // Microsoft headers
> extern __m64 _m_pshufw(__m64, int);
> #define _mm_shuffle_pi16  _m_pshufw
>
> // Clang headers
> #define _mm_shuffle_pi16(a, n) __extension__ ({ \
>  __m64 __a = (a); \
>  (__m64)__builtin_ia32_pshufw((__v4hi)__a, (n)); })
> #define _m_pshufw _mm_shuffle_pi16
>
> This causes compile errors with clang like this:
>
> error: expected unqualified-id
> __MACHINEX86X_IA64(__m64 _m_pshufw(__m64,int))
>
> Because this expands out into:
>
> __m64 __extension__ ({ __m64 __a = (__m64);
> (__m64)__builtin_ia32_pshufw((__v4hi)__a, (int)); });

I wonder if it would make sense to find all of the macro declarations
in the intrinsics and turn them into something like this:

#if defined(_MSC_VER)
static __inline__ void __attribute__((__always_inline__, __nodebug__))
_mm_shuffle_pi16(__m64 a, int n) {
  return __buildin_ia32_pshufw((__v4hi)a, n);
}
#else
#define _mm_shuffle_pi16(a, n) __extension__ ({ \
  __m64 __a = (a); \
  (__m64)__builtin_ia32_pshufw((__v4hi)__a, (n)); })
#endif

There appear to only be a dozen or so cases of this.  But I'm not
certain how much customization we'd like in these header files,
either.

~Aaron