[cfe-dev] [PROPOSAL] Reintroduce guards for Intel intrinsic headers

Hans Wennborg hans at chromium.org
Mon Aug 3 17:42:20 PDT 2015


On Mon, Aug 3, 2015 at 3:40 PM, Eric Christopher <echristo at gmail.com> wrote:
>
>
> On Mon, Aug 3, 2015 at 3:30 PM Richard Smith <richard at metafoo.co.uk> wrote:
>>
>> On Mon, Aug 3, 2015 at 1:37 PM, Hans Wennborg <hans at chromium.org> wrote:
>>>
>>> On Sun, Aug 2, 2015 at 5:09 PM, Chandler Carruth <chandlerc at google.com>
>>> wrote:
>>> > Would cherrypicking the diagnostics to the 3.7 branch be better or
>>> > worse?
>>> > (I'm of two minds, curious what others think...)
>>>
>>> The alternative of reverting would have the downside of missing out on
>>> some of the target attribute functionality in 3.7. I haven't been
>>> following closely enough to determine how great a loss that would be,
>>> but as far as I understand, this is still a work in progress, right?
>>>
>>> The alternative of cherry-picking diagnostics has the problem that I
>>> don't think any diagnostics have landed yet :-)
>>>
>>> My inclination is to revert back to safety here. The revert would be
>>> pretty hairy though, as there have been a number of changes to these
>>> files since r239883 landed. I'm still trying to figure this out.
>>
>>
>> Here's my view: sometimes features don't make the completeness bar in time
>> for a release. That's normal and not something to be apologetic / troubled
>> about; we'll get the feature in the next release, and we shouldn't try to
>> rush it in. This feature is unusual in that the transitional period has
>> temporarily left us with (arguably) a regression, and that happened to be
>> the state when the 3.7 branch was cut, but our normal philosophies should
>> apply: we have time-based releases, not feature-based ones, and our first
>> response to regressions is to revert to green. So I think reverting r239883
>> is the way to go. If the diagnostic work is done in time, we can consider
>> unreverting and patching it across.
>>
>
> That's perfectly ok to me. Working on the diagnostic at the moment.

I've reverted r239883 (and r240720) in r243925. . There were some
conflicts, so I'd appreciate if someone would like to take a look at
it as well.

If the diagnostic patch is straight-forward to merge and resolves all
the user issues we can consider taking that later.

Thanks,
Hans

>>> > On Sun, Aug 2, 2015 at 12:00 PM Justin Bogner <mail at justinbogner.com>
>>> > wrote:
>>> >>
>>> >> Eric Christopher <echristo at gmail.com> writes:
>>> >> > On Thu, Jul 30, 2015 at 10:12 AM Reid Kleckner <rnk at google.com>
>>> >> > wrote:
>>> >> >
>>> >> >     I'm opposed to this. Going forward, I would really like target
>>> >> > intrinsics
>>> >> >     to be available regardless of the current feature set, so users
>>> >> > don't need
>>> >> >     hacks like these.
>>> >> >
>>> >> > Agreed.
>>> >> >
>>> >> >
>>> >> >     I see two ways to do this with different tradeoffs:
>>> >> >     1. Diagnose missing target attributes when calling the intel
>>> >> > intrinsics. I
>>> >> >     was surprised to find that we don't already do this.
>>> >> >
>>> >> > Sorry. This is on my list of things to do.
>>> >>
>>> >> +hans
>>> >>
>>> >> I agree with the direction of moving to use target attributes instead
>>> >> of
>>> >> relying on flaky ifdefs, but without any errors or warnings here this
>>> >> is
>>> >> a pretty serious diagnostic regression.
>>> >>
>>> >> I think we should revert this on the 3.7 branch. It can stay as is on
>>> >> trunk assuming the diagnostics are coming soon.
>>> >>
>>> >> Right now we end up in spaces where we get crashes in the backend
>>> >> instead of a sensible error in far too many situations. Notably:
>>> >>
>>> >>   https://llvm.org/bugs/show_bug.cgi?id=24125
>>> >>   https://llvm.org/bugs/show_bug.cgi?id=24087
>>> >>   https://llvm.org/bugs/show_bug.cgi?id=24335
>>> >>
>>> >> Additionally, I'm told this causes issues with configure scripts
>>> >> misdetecting available features, as well as strange compatibility
>>> >> issues
>>> >> like the one that led to this thread.
>>> >>
>>> >> This feature is woefully incomplete. We need the warnings/errors for
>>> >> it
>>> >> to be acceptable quality.
>>> >>
>>> >> >
>>> >> >     2. We could support some automatic transfer of the target
>>> >> > attribute
>>> >> > to the
>>> >> >     caller when calling these intrinsics, but I worry that this is
>>> >> > too
>>> >> >     confusing.
>>> >> >
>>> >> > We could, but it's probably better to leave it as is.
>>> >> >
>>> >> > -eric
>>> >> >
>>> >> >
>>> >> >     Implicitly setting a target attribute may block inlining that
>>> >> > the
>>> >> > user
>>> >> >     expected to happen, for example. Alternatively, there may be a
>>> >> > dynamic
>>> >> >     cpuid check in the same function between SSE2 and AVX variants
>>> >> > of
>>> >> > the same
>>> >> >     algorithm, and now the SSE2 loop will unexpectedly use AVX
>>> >> > instructions.
>>> >> >
>>> >> >     So we should probably settle with telling the user to add
>>> >> > -msseNN or
>>> >> >     __atribute__((target(("sseNN")))).
>>> >> >
>>> >> >     On Thu, Jul 30, 2015 at 8:53 AM, Vedant Kumar <vsk at apple.com>
>>> >> > wrote:
>>> >> >
>>> >> >         I've run into some code which no longer compiles because of
>>> >> > two
>>> >> > recent
>>> >> >         changes:
>>> >> >
>>> >> >           41885d3 Update the intel intrinsic headers to use the
>>> >> > target
>>> >> >         attribute support.
>>> >> >           695aff1 Use a define for per-file function attributes for
>>> >> > the
>>> >> > Intel
>>> >> >         intrinsic headers.
>>> >> >
>>> >> >         Specifically, one project defines its own SSE4.1 emulation
>>> >> > routines
>>> >> >         when the real intrinsics aren't available. This is a problem
>>> >> > because
>>> >> >         they've reused the names of the intrinsics. E.g;
>>> >> >
>>> >> >         > #ifndef __SSE4_1__
>>> >> >         > #define _mm_extract_epi8(a_, ndx) ({ ... })
>>> >> >         > static inline __m128i _mm_blendv_epi8(__m128i a, __m128i
>>> >> > b,
>>> >> > __m128i
>>> >> >         mask) { ... }
>>> >> >         > ...
>>> >> >         > #endif
>>> >> >
>>> >> >         SSE4.1 intrinsics now leak into the project when it's being
>>> >> > compiled
>>> >> >         for targets without SSE4.1 support. Compilation fails with
>>> >> > "error:
>>> >> >         redefinition ...".
>>> >> >
>>> >> >         When these changes were initially being discussed, I think
>>> >> > our
>>> >> > stance
>>> >> >         was that we shouldn't support code like this [1]. However,
>>> >> > we
>>> >> > should
>>> >> >         reconsider for the sake of avoiding breakage. AFAICT, we
>>> >> > would
>>> >> > need to
>>> >> >         revert just two types of changes:
>>> >> >
>>> >> >         In lib/Headers/__wmmintrin_aes.h:
>>> >> >
>>> >> >         > -#if defined (__SSE4_2__) || defined (__SSE4_1__)
>>> >> >         >  #include <smmintrin.h>
>>> >> >         > -#endif
>>> >> >
>>> >> >         In lib/Headers/smmintrin.h:
>>> >> >
>>> >> >         > -#ifndef __SSE4_1__
>>> >> >         > -#error "SSE4.1 instruction set not enabled"
>>> >> >         > -#else
>>> >> >
>>> >> >         I don't see any downsides to reintroducing these guards. If
>>> >> > everyone's
>>> >> >         OK with this, I can mail a patch in. The alternative is to
>>> >> > have
>>> >> >         clients rewrite their emulation layers like this:
>>> >> >
>>> >> >         > #ifdef __SSE4_1__
>>> >> >         > #define compat_mm_extract_epi8 _mm_extract_epi8
>>> >> >         > static inline __m128i combat_mm_blendv_epi8(__m128i a,
>>> >> > __m128i
>>> >> > b,
>>> >> >         __m128i mask) __attribute__((__target__(("sse4.1")))) {
>>> >> >         >   return _mm_blendv_epi8(a, b, mask);
>>> >> >         > }
>>> >> >         > ...
>>> >> >         > #else /* OK, no native SSE 4.1. Define our own. */
>>> >> >         > #define compat_mm_extract_epi8(a_, ndx) ({ ... })
>>> >> >         > static inline __m128i compat_mm_blendv_epi8(__m128i a,
>>> >> > __m128i
>>> >> > b,
>>> >> >         __m128i mask) { ... }
>>> >> >         > ...
>>> >> >         > #endif
>>> >> >
>>> >> >         ... and then replace all calls to intrinsics with calls to
>>> >> > the
>>> >> > new
>>> >> >         compatibility routines. This seems like a lot of tedious
>>> >> > work,
>>> >> > and I'd
>>> >> >         love to help people avoid it :).
>>> >> >
>>> >> >         Let me know what you think!
>>> >> >
>>> >> >         vedant
>>> >> >
>>> >> >         [1] http://lists.cs.uiuc.edu/pipermail/cfe-commits/
>>> >> >         Week-of-Mon-20150615/131192.html
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev



More information about the cfe-dev mailing list