[cfe-dev] [PROPOSAL] Reintroduce guards for Intel intrinsic headers
Richard Smith
richard at metafoo.co.uk
Mon Aug 3 15:30:40 PDT 2015
On Mon, Aug 3, 2015 at 1:37 PM, Hans Wennborg <hans at chromium.org> wrote:
> On Sun, Aug 2, 2015 at 5:09 PM, Chandler Carruth <chandlerc at google.com>
> wrote:
> > Would cherrypicking the diagnostics to the 3.7 branch be better or worse?
> > (I'm of two minds, curious what others think...)
>
> The alternative of reverting would have the downside of missing out on
> some of the target attribute functionality in 3.7. I haven't been
> following closely enough to determine how great a loss that would be,
> but as far as I understand, this is still a work in progress, right?
>
> The alternative of cherry-picking diagnostics has the problem that I
> don't think any diagnostics have landed yet :-)
>
> My inclination is to revert back to safety here. The revert would be
> pretty hairy though, as there have been a number of changes to these
> files since r239883 landed. I'm still trying to figure this out.
Here's my view: sometimes features don't make the completeness bar in time
for a release. That's normal and not something to be apologetic / troubled
about; we'll get the feature in the next release, and we shouldn't try to
rush it in. This feature is unusual in that the transitional period has
temporarily left us with (arguably) a regression, and that happened to be
the state when the 3.7 branch was cut, but our normal philosophies should
apply: we have time-based releases, not feature-based ones, and our first
response to regressions is to revert to green. So I think reverting r239883
is the way to go. If the diagnostic work is done in time, we can consider
unreverting and patching it across.
> On Sun, Aug 2, 2015 at 12:00 PM Justin Bogner <mail at justinbogner.com>
> wrote:
> >>
> >> Eric Christopher <echristo at gmail.com> writes:
> >> > On Thu, Jul 30, 2015 at 10:12 AM Reid Kleckner <rnk at google.com>
> wrote:
> >> >
> >> > I'm opposed to this. Going forward, I would really like target
> >> > intrinsics
> >> > to be available regardless of the current feature set, so users
> >> > don't need
> >> > hacks like these.
> >> >
> >> > Agreed.
> >> >
> >> >
> >> > I see two ways to do this with different tradeoffs:
> >> > 1. Diagnose missing target attributes when calling the intel
> >> > intrinsics. I
> >> > was surprised to find that we don't already do this.
> >> >
> >> > Sorry. This is on my list of things to do.
> >>
> >> +hans
> >>
> >> I agree with the direction of moving to use target attributes instead of
> >> relying on flaky ifdefs, but without any errors or warnings here this is
> >> a pretty serious diagnostic regression.
> >>
> >> I think we should revert this on the 3.7 branch. It can stay as is on
> >> trunk assuming the diagnostics are coming soon.
> >>
> >> Right now we end up in spaces where we get crashes in the backend
> >> instead of a sensible error in far too many situations. Notably:
> >>
> >> https://llvm.org/bugs/show_bug.cgi?id=24125
> >> https://llvm.org/bugs/show_bug.cgi?id=24087
> >> https://llvm.org/bugs/show_bug.cgi?id=24335
> >>
> >> Additionally, I'm told this causes issues with configure scripts
> >> misdetecting available features, as well as strange compatibility issues
> >> like the one that led to this thread.
> >>
> >> This feature is woefully incomplete. We need the warnings/errors for it
> >> to be acceptable quality.
> >>
> >> >
> >> > 2. We could support some automatic transfer of the target
> attribute
> >> > to the
> >> > caller when calling these intrinsics, but I worry that this is too
> >> > confusing.
> >> >
> >> > We could, but it's probably better to leave it as is.
> >> >
> >> > -eric
> >> >
> >> >
> >> > Implicitly setting a target attribute may block inlining that the
> >> > user
> >> > expected to happen, for example. Alternatively, there may be a
> >> > dynamic
> >> > cpuid check in the same function between SSE2 and AVX variants of
> >> > the same
> >> > algorithm, and now the SSE2 loop will unexpectedly use AVX
> >> > instructions.
> >> >
> >> > So we should probably settle with telling the user to add -msseNN
> or
> >> > __atribute__((target(("sseNN")))).
> >> >
> >> > On Thu, Jul 30, 2015 at 8:53 AM, Vedant Kumar <vsk at apple.com>
> wrote:
> >> >
> >> > I've run into some code which no longer compiles because of
> two
> >> > recent
> >> > changes:
> >> >
> >> > 41885d3 Update the intel intrinsic headers to use the target
> >> > attribute support.
> >> > 695aff1 Use a define for per-file function attributes for
> the
> >> > Intel
> >> > intrinsic headers.
> >> >
> >> > Specifically, one project defines its own SSE4.1 emulation
> >> > routines
> >> > when the real intrinsics aren't available. This is a problem
> >> > because
> >> > they've reused the names of the intrinsics. E.g;
> >> >
> >> > > #ifndef __SSE4_1__
> >> > > #define _mm_extract_epi8(a_, ndx) ({ ... })
> >> > > static inline __m128i _mm_blendv_epi8(__m128i a, __m128i b,
> >> > __m128i
> >> > mask) { ... }
> >> > > ...
> >> > > #endif
> >> >
> >> > SSE4.1 intrinsics now leak into the project when it's being
> >> > compiled
> >> > for targets without SSE4.1 support. Compilation fails with
> >> > "error:
> >> > redefinition ...".
> >> >
> >> > When these changes were initially being discussed, I think our
> >> > stance
> >> > was that we shouldn't support code like this [1]. However, we
> >> > should
> >> > reconsider for the sake of avoiding breakage. AFAICT, we would
> >> > need to
> >> > revert just two types of changes:
> >> >
> >> > In lib/Headers/__wmmintrin_aes.h:
> >> >
> >> > > -#if defined (__SSE4_2__) || defined (__SSE4_1__)
> >> > > #include <smmintrin.h>
> >> > > -#endif
> >> >
> >> > In lib/Headers/smmintrin.h:
> >> >
> >> > > -#ifndef __SSE4_1__
> >> > > -#error "SSE4.1 instruction set not enabled"
> >> > > -#else
> >> >
> >> > I don't see any downsides to reintroducing these guards. If
> >> > everyone's
> >> > OK with this, I can mail a patch in. The alternative is to
> have
> >> > clients rewrite their emulation layers like this:
> >> >
> >> > > #ifdef __SSE4_1__
> >> > > #define compat_mm_extract_epi8 _mm_extract_epi8
> >> > > static inline __m128i combat_mm_blendv_epi8(__m128i a,
> __m128i
> >> > b,
> >> > __m128i mask) __attribute__((__target__(("sse4.1")))) {
> >> > > return _mm_blendv_epi8(a, b, mask);
> >> > > }
> >> > > ...
> >> > > #else /* OK, no native SSE 4.1. Define our own. */
> >> > > #define compat_mm_extract_epi8(a_, ndx) ({ ... })
> >> > > static inline __m128i compat_mm_blendv_epi8(__m128i a,
> __m128i
> >> > b,
> >> > __m128i mask) { ... }
> >> > > ...
> >> > > #endif
> >> >
> >> > ... and then replace all calls to intrinsics with calls to the
> >> > new
> >> > compatibility routines. This seems like a lot of tedious work,
> >> > and I'd
> >> > love to help people avoid it :).
> >> >
> >> > Let me know what you think!
> >> >
> >> > vedant
> >> >
> >> > [1] http://lists.cs.uiuc.edu/pipermail/cfe-commits/
> >> > Week-of-Mon-20150615/131192.html
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20150803/904a47c3/attachment.html>
More information about the cfe-dev
mailing list