<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Thu, Jul 30, 2015 at 10:12 AM Reid Kleckner <<a href="mailto:rnk@google.com">rnk@google.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">I'm opposed to this. Going forward, I would really like target intrinsics to be available regardless of the current feature set, so users don't need hacks like these.<div><br></div></div></blockquote><div><br></div><div>Agreed.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div></div><div>I see two ways to do this with different tradeoffs:</div><div>1. Diagnose missing target attributes when calling the intel intrinsics. I was surprised to find that we don't already do this.</div></div></blockquote><div><br></div><div>Sorry. This is on my list of things to do.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>2. We could support some automatic transfer of the target attribute to the caller when calling these intrinsics, but I worry that this is too confusing.</div></div></blockquote><div><br></div><div>We could, but it's probably better to leave it as is.</div><div><br></div><div>-eric</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div><div>Implicitly setting a target attribute may block inlining that the user expected to happen, for example. Alternatively, there may be a dynamic cpuid check in the same function between SSE2 and AVX variants of the same algorithm, and now the SSE2 loop will unexpectedly use AVX instructions.</div><div><br></div><div>So we should probably settle with telling the user to add -msseNN or __atribute__((target(("sseNN")))).</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jul 30, 2015 at 8:53 AM, Vedant Kumar <span dir="ltr"><<a href="mailto:vsk@apple.com" target="_blank">vsk@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I've run into some code which no longer compiles because of two recent changes:<br>
<br>
41885d3 Update the intel intrinsic headers to use the target attribute support.<br>
695aff1 Use a define for per-file function attributes for the Intel intrinsic headers.<br>
<br>
Specifically, one project defines its own SSE4.1 emulation routines when the real intrinsics aren't available. This is a problem because they've reused the names of the intrinsics. E.g;<br>
<br>
> #ifndef __SSE4_1__<br>
> #define _mm_extract_epi8(a_, ndx) ({ ... })<br>
> static inline __m128i _mm_blendv_epi8(__m128i a, __m128i b, __m128i mask) { ... }<br>
> ...<br>
> #endif<br>
<br>
SSE4.1 intrinsics now leak into the project when it's being compiled for targets without SSE4.1 support. Compilation fails with "error: redefinition ...".<br>
<br>
When these changes were initially being discussed, I think our stance was that we shouldn't support code like this [1]. However, we should reconsider for the sake of avoiding breakage. AFAICT, we would need to revert just two types of changes:<br>
<br>
In lib/Headers/__wmmintrin_aes.h:<br>
<br>
> -#if defined (__SSE4_2__) || defined (__SSE4_1__)<br>
> #include <smmintrin.h><br>
> -#endif<br>
<br>
<br>
In lib/Headers/smmintrin.h:<br>
<br>
> -#ifndef __SSE4_1__<br>
> -#error "SSE4.1 instruction set not enabled"<br>
> -#else<br>
<br>
I don't see any downsides to reintroducing these guards. If everyone's OK with this, I can mail a patch in. The alternative is to have clients rewrite their emulation layers like this:<br>
<br>
> #ifdef __SSE4_1__<br>
> #define compat_mm_extract_epi8 _mm_extract_epi8<br>
> static inline __m128i combat_mm_blendv_epi8(__m128i a, __m128i b, __m128i mask) __attribute__((__target__(("sse4.1")))) {<br>
> return _mm_blendv_epi8(a, b, mask);<br>
> }<br>
> ...<br>
> #else /* OK, no native SSE 4.1. Define our own. */<br>
> #define compat_mm_extract_epi8(a_, ndx) ({ ... })<br>
> static inline __m128i compat_mm_blendv_epi8(__m128i a, __m128i b, __m128i mask) { ... }<br>
> ...<br>
> #endif<br>
<br>
... and then replace all calls to intrinsics with calls to the new compatibility routines. This seems like a lot of tedious work, and I'd love to help people avoid it :).<br>
<br>
Let me know what you think!<br>
<br>
vedant<br>
<br>
[1] <a href="http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150615/131192.html" rel="noreferrer" target="_blank">http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150615/131192.html</a><br>
_______________________________________________<br>
cfe-dev mailing list<br>
<a href="mailto:cfe-dev@cs.uiuc.edu" target="_blank">cfe-dev@cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br>
</blockquote></div><br></div>
</blockquote></div></div>