<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:"Arial","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> cfe-dev-bounces@cs.uiuc.edu [mailto:cfe-dev-bounces@cs.uiuc.edu]
<b>On Behalf Of </b>Reid Kleckner<br>
<b>Sent:</b> Thursday, July 30, 2015 10:13 AM<br>
<b>To:</b> Vedant Kumar; Eric Christopher<br>
<b>Cc:</b> Clang Developers List<br>
<b>Subject:</b> Re: [cfe-dev] [PROPOSAL] Reintroduce guards for Intel intrinsic headers<o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">I'm opposed to this. Going forward, I would really like target intrinsics to be available regardless of the current feature set, so users don't need hacks like these.<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I see two ways to do this with different tradeoffs:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">1. Diagnose missing target attributes when calling the intel intrinsics. I was surprised to find that we don't already do this.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">2. We could support some automatic transfer of the target attribute to the caller when calling these intrinsics, but I worry that this is too confusing.<o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D">KBS> Regarding automatic transfer of the target attribute. It seems like something like this:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D">static __inline __m256
<br>
__attribute__((__always_inline__, __nodebug__))<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D">_mm256_add_ps(__m256 __a, __m256 __b)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D">{<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D"> __builtin_assume(__has_feature(FEATURE_AVX));<br>
return __a + __b;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D">}<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D">might be a good way to represent this. It has the nice property that the __builtin_assume only applies at the exact point in execution where<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D">it occurs. You could probably add something like this as a small addition to what Eric has already done with the target attribute, or you could try to make inlining
of something with the target attribute automatically apply a target __builtin_assume property.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D">The difficulty is defining how the semantics of the different target properties have to be retained during optimization and code generation.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D">I’m sure this difficulty is why the granularity that Eric is working on is at the routine level.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D">Kevin<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Implicitly setting a target attribute may block inlining that the user expected to happen, for example. Alternatively, there may be a dynamic cpuid check in the same function between SSE2 and AVX variants of the same algorithm, and now
the SSE2 loop will unexpectedly use AVX instructions.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">So we should probably settle with telling the user to add -msseNN or __atribute__((target(("sseNN")))).<o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal">On Thu, Jul 30, 2015 at 8:53 AM, Vedant Kumar <<a href="mailto:vsk@apple.com" target="_blank">vsk@apple.com</a>> wrote:<o:p></o:p></p>
<p class="MsoNormal">I've run into some code which no longer compiles because of two recent changes:<br>
<br>
41885d3 Update the intel intrinsic headers to use the target attribute support.<br>
695aff1 Use a define for per-file function attributes for the Intel intrinsic headers.<br>
<br>
Specifically, one project defines its own SSE4.1 emulation routines when the real intrinsics aren't available. This is a problem because they've reused the names of the intrinsics. E.g;<br>
<br>
> #ifndef __SSE4_1__<br>
> #define _mm_extract_epi8(a_, ndx) ({ ... })<br>
> static inline __m128i _mm_blendv_epi8(__m128i a, __m128i b, __m128i mask) { ... }<br>
> ...<br>
> #endif<br>
<br>
SSE4.1 intrinsics now leak into the project when it's being compiled for targets without SSE4.1 support. Compilation fails with "error: redefinition ...".<br>
<br>
When these changes were initially being discussed, I think our stance was that we shouldn't support code like this [1]. However, we should reconsider for the sake of avoiding breakage. AFAICT, we would need to revert just two types of changes:<br>
<br>
In lib/Headers/__wmmintrin_aes.h:<br>
<br>
> -#if defined (__SSE4_2__) || defined (__SSE4_1__)<br>
> #include <smmintrin.h><br>
> -#endif<br>
<br>
<br>
In lib/Headers/smmintrin.h:<br>
<br>
> -#ifndef __SSE4_1__<br>
> -#error "SSE4.1 instruction set not enabled"<br>
> -#else<br>
<br>
I don't see any downsides to reintroducing these guards. If everyone's OK with this, I can mail a patch in. The alternative is to have clients rewrite their emulation layers like this:<br>
<br>
> #ifdef __SSE4_1__<br>
> #define compat_mm_extract_epi8 _mm_extract_epi8<br>
> static inline __m128i combat_mm_blendv_epi8(__m128i a, __m128i b, __m128i mask) __attribute__((__target__(("sse4.1")))) {<br>
> return _mm_blendv_epi8(a, b, mask);<br>
> }<br>
> ...<br>
> #else /* OK, no native SSE 4.1. Define our own. */<br>
> #define compat_mm_extract_epi8(a_, ndx) ({ ... })<br>
> static inline __m128i compat_mm_blendv_epi8(__m128i a, __m128i b, __m128i mask) { ... }<br>
> ...<br>
> #endif<br>
<br>
... and then replace all calls to intrinsics with calls to the new compatibility routines. This seems like a lot of tedious work, and I'd love to help people avoid it :).<br>
<br>
Let me know what you think!<br>
<br>
vedant<br>
<br>
[1] <a href="http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150615/131192.html" target="_blank">
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20150615/131192.html</a><br>
_______________________________________________<br>
cfe-dev mailing list<br>
<a href="mailto:cfe-dev@cs.uiuc.edu">cfe-dev@cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</body>
</html>