<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:SimSun;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.25in 1.0in 1.25in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:627467100;
mso-list-type:hybrid;
mso-list-template-ids:-976984800 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l0:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l0:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">Greetings, everyone. Please allow me to illustrate my problem:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">First, please consider the following sample code:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">#include <stdio.h><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">#include <iostream><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">#include <vector><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">#include <immintrin.h><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">using namespace std;
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">int main(int argc, char const *argv[])<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">{<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> __m256i x = _mm256_set1_epi32(0x0F), y = _mm256_set1_epi32(0x3C);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> __m256i res = _mm256_and_si256(x, y);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> return 0;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">}<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">It compiles without problems using clang -mavx2 source.cc<o:p></o:p></span></p>
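<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">For comparison, here is a minimal sketch of a way to get the same AND without passing -mavx2 for the whole file. GCC 4.9 and newer clang releases accept __attribute__((target("avx2"))) on a single function, but I believe clang 3.3 predates this attribute. The helper names and_u64x4, and_u64x4_avx2 and and_u64x4_scalar are hypothetical. The runtime check uses the GCC/clang builtin __builtin_cpu_supports, so the AVX2 path runs only on CPUs that report AVX2.<o:p></o:p></span></p>

```cpp
#include <cassert>
#include <cstdint>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#endif

// Portable fallback: plain 64-bit ANDs, valid on any CPU.
static void and_u64x4_scalar(uint64_t out[4], const uint64_t a[4], const uint64_t b[4]) {
    for (int i = 0; i < 4; ++i) out[i] = a[i] & b[i];
}

#if defined(__x86_64__) || defined(__i386__)
// Only this function is compiled with AVX2 enabled; the rest of the file
// keeps the default target, so no global -mavx2 is required.
__attribute__((target("avx2")))
static void and_u64x4_avx2(uint64_t out[4], const uint64_t a[4], const uint64_t b[4]) {
    __m256i x = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a));
    __m256i y = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out), _mm256_and_si256(x, y));
}
#endif

// Hypothetical dispatcher: uses the AVX2 version only if the running CPU has AVX2.
static void and_u64x4(uint64_t out[4], const uint64_t a[4], const uint64_t b[4]) {
    assert(out != nullptr && a != nullptr && b != nullptr);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        and_u64x4_avx2(out, a, b);
        return;
    }
#endif
    and_u64x4_scalar(out, a, b);
}
```

<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">The rest of the translation unit stays at the baseline instruction set. With this arrangement, the binary should still start on machines without AVX2.<o:p></o:p></span></p>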
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">Now suppose we want to rewrite this _mm256_and_si256 call using inline asm, as follows:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">#include <stdio.h><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">#include <iostream><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">#include <vector><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">using namespace std;
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">typedef float __m256 __attribute__ ((__vector_size__ (32)));<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">typedef double __m256d __attribute__((__vector_size__(32)));<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">typedef long long __m256i __attribute__((__vector_size__(32)));<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">typedef long long __v4di __attribute__ ((__vector_size__ (32)));<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">typedef int __v8si __attribute__ ((__vector_size__ (32)));<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">typedef short __v16hi __attribute__ ((__vector_size__ (32)));<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">typedef char __v32qi __attribute__ ((__vector_size__ (32)));<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">__attribute__((always_inline)) inline<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">__m256i _my_mm256_and_si256(__m256i s1, __m256i s2)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">{<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> __m256i result;<o:p></o:p></span></p>
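<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> // AT&T operand order: vpand src2, src1, dst (result = s1 & s2)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> __asm__ ("vpand %2, %1, %0" : "=x"(result) : "x"(s1), "xm"(s2) );<o:p></o:p></span></p>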
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> return result;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">}<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">int main(int argc, char const *argv[])<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">{<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> __m256i x = {1, 2, 3, 4}, y = {4, 3, 2, 1};<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> __m256i res = _my_mm256_and_si256(x, y );<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> return 0;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">}<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">This version also compiles fine with clang -mavx2 source.cc.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">However, if we remove the -mavx2 flag, clang emits the following error:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">fatal error: error in backend: Do not know how to split the result of this operator!<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">Someone gave me the following explanation:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">Without the -mavx2 flag, clang/LLVM is unable to determine the right machine target for binding the input memory parameter to the input register required by the vpand instruction here, since vpand requires ymm[0..7] as its inputs and output.<o:p></o:p></span></p>
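<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">If that explanation is right, the problem is register allocation for the 256-bit operands, not the vpand mnemonic itself. Here is a hypothetical sketch to probe this. It passes every 256-bit value through memory ("m" constraints) and uses ymm0 only inside the asm text, so the compiler never has to place a 256-bit value in a YMM register. The Bits256 struct and the helper names are my own. I assume the assembler accepts AVX2 mnemonics without -mavx2, as the pcmpestri example below suggests it does for SSE4.2. The CPU must still support AVX2 at run time.<o:p></o:p></span></p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-byte plain-memory stand-in for __m256i (not from the post).
struct Bits256 { uint64_t q[4]; };

#if defined(__x86_64__) || defined(__i386__)
// True if the running CPU reports AVX2 (GCC/clang builtin).
static bool cpu_has_avx2() { return __builtin_cpu_supports("avx2") != 0; }

// AND of two 256-bit values. Every 256-bit operand is an "m" (memory) operand,
// so the compiler never has to allocate a YMM register; ymm0 is used only
// inside the asm text and is declared clobbered via its xmm0 alias.
static Bits256 and256_via_memory(const Bits256& a, const Bits256& b) {
    assert(cpu_has_avx2() && "calling this on a non-AVX2 CPU would fault");
    Bits256 r;
    __asm__("vmovdqu %1, %%ymm0\n\t"          // ymm0 = a (unaligned load)
            "vpand   %2, %%ymm0, %%ymm0\n\t"  // ymm0 = ymm0 & b
            "vmovdqu %%ymm0, %0"              // r = ymm0
            : "=m"(r)
            : "m"(a), "m"(b)
            : "xmm0");
    // No vzeroupper: it would also wipe upper YMM halves the compiler may be
    // using if this file were built with -mavx; the price is a possible
    // SSE/AVX transition penalty afterwards.
    return r;
}
#else
// Non-x86 targets: report no AVX2 and compute the same result in plain C++.
static bool cpu_has_avx2() { return false; }
static Bits256 and256_via_memory(const Bits256& a, const Bits256& b) {
    Bits256 r;
    for (int i = 0; i < 4; ++i) r.q[i] = a.q[i] & b.q[i];
    return r;
}
#endif
```

<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">Callers are expected to check cpu_has_avx2() before calling and256_via_memory, because the binary is no longer built with any guarantee of AVX2.<o:p></o:p></span></p>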
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">This makes some sense, and I guess gcc's error output (20 : error: impossible constraint in 'asm') is complaining about the same thing. However, it does not explain why SSE4.2 inline asm can be compiled without the -msse4.2 flag. Let me show one more example:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">#include <stdio.h><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">#include <iostream><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">#include <vector><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">#include <stdint.h><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">#include <emmintrin.h><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">using namespace std;
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">static inline __attribute__ ((__always_inline__))
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">int new_cmpestri(<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> __m128i str1, int len1, __m128i str2, int len2, const int mode) {<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> int result;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> __asm__("pcmpestri %5, %2, %1"<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> : "=c"(result) : "x"(str1), "xm"(str2), "a"(len1), "d"(len2), "i"(mode) : "cc");<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> return result;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">}<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">int main(int argc, char const *argv[])<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">{<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> __m128i str1 = _mm_setzero_si128();<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> int len1 = 0;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> __m128i str2 = _mm_setzero_si128();<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> int len2 = 0;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> const int mode = 0;
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> uint32_t result = new_cmpestri(str1, len1, str2, len2, mode);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"> return 0;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">}<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">The CPUID flag for pcmpestri is SSE4.2, yet this code compiles fine without the -msse4.2 flag.<o:p></o:p></span></p>
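<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">This may be the relevant difference from the AVX2 case, sketched under the assumption of an x86-64 target. SSE2 is part of the x86-64 baseline, so XMM registers are always available. A 128-bit __m128i operand can therefore be bound with the "x" constraint without any -msse flag. The hypothetical helper and_u64x2 below does a 128-bit AND with pand this way and falls back to plain C++ on other targets.<o:p></o:p></span></p>

```cpp
#include <cassert>
#include <cstdint>
#if defined(__x86_64__)
#include <emmintrin.h>  // SSE2: __m128i, _mm_loadu_si128, _mm_storeu_si128
#endif

// Hypothetical helper (not from the post): out = a & b on 128 bits.
static void and_u64x2(uint64_t out[2], const uint64_t a[2], const uint64_t b[2]) {
    assert(out != nullptr && a != nullptr && b != nullptr);
#if defined(__x86_64__)
    __m128i x = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    __m128i y = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    // "x" requests an XMM register. SSE2 is part of the x86-64 baseline, so the
    // compiler can always bind a 128-bit operand to one, with no -msse* flag.
    __asm__("pand %1, %0"   // AT&T order: pand src, dst, i.e. x = x & y
            : "+x"(x)
            : "x"(y));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), x);
#else
    for (int i = 0; i < 2; ++i) out[i] = a[i] & b[i];  // portable fallback
#endif
}
```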
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">I have conducted experiments with both gcc 4.9.2 and clang 3.3.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">So in brief, I have two questions:<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-family:"Times New Roman",serif"><span style="mso-list:Ignore">1.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="font-family:"Times New Roman",serif">Is it possible to compile the AVX2 inline asm without the -mavx2 flag using clang?<o:p></o:p></span></p>
<p class="MsoListParagraph" style="text-indent:-.25in;mso-list:l0 level1 lfo1"><![if !supportLists]><span style="font-family:"Times New Roman",serif"><span style="mso-list:Ignore">2.<span style="font:7.0pt "Times New Roman"">
</span></span></span><![endif]><span style="font-family:"Times New Roman",serif">If the answer to question 1 is no, why can we do this for SSE4.2 inline asm without the -msse4.2 flag?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif">Thank you very much for taking the time to read this!<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Times New Roman",serif"><o:p></o:p></span></p>
</div>
</body>
</html>