<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">Hi Katya, <div><br></div><div>Can you please open a Bugzilla bug report (<a href="http://llvm.org/bugs">llvm.org/bugs</a>)?</div><div><br></div><div>Thanks,</div><div>Nadav</div><div><br><div><div>On Apr 8, 2013, at 7:20 PM, "Romanova, Katya" &lt;<a href="mailto:Katya_Romanova@playstation.sony.com">Katya_Romanova@playstation.sony.com</a>&gt; wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div lang="EN-US" link="blue" vlink="purple" style="letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div class="WordSection1" style="page: WordSection1;"><pre style="margin: 0in 0in 0.0001pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="">Hello,<o:p></o:p></span></pre><pre style="margin: 0in 0in 0.0001pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style=""> </span></pre><pre style="margin: 0in 0in 12pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="">LLVM generates two additional instructions for 128-bit->256-bit typecasts <br>(e.g. 
_mm256_castsi128_si256()) to clear out the upper 128 bits of the YMM register corresponding to the source XMM register.<o:p></o:p></span></pre><pre style="margin: 0in 0in 0.0001pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="">    vxorps xmm2,xmm2,xmm2<o:p></o:p></span></pre><pre style="margin: 0in 0in 0.0001pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="">    vinsertf128 ymm0,ymm2,xmm0,0x0<br><br>Most of the industry-standard C/C++ compilers (GCC, <span class="yshortcuts">Intel</span>’s compiler, the Visual Studio compiler) don’t<o:p></o:p></span></pre><pre style="margin: 0in 0in 0.0001pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="">generate any extra moves for 128-bit->256-bit typecast intrinsics.<o:p></o:p></span></pre><pre style="margin: 0in 0in 0.0001pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="">None of these compilers zero the upper 128 bits of the 256-bit YMM register. 
Intel’s<o:p></o:p></span></pre><pre style="margin: 0in 0in 0.0001pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="">documentation for the _mm256_castsi128_si256 intrinsic explicitly states that “the upper bits of the<o:p></o:p></span></pre><pre style="margin: 0in 0in 0.0001pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="">resulting vector are undefined” and that “this intrinsic does not introduce extra moves to the<o:p></o:p></span></pre><pre style="margin: 0in 0in 0.0001pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="">generated code”. <o:p></o:p></span></pre><pre style="margin: 0in 0in 0.0001pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style=""> </span></pre><pre style="margin: 0in 0in 0.0001pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style=""><a href="http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_avx_castsi128_si256.htm" target="_blank" style="color: purple; text-decoration: underline;"><span class="yshortcuts"><span style="color: blue;">http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_avx_castsi128_si256.htm</span></span></a><o:p></o:p></span></pre><pre style="margin: 0in 0in 0.0001pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style=""> 
</span></pre><pre style="margin: 0in 0in 12pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="">Clang implements these typecast intrinsics differently. Is this intentional? I suspect that this was done to avoid a hardware penalty caused by partial register writes. But isn’t the overall cost of 2 additional instructions (vxorps + vinsertf128) for *<b>every</b>* 128-bit->256-bit typecast intrinsic higher than the hardware penalty caused by partial register writes in the *<b>rare</b>* cases where the upper part of the YMM register corresponding to the source XMM register is not already cleared?<o:p></o:p></span></pre><pre style="margin: 0in 0in 12pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="">Thanks!<o:p></o:p></span></pre><pre style="margin: 0in 0in 12pt; font-size: 10pt; font-family: 'Courier New'; background-color: white; background-position: initial initial; background-repeat: initial initial;"><span style="">Katya.<o:p></o:p></span></pre></div>_______________________________________________<br>LLVM Developers mailing list<br><a href="mailto:LLVMdev@cs.uiuc.edu" style="color: purple; text-decoration: underline;">LLVMdev@cs.uiuc.edu</a><span class="Apple-converted-space"> </span>        <a href="http://llvm.cs.uiuc.edu/" style="color: purple; text-decoration: underline;">http://llvm.cs.uiuc.edu</a><br><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" style="color: purple; text-decoration: underline;">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a></div></blockquote></div><br></div></body></html>