[LLVMbugs] [Bug 15712] New: inefficient code generation for 128-bit->256-bit typecast intrinsics

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Tue Apr 9 12:26:21 PDT 2013


http://llvm.org/bugs/show_bug.cgi?id=15712

            Bug ID: 15712
           Summary: inefficient code generation for 128-bit->256-bit
                    typecast intrinsics
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: katya_romanova at playstation.sony.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

LLVM generates two additional instructions for 128-bit->256-bit typecasts
(e.g. _mm256_castsi128_si256, _mm256_castps128_ps256, _mm256_castpd128_pd256)
to clear out the upper 128 bits of the YMM register corresponding to the
source XMM register.

    vxorps xmm2,xmm2,xmm2
    vinsertf128 ymm0,ymm2,xmm0,0x0

Most of the industry-standard C/C++ compilers (GCC, Intel’s compiler, Visual
Studio compiler) don’t generate any extra moves for 128-bit->256-bit typecast
intrinsics. None of these compilers zero-extend the upper 128 bits of the
256-bit YMM register. Intel’s documentation for the _mm256_castsi128_si256
intrinsic explicitly states that “the upper bits of the resulting vector are
undefined” and that “this intrinsic does not introduce extra moves to the
generated code”. 

http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011Update/compiler_c/intref_cls/common/intref_avx_castsi128_si256.htm

Clang implements these typecast intrinsics differently. I suspect that this was
done on purpose to avoid a hardware penalty caused by partial register writes. 

I think that the overall cost of 2 additional instructions (vxor + vinsertf128)
for *every* 128-bit->256-bit typecast intrinsic is much higher than the
hardware penalty caused by partial register writes in the *rare* cases when
the upper part of the YMM register corresponding to a source XMM register is
not already cleared.

Katya.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
