[LLVMbugs] [Bug 14268] New: _mm_dp_ps generating 55% more inefficient instructions

Mon Nov 5 19:09:25 PST 2012

http://llvm.org/bugs/show_bug.cgi?id=14268

             Bug #: 14268
           Summary: _mm_dp_ps generating 55% more inefficient instructions
           Product: libraries
           Version: trunk
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: ramihg at gmail.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Created attachment 9499
  --> http://llvm.org/bugs/attachment.cgi?id=9499
Simple test case

OS used: Mac OS X Mountain Lion
Clang/LLVM Used: 3.2-r167157
Processor: Intel Core i7

I was recently doing some SSE dot product implementations when I noticed a
severe performance drop after using _mm_dp_ps. This didn't make sense as the
x86 instruction latency manual showed that the dpps instruction is only 12
cycles latency.

Further investigation showed that dpps with a memory operand source: (e.g. dpps
$15, (%rex, %rdx), %xmm0) is about 55% slower than issuing a movaps then a dpps
with a register source/destination!

I've created and attached a simple test case that does dpps using intrinsics,
then using hand-written assembly with register operands, then using
hand-written assembly with memory source operand. The intrinsics version issued
instructions with memory operands.

Here is how I compiled the test:
clang++ -march=native -std=c++11 -stdlib=libc++ -O3 dotps.cpp

Here are the results I got (running each version 100000000 times):
DotPsIntrin: 249.332 ms
DotPsFast: 159.916 ms
DotPsSlow: 249.076 ms

I also tried the intrinsics version with Visual Studio 2012 and the Intel
compiler. Both generated the efficient movaps/dpps version even when specifying
"Optimize for Space".

-- 
Configure bugmail: http://llvm.org/bugs/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.