[LLVMbugs] [Bug 10928] New: [AVX] build2.c performs worse on AVX than on SSE!

Wed Sep 14 17:10:32 PDT 2011

http://llvm.org/bugs/show_bug.cgi?id=10928

           Summary: [AVX] build2.c performs worse on AVX than on SSE!
           Product: libraries
           Version: trunk
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Register Allocator
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: bruno.cardoso at gmail.com
                CC: llvmbugs at cs.uiuc.edu

Created an attachment (id=7270)
 --> (http://llvm.org/bugs/attachment.cgi?id=7270)
Bitcode

Given the attached bitcode b.bc (extracted from
test-suite/SingleSource/UnitTests/Vector/build2.c), the only hot loop in the
program compiles to:

$ llc b.bc
  movss LCPI0_0(%rip), %xmm6
  addps %xmm9, %xmm6
  addps LCPI0_1(%rip), %xmm6
  addps %xmm12, %xmm6
  addps %xmm13, %xmm8
  addps %xmm14, %xmm8
  addps %xmm15, %xmm7
  addps %xmm0, %xmm7
  addps %xmm2, %xmm7
  addps %xmm1, %xmm6
  addps %xmm4, %xmm7
  addps %xmm3, %xmm8
  addps %xmm10, %xmm8
  addps %xmm5, %xmm8
  addps %xmm11, %xmm8
  decl  %ecx

In AVX mode, the same loop becomes:

$ llc -mattr=+avx b.bc
  vaddps  %xmm12, %xmm10, %xmm0
  vaddps  %xmm13, %xmm0, %xmm0
  vaddps  %xmm14, %xmm0, %xmm1
  vaddps  %xmm15, %xmm7, %xmm0
  vaddps  %xmm9, %xmm0, %xmm7
  vaddps  %xmm11, %xmm6, %xmm0
  vaddps  LCPI0_6(%rip), %xmm0, %xmm0
  vaddps  %xmm3, %xmm0, %xmm8
  vaddps  %xmm4, %xmm1, %xmm10
  vaddps  %xmm5, %xmm8, %xmm6
  vaddps  LCPI0_10(%rip), %xmm7, %xmm7
  vaddps  LCPI0_11(%rip), %xmm7, %xmm7
  vaddps  LCPI0_12(%rip), %xmm7, %xmm7
  vaddps  %xmm2, %xmm7, %xmm7
  decl  %ecx

Although AVX instructions use a 3-address form, the AVX version rematerializes
some constant pool loads inside the loop, and that makes it slower than the
SSE version. Digging into the problem (using -print-machineinstrs), I found
that LICM hoists all constant pool loads out of the loop, but the register
allocator brings some of them back (probably because it is running out of
registers?).
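
To put rough numbers on the comparison above, here is a minimal Python sketch
that counts constant pool memory operands and distinct xmm registers in the two
excerpts. It only looks at the lines quoted in this report, not at the full
llc output, and the names SSE_LOOP, AVX_LOOP and summarize are made up for
illustration.

```python
import re

# Loop bodies copied from the two listings in this report (excerpts only).
SSE_LOOP = """\
movss LCPI0_0(%rip), %xmm6
addps %xmm9, %xmm6
addps LCPI0_1(%rip), %xmm6
addps %xmm12, %xmm6
addps %xmm13, %xmm8
addps %xmm14, %xmm8
addps %xmm15, %xmm7
addps %xmm0, %xmm7
addps %xmm2, %xmm7
addps %xmm1, %xmm6
addps %xmm4, %xmm7
addps %xmm3, %xmm8
addps %xmm10, %xmm8
addps %xmm5, %xmm8
addps %xmm11, %xmm8
"""

AVX_LOOP = """\
vaddps %xmm12, %xmm10, %xmm0
vaddps %xmm13, %xmm0, %xmm0
vaddps %xmm14, %xmm0, %xmm1
vaddps %xmm15, %xmm7, %xmm0
vaddps %xmm9, %xmm0, %xmm7
vaddps %xmm11, %xmm6, %xmm0
vaddps LCPI0_6(%rip), %xmm0, %xmm0
vaddps %xmm3, %xmm0, %xmm8
vaddps %xmm4, %xmm1, %xmm10
vaddps %xmm5, %xmm8, %xmm6
vaddps LCPI0_10(%rip), %xmm7, %xmm7
vaddps LCPI0_11(%rip), %xmm7, %xmm7
vaddps LCPI0_12(%rip), %xmm7, %xmm7
vaddps %xmm2, %xmm7, %xmm7
"""


def summarize(asm):
    """Return (constant pool operand count, distinct xmm register count)."""
    # Constant pool references look like LCPI<fn>_<idx>(%rip) in AT&T syntax.
    const_pool = re.findall(r"LCPI\d+_\d+\(%rip\)", asm)
    # Collect the register numbers of every %xmmN mentioned in the excerpt.
    regs = {int(n) for n in re.findall(r"%xmm(\d+)", asm)}
    return len(const_pool), len(regs)


sse = summarize(SSE_LOOP)
avx = summarize(AVX_LOOP)
print("SSE: %d constant pool operands, %d distinct xmm registers" % sse)
print("AVX: %d constant pool operands, %d distinct xmm registers" % avx)
```

On these excerpts the script reports 2 constant pool operands for SSE and 4
for AVX, with all 16 xmm registers appearing in both. That is consistent with
the register pressure guess, but it does not prove it.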
