[LLVMbugs] [Bug 5626] New: Poor codegen for operations on vectors of non-power-of-two lengths

bugzilla-daemon at cs.uiuc.edu
Fri Nov 27 06:25:14 PST 2009


http://llvm.org/bugs/show_bug.cgi?id=5626

           Summary: Poor codegen for operations on vectors of non-power-of-two lengths
           Product: new-bugs
           Version: 2.6
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Keywords: code-quality
          Severity: normal
          Priority: P2
         Component: new bugs
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: matti.niemenmaa+llvmbugs at iki.fi
                CC: llvmbugs at cs.uiuc.edu


Created an attachment (id=3876)
 --> (http://llvm.org/bugs/attachment.cgi?id=3876)
Additions on i32 vectors of lengths 3, 4, 7, 8

Attached is a simple, optimized LLVM test case demonstrating addition on pairs
of vectors of four different types: <3 x i32>, <4 x i32>, <7 x i32> and
<8 x i32>. I've been compiling it with

        llvm-as -f add-vectors.ll && llc -O3 -march=x86-64 -mcpu=core2 -mattr=+sse41 add-vectors.bc -f

and examining the resulting add-vectors.s.

The <4 x i32> and <8 x i32> cases result in good-looking assembly code: one
paddd and movaps per four elements in the vector. There's no room for
improvement here.

Optimally, the <3 x i32> and <7 x i32> cases should compile to the same code as
the <4 x i32> and <8 x i32> cases, respectively. After all, it's the same
operation, just ignoring the unused upper i32 lane of the last XMM register.
Unfortunately, they are compiled into many extra moves and element
insertions/extractions. For example, with LLVM 2.6 the <3 x i32> case ends up
looking like this:

        pinsrd  $0, %esi, %xmm0
        pinsrd  $1, %edx, %xmm0
        pinsrd  $2, %ecx, %xmm0
        pinsrd  $0, %r8d, %xmm1
        pinsrd  $1, %r9d, %xmm1
        pinsrd  $2, 8(%rsp), %xmm1
        paddd   %xmm0, %xmm1
        pextrd  $2, %xmm1, 8(%rdi)
        movq    %xmm1, (%rdi)
        ret
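Read instruction by instruction, the listing above builds each operand with
three pinsrd instructions, performs a single paddd, and writes the result back
with a movq (lanes 0 and 1) plus a pextrd (lane 2). The C sketch below models
that data flow on a hypothetical four-lane register type. The mapping of the
parameters to %esi, %edx, %ecx, %r8d, %r9d, 8(%rsp) and %rdi is read off the
listing; the names xmm_u32x4 and add3_like_listing are illustrative only and
not part of LLVM.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit XMM register: four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } xmm_u32x4;

/* Models the LLVM 2.6 listing for <3 x i32> addition, lane by lane.
   Assumption: a0..a2 stand for %esi, %edx, %ecx; b0..b2 for %r8d, %r9d,
   8(%rsp); out for the pointer in %rdi. uint32_t is used so that the
   wrap-around on overflow is well defined, as it is for paddd. */
void add3_like_listing(uint32_t out[3],
                       uint32_t a0, uint32_t a1, uint32_t a2,
                       uint32_t b0, uint32_t b1, uint32_t b2)
{
    /* Lane 3 is zeroed only so the model never reads uninitialized memory;
       in the real code it holds whatever was left in the register. */
    xmm_u32x4 x0 = {{0}}, x1 = {{0}};

    /* pinsrd $0..$2: build each operand one element at a time. */
    x0.lane[0] = a0; x0.lane[1] = a1; x0.lane[2] = a2;
    x1.lane[0] = b0; x1.lane[1] = b1; x1.lane[2] = b2;

    /* paddd %xmm0, %xmm1: one add across all four lanes; lane 3 is ignored. */
    for (int i = 0; i < 4; i++)
        x1.lane[i] = x0.lane[i] + x1.lane[i];

    /* movq %xmm1, (%rdi): store the low 64 bits (lanes 0 and 1). */
    memcpy(&out[0], &x1.lane[0], 2 * sizeof(uint32_t));

    /* pextrd $2, %xmm1, 8(%rdi): store lane 2 eight bytes past out. */
    out[2] = x1.lane[2];
}
```

Only the paddd does arithmetic; the other eight instructions move scalars into
and out of lanes, which is the overhead this report is about.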

The results get worse as the vector lengths grow: only the power-of-two vectors
come out nicely.

Addition is used in the attached example, but the results are similar
regardless of operation: sub, mul, xor, whatever. Likewise, changing the types
from i32 to i64 or floating-point types doesn't make a difference.
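For every element type mentioned, the code one would hope for is one 128-bit
operation per register-sized chunk, with the tail of the vector padded out and
the padding lanes treated as don't-care. A minimal sketch of that count,
assuming 16-byte SSE registers and an element size that divides 16; the
function names are hypothetical, not LLVM APIs:

```c
#include <stddef.h>

/* Width of one SSE XMM register in bytes. */
#define XMM_BYTES 16u

/* Number of 128-bit registers (and so of paddd-style instructions) needed
   for an n-element vector once it is widened to whole registers.
   Assumption: elem_bytes is 4 (i32/float) or 8 (i64/double). */
size_t xmm_regs_needed(size_t n, size_t elem_bytes)
{
    size_t lanes = XMM_BYTES / elem_bytes;  /* 4 for 32-bit, 2 for 64-bit */
    return (n + lanes - 1) / lanes;         /* round up; extra lanes are padding */
}

/* Vector length after widening to a whole number of registers. */
size_t widened_length(size_t n, size_t elem_bytes)
{
    return xmm_regs_needed(n, elem_bytes) * (XMM_BYTES / elem_bytes);
}
```

For i32 this gives one register for lengths 3 and 4 and two for lengths 7 and
8, so each non-power-of-two case needs no more arithmetic instructions than
its power-of-two neighbour.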

