[LLVMbugs] [Bug 12412] New: shufflevector should use pack instructions in some cases

Thu Mar 29 14:07:50 PDT 2012

http://llvm.org/bugs/show_bug.cgi?id=12412

             Bug #: 12412
           Summary: shufflevector should use pack instructions in some
                    cases
           Product: libraries
           Version: trunk
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
        AssignedTo: unassignedbugs at nondot.org
        ReportedBy: sroland at vmware.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

As far as I can tell, shufflevector lowering doesn't recognize patterns which
would benefit from using pack instructions.
This code

define <16 x i8> @shuf(<16 x i8> %inval1, <16 x i8> %inval2) {
entry:
  %0 = shufflevector <16 x i8> %inval1, <16 x i8> %inval2,
       <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14,
                   i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30>
  ret <16 x i8> %0
}

generates

    pshufb    .LCPI0_0(%rip), %xmm1
    pshufb    .LCPI0_1(%rip), %xmm0
    por    %xmm1, %xmm0
    ret

which is OK. However, when SSSE3 is not available, it generates this instead:

    movdqa    %xmm0, %xmm2
    pextrw    $3, %xmm2, %ecx
    shll    $8, %ecx
    pextrw    $2, %xmm2, %eax
    movzbl    %al, %eax
    orl    %ecx, %eax
    pextrw    $1, %xmm2, %ecx
    shll    $8, %ecx
    movd    %xmm2, %edx
    movzbl    %dl, %edx
    orl    %ecx, %edx
    pinsrw    $0, %edx, %xmm0
    pinsrw    $1, %eax, %xmm0
    pextrw    $5, %xmm2, %eax
    shll    $8, %eax
    pextrw    $4, %xmm2, %ecx
    movzbl    %cl, %ecx
    orl    %eax, %ecx
    pextrw    $7, %xmm2, %eax
    pinsrw    $2, %ecx, %xmm0
    shll    $8, %eax
    pextrw    $6, %xmm2, %ecx
    movzbl    %cl, %edx
    orl    %eax, %edx
    pextrw    $8, %xmm1, %eax
    pextrw    $9, %xmm1, %ecx
    pinsrw    $3, %edx, %xmm0
    shll    $8, %ecx
    movzbl    %al, %eax
    pextrw    $11, %xmm1, %esi
    shll    $8, %esi
    pextrw    $10, %xmm1, %edx
    movzbl    %dl, %edx
    orl    %esi, %edx
    orl    %ecx, %eax
    pextrw    $15, %xmm1, %ecx
    shll    $8, %ecx
    pextrw    $14, %xmm1, %esi
    pinsrw    $4, %eax, %xmm0
    movzbl    %sil, %eax
    pinsrw    $5, %edx, %xmm0
    orl    %ecx, %eax
    pextrw    $13, %xmm1, %ecx
    shll    $8, %ecx
    pextrw    $12, %xmm1, %edx
    movzbl    %dl, %edx
    orl    %ecx, %edx
    pinsrw    $6, %edx, %xmm0
    pinsrw    $7, %eax, %xmm0
    ret

which is definitely terrible.
This shuffle maps directly to packuswb, though the high byte of each 16-bit
element has to be masked out first so it really is a shuffle and not a
saturating pack.
This:
    pand    .LCPI0_0(%rip), %xmm1   # constant is 0x00ff00ff...
    pand    .LCPI0_0(%rip), %xmm0
    packuswb %xmm1, %xmm0
    ret

should do the trick, and might even be preferable to the pshufb solution.
It is obviously leaps and bounds better when pshufb isn't available. The
synthesized byte shuffle looks terrible anyway; even without using pack, I
would think there should be much better solutions using some combination of
shifts, unpacks, and shuffles that never leave the SIMD registers.
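
For reference, the mask-then-pack idea can be sketched in C with SSE2
intrinsics. This is only an illustration, assuming an x86 target with SSE2;
the function names even_bytes_ref, even_bytes_sse2 and even_bytes_match are
made up for this example and are not part of LLVM. The scalar version spells
out what the shuffle mask <0, 2, 4, ..., 30> means, and the SSE2 version uses
the pand + packuswb sequence.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (x86 only) */
#include <stdint.h>

/* Scalar reference for the shuffle mask <0, 2, 4, ..., 30>:
   the even-indexed bytes of a, followed by the even-indexed bytes of b. */
void even_bytes_ref(const uint8_t a[16], const uint8_t b[16], uint8_t out[16])
{
    for (int i = 0; i < 8; ++i) {
        out[i]     = a[2 * i];
        out[i + 8] = b[2 * i];
    }
}

/* Same shuffle using only SSE2. Clearing the high byte of every 16-bit
   element keeps each value in 0..255, so the unsigned saturation done by
   packuswb can never change it. */
__m128i even_bytes_sse2(__m128i a, __m128i b)
{
    const __m128i low_byte = _mm_set1_epi16(0x00ff);
    a = _mm_and_si128(a, low_byte);   /* pand */
    b = _mm_and_si128(b, low_byte);   /* pand */
    return _mm_packus_epi16(a, b);    /* packuswb: a -> bytes 0..7, b -> bytes 8..15 */
}

/* Hypothetical helper: returns 1 if both versions agree on the given inputs. */
int even_bytes_match(const uint8_t a[16], const uint8_t b[16])
{
    uint8_t want[16], got[16];
    even_bytes_ref(a, b, want);
    __m128i r = even_bytes_sse2(_mm_loadu_si128((const __m128i *)a),
                                _mm_loadu_si128((const __m128i *)b));
    _mm_storeu_si128((__m128i *)got, r);
    for (int i = 0; i < 16; ++i) {
        if (want[i] != got[i])
            return 0;
    }
    return 1;
}
```

Without the two masking steps, any 16-bit element whose high byte is nonzero
would be clamped to 0 or 255 by the pack, which is why the pand is needed.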
