[LLVMbugs] [Bug 12412] New: shufflevector should use pack instructions in some cases
bugzilla-daemon at llvm.org
Thu Mar 29 14:07:50 PDT 2012
http://llvm.org/bugs/show_bug.cgi?id=12412
Bug #: 12412
Summary: shufflevector should use pack instructions in some cases
Product: libraries
Version: trunk
Platform: PC
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
AssignedTo: unassignedbugs at nondot.org
ReportedBy: sroland at vmware.com
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
As far as I can tell, shufflevector lowering doesn't recognize patterns that
would benefit from pack instructions.
This code
define <16 x i8> @shuf(<16 x i8> %inval1, <16 x i8> %inval2) {
entry:
  %0 = shufflevector <16 x i8> %inval1, <16 x i8> %inval2, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 16, i32 18, i32 20, i32 22, i32 24, i32 26, i32 28, i32 30>
  ret <16 x i8> %0
}
generates
pshufb .LCPI0_0(%rip), %xmm1
pshufb .LCPI0_1(%rip), %xmm0
por %xmm1, %xmm0
ret
which is ok. However, when SSSE3 is not available, it generates this instead:
movdqa %xmm0, %xmm2
pextrw $3, %xmm2, %ecx
shll $8, %ecx
pextrw $2, %xmm2, %eax
movzbl %al, %eax
orl %ecx, %eax
pextrw $1, %xmm2, %ecx
shll $8, %ecx
movd %xmm2, %edx
movzbl %dl, %edx
orl %ecx, %edx
pinsrw $0, %edx, %xmm0
pinsrw $1, %eax, %xmm0
pextrw $5, %xmm2, %eax
shll $8, %eax
pextrw $4, %xmm2, %ecx
movzbl %cl, %ecx
orl %eax, %ecx
pextrw $7, %xmm2, %eax
pinsrw $2, %ecx, %xmm0
shll $8, %eax
pextrw $6, %xmm2, %ecx
movzbl %cl, %edx
orl %eax, %edx
pextrw $8, %xmm1, %eax
pextrw $9, %xmm1, %ecx
pinsrw $3, %edx, %xmm0
shll $8, %ecx
movzbl %al, %eax
pextrw $11, %xmm1, %esi
shll $8, %esi
pextrw $10, %xmm1, %edx
movzbl %dl, %edx
orl %esi, %edx
orl %ecx, %eax
pextrw $15, %xmm1, %ecx
shll $8, %ecx
pextrw $14, %xmm1, %esi
pinsrw $4, %eax, %xmm0
movzbl %sil, %eax
pinsrw $5, %edx, %xmm0
orl %ecx, %eax
pextrw $13, %xmm1, %ecx
shll $8, %ecx
pextrw $12, %xmm1, %edx
movzbl %dl, %edx
orl %ecx, %edx
pinsrw $6, %edx, %xmm0
pinsrw $7, %eax, %xmm0
ret
which is definitely terrible.
This shuffle maps directly to packuswb, though it would be necessary to mask
out half the values so it really is a shuffle and not a saturated pack.
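To illustrate why the mask is needed, here is a minimal scalar sketch in C of what packuswb does to a single 16-bit lane. The helper names (`packus_lane`, `masked_packus_lane`) are made up for this illustration and are not part of LLVM or any library.

```c
#include <stdint.h>

/* Scalar model of one packuswb lane: the signed 16-bit input
 * is saturated to the unsigned range 0..255. */
static uint8_t packus_lane(int16_t w) {
    if (w < 0) return 0;
    if (w > 255) return 255;
    return (uint8_t)w;
}

/* Same lane after clearing the high byte first, which is what
 * pand with 0x00ff does. The value is then always in 0..255, so
 * saturation never triggers and the low byte passes through. */
static uint8_t masked_packus_lane(uint16_t w) {
    return packus_lane((int16_t)(w & 0x00ff));
}
```

Without the mask, a word such as 0x1234 saturates to 0xff; with the mask, its low byte 0x34 survives, which is the behaviour the shuffle requires.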
This:
pand .LCPI0_0(%rip), %xmm1 # constant is 0x00ff00ff...
pand .LCPI0_0(%rip), %xmm0
packuswb %xmm1, %xmm0
ret
should do the trick, and might even be preferable to the pshufb solution.
It is obviously leaps and bounds better when pshufb isn't available. The
synthesized byte shuffle looks terrible anyway: even without pack, I would
think there are much better solutions using some combination of shifts,
unpacks, and shuffles that never move data out of the SIMD registers.
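The equivalence claimed above can be checked with a short C sketch using SSE2 intrinsics. This is only an illustration under stated assumptions: the function names `shuf_even_ref` and `shuf_even_pack` are hypothetical, and the non-SSE2 branch is a scalar stand-in so the file still builds on other targets.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Reference: shuffle mask <0, 2, 4, ..., 30> applied to the
 * 32-byte concatenation of a and b. */
static void shuf_even_ref(const uint8_t a[16], const uint8_t b[16],
                          uint8_t out[16]) {
    uint8_t cat[32];
    memcpy(cat, a, 16);
    memcpy(cat + 16, b, 16);
    for (int i = 0; i < 16; i++)
        out[i] = cat[2 * i];
}

/* Same result via pand + packuswb. */
static void shuf_even_pack(const uint8_t a[16], const uint8_t b[16],
                           uint8_t out[16]) {
#if defined(__SSE2__)
    const __m128i mask = _mm_set1_epi16(0x00ff);
    __m128i va = _mm_and_si128(_mm_loadu_si128((const __m128i *)a), mask);
    __m128i vb = _mm_and_si128(_mm_loadu_si128((const __m128i *)b), mask);
    /* Low 8 result bytes come from va, high 8 from vb. */
    _mm_storeu_si128((__m128i *)out, _mm_packus_epi16(va, vb));
#else
    /* Scalar stand-in: after masking, each word is in 0..255,
     * so the saturating pack reduces to taking the low byte. */
    for (int i = 0; i < 8; i++) {
        out[i]     = (uint8_t)((a[2 * i] | (a[2 * i + 1] << 8)) & 0x00ff);
        out[i + 8] = (uint8_t)((b[2 * i] | (b[2 * i + 1] << 8)) & 0x00ff);
    }
#endif
}
```

Comparing the two outputs with memcmp over a few input patterns should show them to be identical, since the masked words never exceed 255 and the saturation in the pack has no effect.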