[LLVMbugs] [Bug 12175] New: Optimizing add <16 x i8> %x, %x to shl is suboptimal with SSE
bugzilla-daemon at llvm.org
Sat Mar 3 14:07:39 PST 2012
http://llvm.org/bugs/show_bug.cgi?id=12175
Bug #: 12175
Summary: Optimizing add <16 x i8> %x, %x to shl is suboptimal with SSE
Product: new-bugs
Version: 3.0
Platform: PC
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P
Component: new bugs
AssignedTo: unassignedbugs at nondot.org
ReportedBy: arcata at gmail.com
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
`opt -O2 -S` transforms this IR:
--
define <16 x i8> @foo(<16 x i8> %x) nounwind {
%y = add <16 x i8> %x, %x
ret <16 x i8> %y
}
--
into this IR:
--
define <16 x i8> @foo(<16 x i8> %x) nounwind readnone {
%y = shl <16 x i8> %x, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1,
i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
ret <16 x i8> %y
}
--
This is a bad idea with SSE, because the lowering for `shl <16 x i8>` is rather
involved (17 instructions before the `ret`, three of them constant-pool loads):
--
_foo: ## @foo
## BB#0:
movdqa %xmm0, %xmm1
movdqa LCPI0_0(%rip), %xmm0
psllw $5, %xmm0
movdqa LCPI0_1(%rip), %xmm2
pand %xmm1, %xmm2
psllw $4, %xmm2
pblendvb %xmm0, %xmm2, %xmm1
movdqa LCPI0_2(%rip), %xmm2
pand %xmm1, %xmm2
psllw $2, %xmm2
paddb %xmm0, %xmm0
pblendvb %xmm0, %xmm2, %xmm1
movdqa %xmm1, %xmm2
paddb %xmm2, %xmm2
paddb %xmm0, %xmm0
pblendvb %xmm0, %xmm2, %xmm1
movdqa %xmm1, %xmm0
ret
--
The unoptimized IR, by comparison, can be lowered to a single instruction:
--
_foo: ## @foo
## BB#0:
paddb %xmm0, %xmm0
ret
--
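Since both forms compute the same value in every lane, a backend could in
principle select `paddb` for a left shift by a splat of 1 instead of the generic
byte-shift expansion. The Python sketch below is only an illustrative model of
the lane arithmetic, not LLVM code; the helper names `add_self`, `shl_one` and
`lanes_agree` are hypothetical and exist only for this example.

```python
LANES = 16   # <16 x i8> has sixteen 8-bit lanes
MASK = 0xFF  # i8 arithmetic wraps modulo 256


def add_self(x):
    """Model of `add <16 x i8> %x, %x`, i.e. what `paddb %xmm0, %xmm0` computes."""
    return [(b + b) & MASK for b in x]


def shl_one(x):
    """Model of `shl <16 x i8> %x, <i8 1, i8 1, ...>` (shift every lane left by 1)."""
    return [(b << 1) & MASK for b in x]


def lanes_agree():
    """Check every possible byte value: x + x and x << 1 must match in each lane."""
    for value in range(256):
        vec = [value] * LANES
        if add_self(vec) != shl_one(vec):
            return False
    return True


if __name__ == "__main__":
    print(lanes_agree())
```

The check is exhaustive over all 256 byte values, so it covers the wrap-around
cases (for example 0x80, where both forms produce 0).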