[llvm-bugs] [Bug 31151] New: failure to recognize byte unpack shuffles without ssse3 available
via llvm-bugs
llvm-bugs at lists.llvm.org
Wed Nov 23 21:13:26 PST 2016
https://llvm.org/bugs/show_bug.cgi?id=31151
Bug ID: 31151
Summary: failure to recognize byte unpack shuffles without ssse3 available
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: sroland at vmware.com
CC: llvm-bugs at lists.llvm.org
Classification: Unclassified
Sometimes llvm fails to match even fairly trivial shuffles if pshufb isn't
available, even when the resulting code wouldn't use pshufb anyway. (At least
I think that's the reason: it works with ssse3 but not without.)
This code compiles to the expected instructions with ssse3 and higher but not
without, even though both shuffles are trivially doable with just sse2. The
result is still correct, but the pessimization is obvious and quite serious.
define <4 x i32> @unpackbwpshufd(<16 x i8> %val1, <16 x i8> %val2) {
entry:
  %0 = shufflevector <16 x i8> %val1, <16 x i8> %val2, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>
  %1 = bitcast <16 x i8> %0 to <4 x i32>
  %2 = shufflevector <4 x i32> %1, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 1, i32 3>
  ret <4 x i32> %2
}
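As a sanity check on the claim that each shuffle maps onto a single sse2 instruction, here is a small Python model of the two masks. This is only a sketch: the helper names (shufflevector, punpcklbw, pshufd) are stand-ins written for this illustration, not LLVM or intrinsic APIs, and it assumes the little-endian lane order used on x86 for the bitcast.

```python
# Illustrative byte-level model; these helpers are hypothetical stand-ins,
# not LLVM APIs. Vectors are plain Python lists of 16 byte values.

def shufflevector(a, b, mask):
    """Pick elements from the concatenation a + b, like the IR instruction."""
    src = list(a) + list(b)
    return [src[i] for i in mask]

def punpcklbw(dst, src):
    """Model of sse2 punpcklbw: interleave the low 8 bytes of dst and src."""
    out = []
    for i in range(8):
        out += [dst[i], src[i]]
    return out

def pshufd(src, imm):
    """Model of sse2 pshufd: permute four 32-bit lanes by 2-bit fields of imm."""
    lanes = [src[4 * i:4 * i + 4] for i in range(4)]
    out = []
    for i in range(4):
        out += lanes[(imm >> (2 * i)) & 3]
    return out

val1 = list(range(100, 116))   # arbitrary distinct byte values
val2 = list(range(200, 216))
undef = [None] * 16            # second operand of the second shuffle is unused

# First shuffle from the IR: mask <0, 16, 1, 17, ..., 7, 23>.
mask0 = [x for i in range(8) for x in (i, 16 + i)]
step0 = shufflevector(val1, val2, mask0)

# Second shuffle works on <4 x i32> with mask <0, 2, 1, 3>;
# expand each 32-bit lane index into its four byte indices.
mask1 = [4 * lane + k for lane in (0, 2, 1, 3) for k in range(4)]
step1 = shufflevector(step0, undef, mask1)

# The same result expressed as the two sse2 instructions.
two_insn = pshufd(punpcklbw(val1, val2), 216)
```

In this model `step0` matches `punpcklbw(val1, val2)` and `step1` matches `two_insn`, since the immediate 216 decodes to the lane order 0, 2, 1, 3.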
With -mattr=ssse3 the result is:
punpcklbw %xmm1, %xmm0 # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]
pshufd $216, %xmm0, %xmm0 # xmm0 = xmm0[0,2,1,3]
retq
But without it, llvm emits this complicated mess, even though the above code
would obviously work just fine with sse2:
punpcklbw %xmm1, %xmm1 # xmm1 = xmm1[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
pshufd $216, %xmm1, %xmm2 # xmm2 = xmm1[0,2,1,3]
movdqa .LCPI0_0(%rip), %xmm1 # xmm1 = [255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0]
punpcklbw %xmm0, %xmm0 # xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
pshufd $216, %xmm0, %xmm0 # xmm0 = xmm0[0,2,1,3]
pand %xmm1, %xmm0
pandn %xmm2, %xmm1
por %xmm0, %xmm1
movdqa %xmm1, %xmm0
retq
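To make the comparison concrete, the sketch below models both sequences byte by byte and checks that they agree. It is an illustration only: the helper names are hypothetical, registers are plain Python lists of 16 bytes, and the operand order follows the AT&T syntax of the listings (source first, destination second).

```python
# Byte-level model of the two code sequences; all helper names are hypothetical.

def punpcklbw(dst, src):
    """Interleave the low 8 bytes of dst and src."""
    out = []
    for i in range(8):
        out += [dst[i], src[i]]
    return out

def pshufd(src, imm):
    """Permute four 32-bit lanes according to the 2-bit fields of imm."""
    lanes = [src[4 * i:4 * i + 4] for i in range(4)]
    out = []
    for i in range(4):
        out += lanes[(imm >> (2 * i)) & 3]
    return out

def pand(dst, src):
    return [d & s for d, s in zip(dst, src)]

def pandn(dst, src):
    # pandn computes (NOT dst) AND src.
    return [(~d & 0xFF) & s for d, s in zip(dst, src)]

def por(dst, src):
    return [d | s for d, s in zip(dst, src)]

def short_sequence(xmm0, xmm1):
    """The two-instruction form emitted with ssse3 enabled."""
    xmm0 = punpcklbw(xmm0, xmm1)
    return pshufd(xmm0, 216)

def long_sequence(xmm0, xmm1):
    """The nine-instruction form emitted without ssse3."""
    xmm1 = punpcklbw(xmm1, xmm1)   # duplicate each low byte of val2
    xmm2 = pshufd(xmm1, 216)
    xmm1 = [255, 0] * 8            # constant-pool mask: keep even bytes
    xmm0 = punpcklbw(xmm0, xmm0)   # duplicate each low byte of val1
    xmm0 = pshufd(xmm0, 216)
    xmm0 = pand(xmm0, xmm1)        # val1 bytes in even positions
    xmm1 = pandn(xmm1, xmm2)       # val2 bytes in odd positions
    xmm1 = por(xmm1, xmm0)         # blend the two halves
    return xmm1                    # movdqa %xmm1, %xmm0

a = list(range(1, 17))
b = list(range(101, 117))
same = long_sequence(a, b) == short_sequence(a, b)
```

Under this model both sequences produce the same 16 bytes, so the longer one costs two extra shuffles, a constant load, and a three-instruction blend for no change in the result.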
Note that without the second shuffle (so just return %1 above) llvm emits the
punpcklbw just fine, without resorting to stitched together shuffles and
masks...
It actually worked at some point (it works with llvm 3.3 and fails with 3.7 and
newer), though I suppose that was back when llvm didn't have much of a shuffle
optimizer.