[LLVMbugs] [Bug 9070] New: Incorrect code generated from shuffles
bugzilla-daemon at llvm.org
Thu Jan 27 08:05:42 PST 2011
http://llvm.org/bugs/show_bug.cgi?id=9070
Summary: Incorrect code generated from shuffles
Product: libraries
Version: trunk
Platform: PC
OS/Version: Windows XP
Status: NEW
Severity: normal
Priority: P
Component: Backend: X86
AssignedTo: unassignedbugs at nondot.org
ReportedBy: zvi.rackover at intel.com
CC: llvmbugs at cs.uiuc.edu
Running llc on the following test case (also attached) produces incorrect code:
target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-f80:128:128-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32"
target triple = "i686-pc-win32"

define void @test1(<8 x i32>* %source, <2 x i32>* %dest) nounwind {
  %a149 = getelementptr inbounds <8 x i32>* %source
  %a150 = load <8 x i32>* %a149, align 32
  %a151 = shufflevector <8 x i32> %a150, <8 x i32> undef, <2 x i32> <i32 0, i32 5>
  %a152 = shufflevector <2 x i32> %a151, <2 x i32> undef, <2 x i32> <i32 1, i32 0>
  %a153 = getelementptr inbounds <2 x i32>* %dest
  store <2 x i32> %a152, <2 x i32>* %a153, align 8
  ret void
}
The test reads an <8 x i32> source vector from memory and writes a <2 x i32>
dest vector to memory.
The two shuffles do:
temp.0 = source.0
temp.1 = source.5
dest.0 = temp.1
dest.1 = temp.0
Which is equivalent to:
dest.0 = source.5
dest.1 = source.0
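The expected result can be checked with a small model of the two shuffles. This is a minimal sketch that assumes LLVM's documented shufflevector semantics: a mask index i < n selects lane i of the first operand, and n + i selects lane i of the second. The helper name `shufflevector` and the string lane labels are illustrative only. They are not an LLVM API.

```python
def shufflevector(v1, v2, mask):
    """Model of LLVM's shufflevector: a mask index m < len(v1) picks v1[m],
    otherwise it picks v2[m - len(v1)]. None stands in for an undef lane."""
    n = len(v1)
    return [v1[m] if m < n else v2[m - n] for m in mask]

# Label each lane of the <8 x i32> source by its index.
source = [f"source.{i}" for i in range(8)]

a151 = shufflevector(source, [None] * 8, [0, 5])  # temp = [source.0, source.5]
a152 = shufflevector(a151, [None] * 2, [1, 0])    # dest = [source.5, source.0]
print(a152)  # ['source.5', 'source.0']
```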
Output of llc < test-repro.ll:

        .def     _test1;
        .scl     2;
        .type    32;
        .endef
        .text
        .globl   _test1
        .align   16, 0x90
_test1:                                 # @test1
# BB#0:
        movl     4(%esp), %eax
        movaps   16(%eax), %xmm0
        movlps   (%eax), %xmm0
        pshufd   $1, %xmm0, %xmm0       # xmm0 = xmm0[1,0,0,0]
        movl     8(%esp), %eax
        pextrd   $1, %xmm0, 4(%eax)
        movd     %xmm0, (%eax)
        ret
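The comment on the pshufd line comes from decoding its immediate. pshufd uses 2 bits of imm8 per destination dword: bits [1:0] choose the source lane for destination lane 0, bits [3:2] for lane 1, and so on. The sketch below reproduces that decoding for `$1`. The helper names are hypothetical.

```python
def pshufd_lanes(imm8):
    """Decode a pshufd immediate into the source lane selected for each
    of the four 32-bit destination lanes (2 bits per lane, lane 0 first)."""
    return [(imm8 >> (2 * i)) & 0b11 for i in range(4)]

def pshufd(src, imm8):
    """Apply pshufd to a 4-lane register modeled as a Python list."""
    return [src[j] for j in pshufd_lanes(imm8)]

print(pshufd_lanes(1))  # [1, 0, 0, 0], matching "xmm0[1,0,0,0]"
```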
After the 'movaps':
XMM0 = [source.4 source.5 source.6 source.7]
After the 'movlps':
XMM0 = [source.0 source.1 source.6 source.7]
After the 'pshufd':
XMM0 = [source.1 source.0 source.0 source.0]
The 'pextrd' writes:
dest.1 = source.0
The 'movd' writes:
dest.0 = source.1   <== This is incorrect: as explained above, dest.0 should be source.5.
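The trace above can be checked mechanically. This sketch models %xmm0 as a list of four labeled 32-bit lanes and applies the emitted instructions in order. It assumes the standard SSE semantics of movaps, movlps with a memory operand, pshufd, pextrd and movd. The lane labels are illustrative.

```python
# Memory at %eax (the source pointer) modeled as eight labeled dwords.
source = [f"source.{i}" for i in range(8)]

# movaps 16(%eax), %xmm0: bytes 16..31 are dwords 4..7.
xmm0 = source[4:8]
# movlps (%eax), %xmm0: low 64 bits (lanes 0 and 1) come from dwords 0..1;
# lanes 2 and 3 are preserved.
xmm0 = source[0:2] + xmm0[2:4]
# pshufd $1, %xmm0, %xmm0: lane i takes xmm0[(1 >> 2*i) & 3], i.e. [1, 0, 0, 0].
xmm0 = [xmm0[(1 >> (2 * i)) & 3] for i in range(4)]

dest = [None, None]
dest[1] = xmm0[1]  # pextrd $1, %xmm0, 4(%eax)
dest[0] = xmm0[0]  # movd %xmm0, (%eax)

expected = ["source.5", "source.0"]
print(dest)              # ['source.1', 'source.0']
print(dest == expected)  # False: dest.0 holds source.1 instead of source.5
```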
Removing the following pattern from X86InstrSSE.td gives correct (but
inefficient) code:
def : Pat<(X86Movss VR128:$src1,
                    (bc_v4i32 (v2i64 (load addr:$src2)))),
          (MOVLPSrm VR128:$src1, addr:$src2)>;
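A sketch of why this pattern looks unsafe. It assumes the usual semantics of the X86Movss node, where lane 0 comes from the second operand and lanes 1 to 3 come from the first. It also assumes the usual semantics of movlps with a memory operand, where lanes 0 and 1 are replaced and lanes 2 and 3 are kept. A further assumption is that the matched node has src1 = source[4..7] (from the movaps) and src2 = the 128-bit load of source[0..3]. The helper names `x86_movss` and `movlps_mem` are hypothetical.

```python
def x86_movss(src1, src2):
    """X86Movss semantics: lane 0 from src2, lanes 1..3 from src1."""
    return [src2[0]] + src1[1:4]

def movlps_mem(src1, mem):
    """movlps m64 -> xmm: lanes 0..1 (low 64 bits) from memory, lanes 2..3 kept."""
    return mem[0:2] + src1[2:4]

source = [f"source.{i}" for i in range(8)]
src1 = source[4:8]  # movaps 16(%eax): source.4 .. source.7
src2 = source[0:4]  # the load at (%eax), viewed as four i32 lanes

intended = x86_movss(src1, src2)   # ['source.0', 'source.5', 'source.6', 'source.7']
selected = movlps_mem(src1, src2)  # ['source.0', 'source.1', 'source.6', 'source.7']
diff = [i for i in range(4) if intended[i] != selected[i]]
print(diff)  # [1]
```

The two results differ only in lane 1. MOVLPSrm overwrites source.5 with source.1, and pshufd $1 then moves that wrong value into dest.0. With the intended value, pshufd $1 would produce [source.5, source.0, ...] and both stores would be correct.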
More information about the llvm-bugs mailing list