[LLVMdev] How does SSEDomainFix work?
NAKAMURA Takumi
geek4civic at gmail.com
Mon May 10 21:07:44 PDT 2010
Hello. This is my first post.
I have tried the SSE execution domain fixup pass,
but I do not see any improvement.
I expected the example below to use MOVDQA, PAND, etc.
(On Nehalem, ANDPS is much slower than PAND.)
Please tell me if I am doing something wrong.
Thank you.
Takumi
Host: i386-mingw32
Build: trunk at 103373
foo.ll:
define <4 x i32> @foo(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z) nounwind readnone {
entry:
  %0 = and <4 x i32> %x, %z
  %not = xor <4 x i32> %z, <i32 -1, i32 -1, i32 -1, i32 -1>
  %1 = and <4 x i32> %not, %y
  %2 = xor <4 x i32> %0, %1
  ret <4 x i32> %2
}

define <2 x i64> @bar(<2 x i64> %x, <2 x i64> %y, <2 x i64> %z) nounwind readnone {
entry:
  %0 = and <2 x i64> %x, %z
  %not = xor <2 x i64> %z, <i64 -1, i64 -1>
  %1 = and <2 x i64> %not, %y
  %2 = xor <2 x i64> %0, %1
  ret <2 x i64> %2
}
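For reference, both functions compute a bitwise select: where a bit of %z is 1 the result takes the bit from %x, and where it is 0 it takes the bit from %y. Every operation is purely bitwise, so the lane width (i32 in @foo, i64 in @bar) does not matter. The sketch below models @foo one 32-bit lane at a time; the function names and test values are illustrative only.

```python
MASK32 = 0xFFFFFFFF  # each <4 x i32> lane is 32 bits wide


def foo_lanes(x, y, z):
    """Model @foo lane by lane, one IR instruction per line."""
    out = []
    for xi, yi, zi in zip(x, y, z):
        t0 = xi & zi          # %0   = and %x, %z
        not_z = zi ^ MASK32   # %not = xor %z, <-1, -1, -1, -1>
        t1 = not_z & yi       # %1   = and %not, %y
        out.append(t0 ^ t1)   # %2   = xor %0, %1
    return out


def bitwise_select(x, y, z):
    """Same result as an explicit select: z bit 1 picks x, z bit 0 picks y."""
    # (x & z) and (y & ~z) never share a set bit, so XOR and OR agree here.
    return [(xi & zi) | (yi & ~zi & MASK32) for xi, yi, zi in zip(x, y, z)]


x = [0x12345678, 0xFFFFFFFF, 0x00000000, 0xDEADBEEF]
y = [0x9ABCDEF0, 0x00000000, 0xFFFFFFFF, 0x01234567]
z = [0xFFFF0000, 0x0F0F0F0F, 0xF0F0F0F0, 0x00000000]
assert foo_lanes(x, y, z) == bitwise_select(x, y, z)
print([hex(v) for v in foo_lanes(x, y, z)])
```

Since every step is an integer AND/XOR/AND-NOT, the whole computation maps naturally onto the integer-domain instructions PAND, PANDN and PXOR.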
$ llc -mcpu=nehalem -debug-pass=Structure foo.bc -o foo.s
(snip)
Code Placement Optimizater
SSE execution domain fixup
Machine Natural Loop Construction
X86 AT&T-Style Assembly Printer
Delete Garbage Collector Information
foo.s: (edited)
_foo:
	movaps	%xmm0, %xmm3
	andps	%xmm2, %xmm3
	andnps	%xmm1, %xmm2
	movaps	%xmm2, %xmm0
	xorps	%xmm3, %xmm0
	ret
_bar:
	movaps	%xmm0, %xmm3
	andps	%xmm2, %xmm3
	andnps	%xmm1, %xmm2
	movaps	%xmm2, %xmm0
	xorps	%xmm3, %xmm0
	ret
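The emitted code is functionally correct. Only the execution domain differs from what I expected: MOVAPS, ANDPS, ANDNPS and XORPS are the floating-point-domain counterparts of MOVDQA, PAND, PANDN and PXOR. Note that in AT&T syntax `andnps %xmm1, %xmm2` computes xmm2 = ~xmm2 & xmm1. The sketch below replays the listing on one 32-bit lane to confirm this. It assumes %x, %y and %z arrive in %xmm0, %xmm1 and %xmm2 and that the result is returned in %xmm0, as the listing suggests. The function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF  # model a single 32-bit lane of each register


def emitted_sequence(xmm0, xmm1, xmm2):
    """Replay _foo's instructions (AT&T order: op src, dst) on one lane."""
    xmm3 = xmm0                       # movaps %xmm0, %xmm3
    xmm3 = xmm3 & xmm2                # andps  %xmm2, %xmm3   -> x & z
    xmm2 = (~xmm2 & MASK32) & xmm1    # andnps %xmm1, %xmm2   -> ~z & y
    xmm0 = xmm2                       # movaps %xmm2, %xmm0
    xmm0 = xmm0 ^ xmm3                # xorps  %xmm3, %xmm0   -> result
    return xmm0


def reference(x, y, z):
    """The IR semantics: (x & z) ^ (~z & y)."""
    return (x & z) ^ ((z ^ MASK32) & y)


for x, y, z in [(0x12345678, 0x9ABCDEF0, 0xFFFF0000),
                (0xDEADBEEF, 0x01234567, 0x0F0F0F0F)]:
    assert emitted_sequence(x, y, z) == reference(x, y, z)
```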