[LLVMdev] How does SSEDomainFix work?

Mon May 10 21:07:44 PDT 2010

Hello. This is my first post.

I have tried the SSE execution domain fixup pass,
but I do not see any improvement.

I expected the example below to use MOVDQA, PAND, etc.
(On Nehalem, ANDPS is much slower than PAND.)

Please tell me if I am doing something wrong.

Thank you.
Takumi

Host: i386-mingw32
Build: trunk at 103373

foo.ll:
define <4 x i32> @foo(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z)
nounwind readnone {
entry:
  %0 = and <4 x i32> %x, %z
  %not = xor <4 x i32> %z, <i32 -1, i32 -1, i32 -1, i32 -1>
  %1 = and <4 x i32> %not, %y
  %2 = xor <4 x i32> %0, %1
  ret <4 x i32> %2
}

define <2 x i64> @bar(<2 x i64> %x, <2 x i64> %y, <2 x i64> %z)
nounwind readnone {
entry:
  %0 = and <2 x i64> %x, %z
  %not = xor <2 x i64> %z, <i64 -1, i64 -1>
  %1 = and <2 x i64> %not, %y
  %2 = xor <2 x i64> %0, %1
  ret <2 x i64> %2
}
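
For reference, both functions compute a bitwise select: each result bit comes from %x where the corresponding bit of %z is set, and from %y otherwise. Below is a minimal scalar C sketch of that per-lane computation. The names bitselect32 and bitselect32_check are made up for illustration and are not part of the test case; the sketch only restates the IR above and says nothing about which instructions llc should pick.

```c
#include <assert.h>
#include <stdint.h>

/* Per-lane equivalent of @foo: (x & z) ^ (~z & y).
 * Bits of x are taken where z is 1, bits of y where z is 0.
 * Hypothetical helper, named for illustration only. */
static uint32_t bitselect32(uint32_t x, uint32_t y, uint32_t z)
{
    uint32_t taken_from_x = x & z;   /* %0   = and %x, %z       */
    uint32_t not_z        = ~z;      /* %not = xor %z, all-ones */
    uint32_t taken_from_y = not_z & y; /* %1 = and %not, %y     */
    return taken_from_x ^ taken_from_y; /* %2 = xor %0, %1      */
}

/* Small sanity check of the select behaviour (also illustrative). */
static void bitselect32_check(void)
{
    /* All-ones mask selects x, all-zeros mask selects y. */
    assert(bitselect32(0x12345678u, 0x9ABCDEF0u, 0xFFFFFFFFu) == 0x12345678u);
    assert(bitselect32(0x12345678u, 0x9ABCDEF0u, 0x00000000u) == 0x9ABCDEF0u);
    /* Mixed mask: high half from x, low half from y. */
    assert(bitselect32(0xAAAAAAAAu, 0x55555555u, 0xFFFF0000u) == 0xAAAA5555u);
}
```

Since every operation here is an integer bitwise operation, this is the kind of code for which I expected the integer-domain instructions.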

$ llc -mcpu=nehalem -debug-pass=Structure foo.bc -o foo.s
(snip)
    Code Placement Optimizater
    SSE execution domain fixup
    Machine Natural Loop Construction
    X86 AT&T-Style Assembly Printer
    Delete Garbage Collector Information

foo.s: (edited)
_foo:
	movaps	%xmm0, %xmm3
	andps	%xmm2, %xmm3
	andnps	%xmm1, %xmm2
	movaps	%xmm2, %xmm0
	xorps	%xmm3, %xmm0
	ret

_bar:
	movaps	%xmm0, %xmm3
	andps	%xmm2, %xmm3
	andnps	%xmm1, %xmm2
	movaps	%xmm2, %xmm0
	xorps	%xmm3, %xmm0
	ret