[PATCH] [X86][SSE] Keep 4i32 vector insertions in integer domain on pre-SSE4.1 targets
Chandler Carruth
chandlerc at gmail.com
Thu Dec 4 07:44:31 PST 2014
================
Comment at: test/CodeGen/X86/vector-shuffle-128-v4.ll:663-665
@@ -662,5 +662,5 @@
; SSE2: # BB#0:
-; SSE2-NEXT: xorps %xmm1, %xmm1
-; SSE2-NEXT: movss %xmm0, %xmm1
-; SSE2-NEXT: movaps %xmm1, %xmm0
+; SSE2-NEXT: pxor %xmm1, %xmm1
+; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; SSE2-NEXT: movq %xmm0, %xmm0
; SSE2-NEXT: retq
----------------
I think an even better pattern here is: movq, then pshufd with the mask 0,2,2,2?
Also, do we correctly match to movd when the source is a foldable load? I can't remember if there is a test case for that, but it's really important not to emit a shuffle when we're just loading a single i32 from memory into an xmm register.
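For the register case, a rough, untested sketch of the sequence I have in mind (assuming I have the pshufd immediate right):

    movq   %xmm0, %xmm0          # zero the upper 64 bits: <x0, x1, 0, 0>
    pshufd $0xa8, %xmm0, %xmm0   # mask 0,2,2,2: pull in the zeroed lane -> <x0, 0, 0, 0>

That's two instructions instead of three, and for the load case a plain movd from memory already zero-extends the i32 into the register, so no shuffle should be needed at all.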
================
Comment at: test/CodeGen/X86/vector-shuffle-128-v4.ll:700-703
@@ -699,5 +699,6 @@
; SSE2: # BB#0:
-; SSE2-NEXT: xorps %xmm1, %xmm1
-; SSE2-NEXT: movss %xmm0, %xmm1
-; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,0,1,1]
+; SSE2-NEXT: pxor %xmm1, %xmm1
+; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; SSE2-NEXT: movq %xmm0, %xmm0
+; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,0,1,1]
; SSE2-NEXT: retq
----------------
This highlights that our lowering for this is completely wrong. movq + pshufd is better even with SSE4.1, and movd + pshufd is better when we can fold the load...
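Roughly what I'd hope to see instead (an untested sketch, assuming the pshufd immediates work out; the load variant assumes the pointer argument happens to be in %rdi):

    # register source
    movq   %xmm0, %xmm0          # zero the upper half: <x0, x1, 0, 0>
    pshufd $0xa2, %xmm0, %xmm0   # mask 2,0,2,2 -> <0, x0, 0, 0>

    # foldable load
    movd   (%rdi), %xmm0         # zero-extending i32 load: <x0, 0, 0, 0>
    pshufd $0x51, %xmm0, %xmm0   # mask 1,0,1,1 -> <0, x0, 0, 0>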
http://reviews.llvm.org/D6526