[PATCH] [X86][SSE] Keep 4i32 vector insertions in integer domain on pre-SSE4.1 targets
Chandler Carruth
chandlerc at gmail.com
Thu Dec 4 07:44:31 PST 2014
================
Comment at: test/CodeGen/X86/vector-shuffle-128-v4.ll:663-665
@@ -662,5 +662,5 @@
; SSE2: # BB#0:
-; SSE2-NEXT: xorps %xmm1, %xmm1
-; SSE2-NEXT: movss %xmm0, %xmm1
-; SSE2-NEXT: movaps %xmm1, %xmm0
+; SSE2-NEXT: pxor %xmm1, %xmm1
+; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; SSE2-NEXT: movq %xmm0, %xmm0
; SSE2-NEXT: retq
----------------
I think an even better pattern here is: movq, then pshufd with the mask 0,2,2,2?
Also, do we correctly match to movd when the source is a foldable load? I can't remember if there is a test case for that, but it's really important not to emit a shuffle when we're just loading a single i32 from memory into an xmm register.
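For the register case, a rough, untested sketch of the sequence I have in mind (assuming I have the pshufd immediate right):

    movq   %xmm0, %xmm0          # zero the upper 64 bits: <x0, x1, 0, 0>
    pshufd $0xa8, %xmm0, %xmm0   # mask 0,2,2,2: pull in the zeroed lane -> <x0, 0, 0, 0>

That's two instructions instead of three, and for the load case a plain movd from memory already zero-extends the i32 into the register, so no shuffle should be needed at all.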
================
Comment at: test/CodeGen/X86/vector-shuffle-128-v4.ll:700-703
@@ -699,5 +699,6 @@
; SSE2: # BB#0:
-; SSE2-NEXT: xorps %xmm1, %xmm1
-; SSE2-NEXT: movss %xmm0, %xmm1
-; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,0,1,1]
+; SSE2-NEXT: pxor %xmm1, %xmm1
+; SSE2-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; SSE2-NEXT: movq %xmm0, %xmm0
+; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,0,1,1]
; SSE2-NEXT: retq
----------------
This highlights that our lowering for this is completely wrong. movq + pshufd is better even with SSE4.1, and movd + pshufd is better when we can fold the load...
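Roughly what I'd hope to see instead (an untested sketch, assuming the pshufd immediates work out; the load variant assumes the pointer argument happens to be in %rdi):

    # register source
    movq   %xmm0, %xmm0          # zero the upper half: <x0, x1, 0, 0>
    pshufd $0xa2, %xmm0, %xmm0   # mask 2,0,2,2 -> <0, x0, 0, 0>

    # foldable load
    movd   (%rdi), %xmm0         # zero-extending i32 load: <x0, 0, 0, 0>
    pshufd $0x51, %xmm0, %xmm0   # mask 1,0,1,1 -> <0, x0, 0, 0>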
http://reviews.llvm.org/D6526