[PATCH] [X86][SSE] Keep 4i32 vector insertions in integer domain on pre-SSE4.1 targets
    Chandler Carruth
    chandlerc at gmail.com
    Thu Dec  4 07:44:31 PST 2014

================
Comment at: test/CodeGen/X86/vector-shuffle-128-v4.ll:663-665
@@ -662,5 +662,5 @@
 ; SSE2:       # BB#0:
-; SSE2-NEXT:    xorps %xmm1, %xmm1
-; SSE2-NEXT:    movss %xmm0, %xmm1
-; SSE2-NEXT:    movaps %xmm1, %xmm0
+; SSE2-NEXT:    pxor %xmm1, %xmm1
+; SSE2-NEXT:    punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; SSE2-NEXT:    movq %xmm0, %xmm0
 ; SSE2-NEXT:    retq
----------------
I think an even better pattern is movq + pshufd with a 0,2,2,2 mask?
Also, do we correctly match to movd when the source is a foldable load? I can't remember if there is a test case for that, but it's really important not to do a shuffle when just loading a single i32 from memory into an xmm register.
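Concretely, the sequence I have in mind is something like this (rough sketch only, the pshufd immediate isn't double-checked, and (%rdi) below is just a placeholder for the load address):

  movq   %xmm0, %xmm0         # zero the upper 64 bits: [x0,x1,0,0]
  pshufd $0xa8, %xmm0, %xmm0  # lanes 0,2,2,2: [x0,0,0,0]

And for the foldable-load case, a plain movd already zero-extends the i32 into the register with no shuffle at all:

  movd   (%rdi), %xmm0        # load one i32, upper lanes zeroed: [x0,0,0,0]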
================
Comment at: test/CodeGen/X86/vector-shuffle-128-v4.ll:700-703
@@ -699,5 +699,6 @@
 ; SSE2:       # BB#0:
-; SSE2-NEXT:    xorps %xmm1, %xmm1
-; SSE2-NEXT:    movss %xmm0, %xmm1
-; SSE2-NEXT:    pshufd {{.*#+}} xmm0 = xmm1[1,0,1,1]
+; SSE2-NEXT:    pxor %xmm1, %xmm1
+; SSE2-NEXT:    punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; SSE2-NEXT:    movq %xmm0, %xmm0
+; SSE2-NEXT:    pshufd {{.*#+}} xmm0 = xmm0[1,0,1,1]
 ; SSE2-NEXT:    retq
----------------
This highlights that our lowering for this is completely wrong: movq + pshufd is better even with SSE4.1, and movd + pshufd is better when we can fold the load...
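Roughly what I'd expect here (again just a sketch, masks untested, (%rdi) is a placeholder load address):

  movq   %xmm0, %xmm0         # [x0,x1,0,0]
  pshufd $0xa2, %xmm0, %xmm0  # lanes 2,0,2,2: [0,x0,0,0]

and with a foldable load:

  movd   (%rdi), %xmm0        # [x0,0,0,0]
  pshufd $0x51, %xmm0, %xmm0  # lanes 1,0,1,1: [0,x0,0,0]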
http://reviews.llvm.org/D6526