[PATCH] D44370: [X86] Combine vXi64 multiplies to MULDQ/MULUDQ during DAG combine instead of lowering.
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 27 17:53:15 PDT 2018
craig.topper added inline comments.
================
Comment at: test/CodeGen/X86/mulvi32.ll:170
; SSE42: # %bb.0:
-; SSE42-NEXT: pxor %xmm3, %xmm3
+; SSE42-NEXT: pshufd {{.*#+}} xmm2 = xmm0[2,3,0,1]
+; SSE42-NEXT: pmovzxdq {{.*#+}} xmm3 = xmm2[0],zero,xmm2[1],zero
----------------
These shuffles move the high elements down so we can zero-extend them. The original code used a punpck against a zeroed register instead.
================
Comment at: test/CodeGen/X86/mulvi32.ll:183
; AVX1: # %bb.0:
-; AVX1-NEXT: vpxor %xmm2, %xmm2, %xmm2
-; AVX1-NEXT: vpunpckhdq {{.*#+}} xmm3 = xmm1[2],xmm2[2],xmm1[3],xmm2[3]
-; AVX1-NEXT: vpunpckhdq {{.*#+}} xmm2 = xmm0[2],xmm2[2],xmm0[3],xmm2[3]
-; AVX1-NEXT: vpmuludq %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vpshufd {{.*#+}} xmm2 = xmm1[2,2,3,3]
+; AVX1-NEXT: vpshufd {{.*#+}} xmm3 = xmm0[2,2,3,3]
----------------
I think this is SimplifyDemandedBits on PMULUDQ kicking in to remove the zeros going into elements 1 and 3, since those elements are effectively garbage to the multiply.
https://reviews.llvm.org/D44370