[PATCH] D110995: [X86] combineMulToPMADDWD - handle any pow2 vector type and split to legal types

Mon Nov 8 02:24:32 PST 2021

RKSimon added inline comments.

================
Comment at: llvm/test/CodeGen/X86/shrink_vmul.ll:992
 ; X86-SSE-NEXT:    punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1],xmm1[2],xmm2[2],xmm1[3],xmm2[3],xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
-; X86-SSE-NEXT:    punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1],xmm1[2],xmm2[2],xmm1[3],xmm2[3]
-; X86-SSE-NEXT:    pmaddwd %xmm0, %xmm1
-; X86-SSE-NEXT:    movq %xmm1, (%ecx,%eax,4)
+; X86-SSE-NEXT:    pshuflw {{.*#+}} xmm1 = xmm1[0,1,1,3,4,5,6,7]
+; X86-SSE-NEXT:    punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
----------------
pengfei wrote:
> The shuffle looks odd. What's its equivalent in the left?
The (i32 sext(i8)) operand is now (i32 zext(i16 sext(i8))), which has allowed SimplifyDemandedVectorElts to fold the (i32 zext(i8)) operand to (i32 aext(i16 zext(i8))).
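
As a rough illustration of why that fold is safe, here is a small Python model of one 32-bit lane of pmaddwd. It is only a sketch: the helper names (to_signed, pmaddwd_lane, lhs_lane, rhs_lane, reference) are made up for this example and are not LLVM APIs. The assumption is that once the upper i16 of each i32 lane of one operand is known to be zero, the upper i16 of the other operand no longer contributes to the sum, so it can be anything (aext).

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pmaddwd_lane(a32, b32):
    """Model one i32 result lane of pmaddwd: lo*lo + hi*hi on signed i16 halves."""
    a_lo, a_hi = to_signed(a32, 16), to_signed(a32 >> 16, 16)
    b_lo, b_hi = to_signed(b32, 16), to_signed(b32 >> 16, 16)
    return (a_lo * b_lo + a_hi * b_hi) & 0xFFFFFFFF


def lhs_lane(a8):
    """(i32 zext(i16 sext(i8))): sign-extend to 16 bits, upper 16 bits are zero."""
    return to_signed(a8, 8) & 0xFFFF


def rhs_lane(b8, junk16):
    """(i32 aext(i16 zext(i8))): low half is zext(i8), upper half is arbitrary."""
    return ((junk16 & 0xFFFF) << 16) | (b8 & 0xFF)


def reference(a8, b8):
    """The original i32 multiply: sext(i8) * zext(i8), truncated to 32 bits."""
    return (to_signed(a8, 8) * (b8 & 0xFF)) & 0xFFFFFFFF


# Whatever ends up in the any-extended upper half, the lane result is unchanged.
for junk in (0x0000, 0x1234, 0x8000, 0xFFFF):
    assert pmaddwd_lane(lhs_lane(0x80), rhs_lane(0xFF, junk)) == reference(0x80, 0xFF)
```

The junk values stand in for the undefined upper half that an aext leaves behind; they are multiplied by the zero upper half of the other operand and drop out.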

The aext was lowered to punpcklwd == shuffle(0,u,1,u,2,u,3,u), but as we only require the bottom 64 bits of the vector we can further simplify that to the pshuflw == shuffle(0,u,1,u,u,u,u,u).
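
To make the shuffle equivalence concrete, the following Python sketch models both instructions on lists of eight 16-bit elements and checks them against the two masks, treating `u` as "don't care". The function names and the `satisfies` helper are illustrative assumptions, not part of LLVM; the pshuflw mask [0,1,1,3] is the one shown in the new CHECK line.

```python
UNDEF = None  # stands for the 'u' (undef) entries in a shuffle mask


def punpcklwd(a, b):
    """Interleave the low four 16-bit elements of a and b."""
    return [a[0], b[0], a[1], b[1], a[2], b[2], a[3], b[3]]


def pshuflw(src, mask):
    """Permute the low four 16-bit elements by mask; pass the high four through."""
    return [src[i] for i in mask] + src[4:]


def satisfies(result, pattern, src):
    """True if every defined pattern entry p at position i has result[i] == src[p]."""
    return all(p is UNDEF or result[i] == src[p] for i, p in enumerate(pattern))


U = UNDEF
full_pattern = [0, U, 1, U, 2, U, 3, U]  # what punpcklwd provides
low_pattern = [0, U, 1, U, U, U, U, U]   # only the bottom 64 bits are demanded

src = list(range(100, 108))    # distinct values so positions are identifiable
other = list(range(200, 208))  # second punpcklwd operand, lands in undef slots

assert satisfies(punpcklwd(src, other), full_pattern, src)
assert satisfies(pshuflw(src, [0, 1, 1, 3]), low_pattern, src)
```

In this model pshuflw does not satisfy the full pattern (elements 2 and 3 never reach positions 4 and 6), which is why the simplification depends on only the bottom 64 bits being used.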

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110995/new/

https://reviews.llvm.org/D110995