[PATCH] D110995: [X86] combineMulToPMADDWD - handle any pow2 vector type and split to legal types

Mon Nov 8 03:56:52 PST 2021

pengfei added a comment.

I can't fully understand this patch without spending a lot more time on it, but I don't have any other comments.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:44521
+      // which will expand the extension.
+      if (Src.getScalarValueSizeInBits() <= 16 && !Subtarget.hasSSE41()) {
+        EVT ExtVT = VT.changeVectorElementType(MVT::i16);
----------------
Should this be `<` rather than `<=`?

================
Comment at: llvm/test/CodeGen/X86/shrink_vmul.ll:992
 ; X86-SSE-NEXT:    punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1],xmm1[2],xmm2[2],xmm1[3],xmm2[3],xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
-; X86-SSE-NEXT:    punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1],xmm1[2],xmm2[2],xmm1[3],xmm2[3]
-; X86-SSE-NEXT:    pmaddwd %xmm0, %xmm1
-; X86-SSE-NEXT:    movq %xmm1, (%ecx,%eax,4)
+; X86-SSE-NEXT:    pshuflw {{.*#+}} xmm1 = xmm1[0,1,1,3,4,5,6,7]
+; X86-SSE-NEXT:    punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
----------------
RKSimon wrote:
> pengfei wrote:
> > The shuffle looks odd. What's its equivalent in the left?
> The (i32 sext(i8)) operand is now (i32 zext(i16 sext(i8))), which has allowed SimplifyDemandedVectorElts to fold the (i32 zext(i8)) operand to (i32 aext(i16 zext(i8))).
> 
> The aext was lowered to punpcklwd == shuffle(0,u,1,u,2,u,3,u), but as we only require the bottom 64 bits of the vector we can further simplify that to pshuflw == shuffle(0,u,1,u,u,u,u,u).
Thank you. Just learnt we have aext. :)

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D110995