[PATCH] D110995: [X86] combineMulToPMADDWD - handle any pow2 vector type and split to legal types
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 8 02:24:32 PST 2021
RKSimon added inline comments.
================
Comment at: llvm/test/CodeGen/X86/shrink_vmul.ll:992
; X86-SSE-NEXT: punpcklbw {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1],xmm1[2],xmm2[2],xmm1[3],xmm2[3],xmm1[4],xmm2[4],xmm1[5],xmm2[5],xmm1[6],xmm2[6],xmm1[7],xmm2[7]
-; X86-SSE-NEXT: punpcklwd {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1],xmm1[2],xmm2[2],xmm1[3],xmm2[3]
-; X86-SSE-NEXT: pmaddwd %xmm0, %xmm1
-; X86-SSE-NEXT: movq %xmm1, (%ecx,%eax,4)
+; X86-SSE-NEXT: pshuflw {{.*#+}} xmm1 = xmm1[0,1,1,3,4,5,6,7]
+; X86-SSE-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7]
----------------
pengfei wrote:
> The shuffle looks odd. What's its equivalent in the left?
The (i32 sext(i8))) operand is now (i32 zext(i16 sext(i8))), which has allowed SimplifyDemandedVectorElts to fold the (i32 zext(i8)) operand to (i32 aext(i16 zext(i8))).
The aext was lowered to unpacklwd == shuffle(0,u,1,u,2,u,3,u) but as we only require the bottom 64-bits of the vector we can further simplify that to the pshuflw == shuffle(0,u,1,u,u,u,u,u)
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D110995/new/
https://reviews.llvm.org/D110995
More information about the llvm-commits
mailing list