[PATCH] D54668: [X86] Attempt to improve v32i8/v64i8 multiply lowering by applying the v16i8 non-avx2 algorithm to each 128-bit lane.
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Nov 18 05:16:43 PST 2018
RKSimon added a comment.
What does IACA/llvm-mca say about the regressions in min-legal-vector-width.ll and vector-reduce-mul.ll?
================
Comment at: test/CodeGen/X86/vector-mul.ll:575
; X64-XOP-NEXT: vpmullw {{.*}}(%rip), %xmm0, %xmm0
-; X64-XOP-NEXT: vpperm {{.*#+}} xmm0 = xmm0[0,2,4,6,8,10,12,14],xmm1[0,2,4,6,8,10,12,14]
+; X64-XOP-NEXT: vpand %xmm2, %xmm0, %xmm0
+; X64-XOP-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
----------------
craig.topper wrote:
> We lost the combine here that turned the and+packuswb into vpperm between vector op legalization and DAG combine. I'm not sure why shuffle combining wasn't able to do the same with the regular shuffle.
This is rather odd; I'll take a look once this has landed.
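The combine discussed above folds `vpand` + `vpackuswb` into one `vpperm`. A scalar model shows why that is legal. It is a sketch under stated assumptions, not LLVM's lowering code. The model assumes that the per-lane v16i8 multiply widens bytes 0-7 and 8-15 into two word vectors, multiplies them with `pmullw`, and masks each word with 0x00FF before `packuswb`. Once every word is in [0, 255], `packuswb`'s unsigned saturation does nothing, so the pack reduces to "take the even (low, little-endian) byte of each word." The `vpperm` selector `xmm0[0,2,...,14],xmm1[0,2,...,14]` in the test diff performs exactly that shuffle. All function names below (`sat_u8`, `packuswb`, `even_bytes`, `mul_v16i8`, `mul_v32i8`) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 128-bit XMM lane; not LLVM code. */

/* Saturate a signed 16-bit word to an unsigned byte, as PACKUSWB does. */
static uint8_t sat_u8(int16_t w) {
    if (w < 0) return 0;
    if (w > 255) return 255;
    return (uint8_t)w;
}

/* PACKUSWB a, b: bytes 0-7 come from a's eight words, bytes 8-15 from b's. */
static void packuswb(const int16_t a[8], const int16_t b[8], uint8_t dst[16]) {
    for (int i = 0; i < 8; ++i) {
        dst[i] = sat_u8(a[i]);
        dst[i + 8] = sat_u8(b[i]);
    }
}

/* Shuffle a[0,2,...,14], b[0,2,...,14] (the vpperm selector in the diff):
 * take the low little-endian byte of every word, with no saturation. */
static void even_bytes(const int16_t a[8], const int16_t b[8], uint8_t dst[16]) {
    for (int i = 0; i < 8; ++i) {
        dst[i] = (uint8_t)((uint16_t)a[i] & 0xFF);
        dst[i + 8] = (uint8_t)((uint16_t)b[i] & 0xFF);
    }
}

/* Assumed v16i8 lowering for one lane: widen bytes to words, PMULLW (low
 * 16 bits of each product), PAND with 0x00FF, then PACKUSWB to bytes. */
static void mul_v16i8(const uint8_t x[16], const uint8_t y[16], uint8_t out[16]) {
    int16_t lo[8], hi[8];
    for (int i = 0; i < 8; ++i) {
        uint16_t pl = (uint16_t)(x[i] * y[i]);         /* bytes 0-7  */
        uint16_t ph = (uint16_t)(x[i + 8] * y[i + 8]); /* bytes 8-15 */
        lo[i] = (int16_t)(pl & 0x00FF);                /* PAND mask  */
        hi[i] = (int16_t)(ph & 0x00FF);
    }
    packuswb(lo, hi, out);
}

/* v32i8 as in this patch's approach: run the 128-bit algorithm on each lane. */
static void mul_v32i8(const uint8_t x[32], const uint8_t y[32], uint8_t out[32]) {
    for (int lane = 0; lane < 2; ++lane)
        mul_v16i8(x + 16 * lane, y + 16 * lane, out + 16 * lane);
}
```

With masked inputs, `packuswb` and `even_bytes` produce identical bytes, which is why a shuffle combine could replace the and+pack pair. Without the mask they diverge: a word of 0x0100 packs to 255 but its even byte is 0.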
Repository:
rL LLVM
https://reviews.llvm.org/D54668