[PATCH] D54512: [X86] Add -x86-experimental-vector-widening support to reduceVMULWidth and combineMulToPMADDWD
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 15 02:40:11 PST 2018
RKSimon added inline comments.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26164
+ assert(VT.getSizeInBits() < 128);
+ assert(128 % VT.getSizeInBits() == 0);
unsigned NumConcat = 128 / InVT.getSizeInBits();
----------------
Since you're updating the code, please can you add assert messages.
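For reference, one possible form for the messaged asserts - the message strings below are only a suggestion, not text from the patch:

    assert(VT.getSizeInBits() < 128 &&
           "Expected a sub-128-bit vector type");
    assert(128 % VT.getSizeInBits() == 0 &&
           "Vector width must evenly divide 128 bits");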
================
Comment at: test/CodeGen/X86/shrink_vmul-widen.ll:61
; X64-SSE-NEXT: movzwl (%rdi,%rdx), %ecx
; X64-SSE-NEXT: movd %ecx, %xmm0
+; X64-SSE-NEXT: pxor %xmm1, %xmm1
----------------
Another couple of instances where we should consider whether we'd be better off doing PINSRW(PXOR) - see PR31287
================
Comment at: test/CodeGen/X86/shrink_vmul-widen.ll:70
+; X64-SSE-NEXT: pmaddwd %xmm0, %xmm2
+; X64-SSE-NEXT: movq %xmm2, (%rax,%rdx,4)
; X64-SSE-NEXT: retq
----------------
We're doing an extra shuffle here - is that going to be a problem?
================
Comment at: test/CodeGen/X86/shrink_vmul-widen.ll:1437
; X86-AVX-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
-; X86-AVX-NEXT: vpmulld {{\.LCPI.*}}, %xmm0, %xmm0
+; X86-AVX-NEXT: vpmaddwd {{\.LCPI.*}}, %xmm0, %xmm0
; X86-AVX-NEXT: vmovq %xmm0, (%edx,%eax,4)
----------------
Definite perf improvement here
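(PMADDWD is both fewer uops and lower latency than PMULLD on most targets, so the fold pays off whenever both multiply operands are known to fit in 16 bits.) A purely illustrative C++ sketch of the kind of source the combine now catches - the function name and the constant are made up, not taken from the test file:

    #include <cstdint>

    // Each product is a 16-bit x 16-bit multiply widened to 32 bits, so the
    // 32-bit vector multiply can be lowered as PMADDWD (with the odd lanes
    // contributing zero) instead of PMULLD.
    void mul_2xi8_by_const(const std::uint8_t *a, std::uint32_t *dst) {
      for (int i = 0; i != 2; ++i)
        dst[i] = std::uint32_t(a[i]) * 200u;
    }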
https://reviews.llvm.org/D54512