[PATCH] D54512: [X86] Add -x86-experimental-vector-widening support to reduceVMULWidth and combineMulToPMADDWD
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 15 02:40:11 PST 2018
RKSimon added inline comments.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:26164
+ assert(VT.getSizeInBits() < 128);
+ assert(128 % VT.getSizeInBits() == 0);
unsigned NumConcat = 128 / InVT.getSizeInBits();
----------------
Since you're updating the code, please can you add assert messages.
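For reference, one possible form for the messaged asserts - the message strings below are only a suggestion, not text from the patch:

    assert(VT.getSizeInBits() < 128 &&
           "Expected a sub-128-bit vector type");
    assert(128 % VT.getSizeInBits() == 0 &&
           "Vector width must evenly divide 128 bits");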
================
Comment at: test/CodeGen/X86/shrink_vmul-widen.ll:61
; X64-SSE-NEXT: movzwl (%rdi,%rdx), %ecx
; X64-SSE-NEXT: movd %ecx, %xmm0
+; X64-SSE-NEXT: pxor %xmm1, %xmm1
----------------
Another couple of instances where we should consider whether we'd be better off doing PINSRW(PXOR) - see PR31287
================
Comment at: test/CodeGen/X86/shrink_vmul-widen.ll:70
+; X64-SSE-NEXT: pmaddwd %xmm0, %xmm2
+; X64-SSE-NEXT: movq %xmm2, (%rax,%rdx,4)
; X64-SSE-NEXT: retq
----------------
We're doing an extra shuffle here - is that going to be a problem?
================
Comment at: test/CodeGen/X86/shrink_vmul-widen.ll:1437
; X86-AVX-NEXT: vpmovzxbd {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero,xmm0[2],zero,zero,zero,xmm0[3],zero,zero,zero
-; X86-AVX-NEXT: vpmulld {{\.LCPI.*}}, %xmm0, %xmm0
+; X86-AVX-NEXT: vpmaddwd {{\.LCPI.*}}, %xmm0, %xmm0
; X86-AVX-NEXT: vmovq %xmm0, (%edx,%eax,4)
----------------
Definite perf improvement here
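(PMADDWD is both fewer uops and lower latency than PMULLD on most targets, so the fold pays off whenever both multiply operands are known to fit in 16 bits.) A purely illustrative C++ sketch of the kind of source the combine now catches - the function name and the constant are made up, not taken from the test file:

    #include <cstdint>

    // Each product is a 16-bit x 16-bit multiply widened to 32 bits, so the
    // 32-bit vector multiply can be lowered as PMADDWD (with the odd lanes
    // contributing zero) instead of PMULLD.
    void mul_2xi8_by_const(const std::uint8_t *a, std::uint32_t *dst) {
      for (int i = 0; i != 2; ++i)
        dst[i] = std::uint32_t(a[i]) * 200u;
    }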
https://reviews.llvm.org/D54512