[PATCH] D66004: [WIP][X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add general shuffle combining support

Simon Pilgrim via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 22 02:06:35 PDT 2020


RKSimon added a comment.

Cheers @xbolva00. For what it's worth, this is still a WIP, and SimplifyDemanded improvements do have a tendency to expose existing limitations (as well as find some awesome improvements...)



================
Comment at: llvm/test/CodeGen/X86/avx512bwvl-intrinsics-upgrade.ll:5637
 ; X64-NEXT:    kmovw %k0, %ecx # encoding: [0xc5,0xf8,0x93,0xc8]
 ; X64-NEXT:    vmovd %ecx, %xmm0 # EVEX TO VEX Compression encoding: [0xc5,0xf9,0x6e,0xc1]
+; X64-NEXT:    vpmovzxdq %xmm0, %xmm0 # EVEX TO VEX Compression encoding: [0xc4,0xe2,0x79,0x35,0xc0]
----------------
xbolva00 wrote:
> Regression
The original vzext_movl (which isel would consume into the movd) has been replaced with a zero_extend_vector_inreg.
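To illustrate the kind of pattern involved, here's a reduced sketch (not the exact test case; the function name and types are just for illustration) - a scalar zero-extended into the low lane of an otherwise-zero vector:

  ; Illustrative only: zero-extend a scalar i32 into element 0 of a zeroed v2i64.
  define <2 x i64> @zext_scalar_into_vec(i32 %x) {
    %ext = zext i32 %x to i64
    %vec = insertelement <2 x i64> zeroinitializer, i64 %ext, i32 0
    ret <2 x i64> %vec
  }

The vzext_movl form of this matches the movd isel patterns directly (movd already zeroes the upper elements), whereas the zero_extend_vector_inreg form currently doesn't, hence the extra vpmovzxdq.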


================
Comment at: llvm/test/CodeGen/X86/buildvec-insertvec.ll:312
+; SSE41-NEXT:    movd %eax, %xmm0
+; SSE41-NEXT:    pmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
+; SSE41-NEXT:    pinsrw $3, %esi, %xmm0
----------------
xbolva00 wrote:
> Extra instruction (vs SSE2)
The original vzext_movl (which isel would consume into the movd) has been replaced with a zero_extend_vector_inreg.


================
Comment at: llvm/test/CodeGen/X86/masked_expandload.ll:1302
+; SSE42-NEXT:    movsd {{.*#+}} xmm1 = mem[0],zero
+; SSE42-NEXT:    insertps {{.*#+}} xmm0 = xmm0[0,1],mem[0],xmm0[3]
+; SSE42-NEXT:    insertps {{.*#+}} xmm0 = xmm0[0,1,2],mem[0]
----------------
xbolva00 wrote:
> Regression
We've managed to change the order of the load combines, so we prematurely end up with different element sizes which we then can't combine together later. This is an existing problem; we've just exposed it more.
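As a rough illustration (a hand-reduced sketch, not the actual expandload test; names and types are assumed), think of a <4 x float> assembled from consecutive scalar loads:

  ; Illustrative only: four consecutive float loads built into a v4f32.
  define <4 x float> @concat_consecutive_loads(float* %p) {
    %p1 = getelementptr inbounds float, float* %p, i64 1
    %p2 = getelementptr inbounds float, float* %p, i64 2
    %p3 = getelementptr inbounds float, float* %p, i64 3
    %f0 = load float, float* %p, align 4
    %f1 = load float, float* %p1, align 4
    %f2 = load float, float* %p2, align 4
    %f3 = load float, float* %p3, align 4
    %v0 = insertelement <4 x float> undef, float %f0, i32 0
    %v1 = insertelement <4 x float> %v0, float %f1, i32 1
    %v2 = insertelement <4 x float> %v1, float %f2, i32 2
    %v3 = insertelement <4 x float> %v2, float %f3, i32 3
    ret <4 x float> %v3
  }

If a pair of the 32-bit elements gets merged into a 64-bit (movsd-style) load before the rest are considered, we're left with mixed element sizes that no longer fold into a single 128-bit load, which is roughly what the movsd + insertps sequence above shows.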


================
Comment at: llvm/test/CodeGen/X86/shuffle-vs-trunc-512.ll:559
 ; AVX512VBMI:       # %bb.0:
-; AVX512VBMI-NEXT:    vmovdqa {{.*#+}} xmm1 = [1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61]
-; AVX512VBMI-NEXT:    vpermb %zmm0, %zmm1, %zmm0
-; AVX512VBMI-NEXT:    # kill: def $xmm0 killed $xmm0 killed $zmm0
+; AVX512VBMI-NEXT:    vpbroadcastq {{.*#+}} zmm1 = [5569058560453190945,5569058560453190945,5569058560453190945,5569058560453190945,5569058560453190945,5569058560453190945,5569058560453190945,5569058560453190945]
+; AVX512VBMI-NEXT:    vextracti64x4 $1, %zmm0, %ymm2
----------------
xbolva00 wrote:
> Worse
Still looking at these regressions, but we've always been very weak at shuffle combining across different src/dst vector widths.
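For reference, judging from the mask constant in the original codegen above, the shuffle here is roughly this shape (function name illustrative) - a 16-byte result pulled from a 64-byte source:

  ; Illustrative only: narrowing byte shuffle from a 512-bit source to a 128-bit result.
  define <16 x i8> @trunc_shuffle_v64i8(<64 x i8> %x) {
    %s = shufflevector <64 x i8> %x, <64 x i8> undef,
           <16 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29,
                       i32 33, i32 37, i32 41, i32 45, i32 49, i32 53, i32 57, i32 61>
    ret <16 x i8> %s
  }

The old codegen handled that as a single vpermb of the zmm source (plus the implicit subreg extract); once the demanded elts get split across narrower ops we lose that single-instruction form.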


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D66004/new/

https://reviews.llvm.org/D66004




