[PATCH] D66004: [WIP][X86][SSE] SimplifyDemandedVectorEltsForTargetNode - add general shuffle combining support
Simon Pilgrim via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 22 02:06:35 PDT 2020
RKSimon added a comment.
Cheers @xbolva00 - for what it's worth, this is still a WIP, and SimplifyDemanded improvements do have a tendency to expose existing limitations (as well as find some awesome improvements).
================
Comment at: llvm/test/CodeGen/X86/avx512bwvl-intrinsics-upgrade.ll:5637
; X64-NEXT: kmovw %k0, %ecx # encoding: [0xc5,0xf8,0x93,0xc8]
; X64-NEXT: vmovd %ecx, %xmm0 # EVEX TO VEX Compression encoding: [0xc5,0xf9,0x6e,0xc1]
+; X64-NEXT: vpmovzxdq %xmm0, %xmm0 # EVEX TO VEX Compression encoding: [0xc4,0xe2,0x79,0x35,0xc0]
----------------
xbolva00 wrote:
> Regression
The original vzext_movl (which isel would consume into the movd) has been replaced with a zero_extend_vector_inreg.
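Roughly what changes in the DAG (a reduced sketch, not the exact dump from this test):

  ; before: isel matches this pair to a single (v)movd
  t1: v4i32 = scalar_to_vector t0
  t2: v4i32 = X86ISD::VZEXT_MOVL t1

  ; after: the movd is still needed for the scalar insert, and the
  ; in-register extension survives as a separate (v)pmovzxdq
  t1: v4i32 = scalar_to_vector t0
  t2: v2i64 = zero_extend_vector_inreg t1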
================
Comment at: llvm/test/CodeGen/X86/buildvec-insertvec.ll:312
+; SSE41-NEXT: movd %eax, %xmm0
+; SSE41-NEXT: pmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
+; SSE41-NEXT: pinsrw $3, %esi, %xmm0
----------------
xbolva00 wrote:
> Extra instruction (vs SSE2)
Same as above: the original vzext_movl (which isel would consume into the movd) has been replaced with a zero_extend_vector_inreg, hence the extra pmovzxdq here too.
================
Comment at: llvm/test/CodeGen/X86/masked_expandload.ll:1302
+; SSE42-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
+; SSE42-NEXT: insertps {{.*#+}} xmm0 = xmm0[0,1],mem[0],xmm0[3]
+; SSE42-NEXT: insertps {{.*#+}} xmm0 = xmm0[0,1,2],mem[0]
----------------
xbolva00 wrote:
> Regression
We've changed the order in which the loads get combined, so we prematurely end up with mixed element sizes that we can't merge later. This is an existing problem; this patch just exposes it more.
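To make the element-size mismatch concrete, annotating the diff above:

  movsd    {{.*#+}} xmm1 = mem[0],zero               ; 64-bit load, merged early
  insertps {{.*#+}} xmm0 = xmm0[0,1],mem[0],xmm0[3]  ; 32-bit element loads that
  insertps {{.*#+}} xmm0 = xmm0[0,1,2],mem[0]        ; can no longer fold with it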
================
Comment at: llvm/test/CodeGen/X86/shuffle-vs-trunc-512.ll:559
; AVX512VBMI: # %bb.0:
-; AVX512VBMI-NEXT: vmovdqa {{.*#+}} xmm1 = [1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61]
-; AVX512VBMI-NEXT: vpermb %zmm0, %zmm1, %zmm0
-; AVX512VBMI-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0
+; AVX512VBMI-NEXT: vpbroadcastq {{.*#+}} zmm1 = [5569058560453190945,5569058560453190945,5569058560453190945,5569058560453190945,5569058560453190945,5569058560453190945,5569058560453190945,5569058560453190945]
+; AVX512VBMI-NEXT: vextracti64x4 $1, %zmm0, %ymm2
----------------
xbolva00 wrote:
> Worse
Still looking at these regressions - shuffle combining across different source/destination vector widths has always been a weak spot.
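For reference, the two forms from the diff above - the old codegen matched one cross-width byte permute, while the new DAG splits the 512-bit source first and the combiner can't get back to the single vpermb:

  ; before: one VBMI permute reads all 64 source bytes directly
  vmovdqa {{.*#+}} xmm1 = [1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61]
  vpermb  %zmm0, %zmm1, %zmm0

  ; after: the upper half is extracted and shuffled separately
  ; (diff truncated here), costing extra instructions
  vextracti64x4 $1, %zmm0, %ymm2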
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D66004/new/
https://reviews.llvm.org/D66004
More information about the llvm-commits mailing list