[PATCH] D80013: [x86] favor vector constant load to avoid GPR to XMM transfer

Fri May 15 08:39:37 PDT 2020

spatel created this revision.
spatel added reviewers: craig.topper, RKSimon.
Herald added subscribers: hiraditya, mcrosier.
Herald added a project: LLVM.
spatel marked 5 inline comments as done.
spatel added inline comments.

================
Comment at: llvm/test/CodeGen/X86/combine-udiv.ll:602
 ; XOP:       # %bb.0:
 ; XOP-NEXT:    movl $65535, %eax # imm = 0xFFFF
 ; XOP-NEXT:    vmovd %eax, %xmm1
----------------
This would improve without the -1 restriction.

================
Comment at: llvm/test/CodeGen/X86/combine-udiv.ll:681-682
+; AVX2:       # %bb.0:
+; AVX2-NEXT:    movl $171, %eax
+; AVX2-NEXT:    vmovd %eax, %xmm1
+; AVX2-NEXT:    vpmovzxbw {{.*#+}} xmm2 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
----------------
No change for AVX2 is probably caused by the 128-bit limit.

================
Comment at: llvm/test/CodeGen/X86/sad.ll:547-548
 ; SSE2-NEXT:    movq $-1024, %rax # imm = 0xFC00
 ; SSE2-NEXT:    movl $65535, %ecx # imm = 0xFFFF
 ; SSE2-NEXT:    movd %ecx, %xmm1
 ; SSE2-NEXT:    .p2align 4, 0x90
----------------
This would improve without the -1 restriction.

================
Comment at: llvm/test/CodeGen/X86/sad.ll:1019-1020
+; AVX2-NEXT:    vpsadbw (%rcx), %xmm1, %xmm1
+; AVX2-NEXT:    movl $1, %eax
+; AVX2-NEXT:    vmovd %eax, %xmm2
+; AVX2-NEXT:    vpaddd %xmm2, %xmm1, %xmm1
----------------
No change for AVX2/AVX512 is probably caused by the 128-bit limit.

================
Comment at: llvm/test/CodeGen/X86/vec_shift2.ll:13
 ; X64:       # %bb.0:
-; X64-NEXT:    psrlw $14, %xmm0
+; X64-NEXT:    psrlw {{.*}}(%rip), %xmm0
 ; X64-NEXT:    retq
----------------
This is a regression, but I'm assuming it does not matter because we have been using standard IR for vector shifts for at least 5 years. If it does matter, then I think the next test shows an existing failure of constant analysis. Also, if the high part of the shift amount is undef, then can't we fold both of these tests to constant 0 (no shift needed)?
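To make the undef question concrete, here is a small scalar model of psrlw with a non-immediate count. It relies on the documented SSE2 behavior that the count is the low 64 bits of the operand and that any count above 15 zeroes every lane. The names psrlwModel and makeCount are made up for illustration, and the 16-bit low element followed by 48 undef bits is an assumption about how the test builds its shift amount, not something taken from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V8i16 = std::array<uint16_t, 8>;

// Scalar model of SSE2 PSRLW with a register/memory count operand:
// the count is the low 64 bits of that operand, and any count above 15
// clears every 16-bit lane.
V8i16 psrlwModel(const V8i16 &src, uint64_t count) {
  V8i16 out{};
  if (count > 15)
    return out; // all lanes are zero
  for (std::size_t i = 0; i < src.size(); ++i)
    out[i] = static_cast<uint16_t>(src[i] >> count);
  return out;
}

// Hypothetical helper: builds the 64-bit count from a defined low 16-bit
// element and a stand-in value for the remaining 48 bits (undef in the IR).
uint64_t makeCount(uint16_t low, uint64_t upper48) {
  assert(upper48 < (1ULL << 48) && "upper part must fit in 48 bits");
  return (upper48 << 16) | low;
}
```

With upper48 == 0 this is the plain shift by 14 that the immediate form encodes. With any non-zero upper48 the count exceeds 15 and the result is all zeros, so if those bits really are undef, the compiler is free to pick such a value and fold the whole shift to zero.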

This build vector lowering pattern came up in D79886 <https://reviews.llvm.org/D79886>. I've tried to limit the improvement to cases where it looks clearly better to load, but we could remove the 'TODO' predicates already if we are willing to overlook some corner cases.
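For readers skimming the inline comments, the two limits mentioned there (the "-1 restriction" and the "128-bit limit") can be pictured as a predicate like the one below. This is only a rough sketch of the idea and not the code in X86ISelLowering.cpp: the struct, the function name, and the exact form of both checks are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a build vector whose only non-zero
// element is a constant in the low lane.
struct LowEltConstant {
  unsigned vectorBits; // total vector width, e.g. 128 or 256
  unsigned eltBits;    // width of one element, 1..64
  uint64_t value;      // constant in the low element
};

// Returns true when this sketch would prefer a constant-pool load over
// materializing the value in a GPR and moving it to an XMM register.
bool preferConstantPoolLoad(const LowEltConstant &c) {
  assert(c.eltBits >= 1 && c.eltBits <= 64 && "unsupported element width");
  const uint64_t mask =
      c.eltBits == 64 ? ~0ULL : ((1ULL << c.eltBits) - 1);
  const bool isAllOnes = (c.value & mask) == mask;

  if (c.vectorBits != 128) // assumed shape of the "128-bit limit"
    return false;
  if (isAllOnes)           // assumed shape of the "-1 restriction"
    return false;
  return true;
}
```

Under these assumptions, a 128-bit vector with a low 16-bit element of 171 would be loaded, while a low element of 0xFFFF or a 256-bit vector would keep the existing movl/vmovd sequence, which matches the "no change" cases called out in the inline comments.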

https://reviews.llvm.org/D80013

Files:
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/test/CodeGen/X86/combine-udiv.ll
  llvm/test/CodeGen/X86/packss.ll
  llvm/test/CodeGen/X86/pshufb-mask-comments.ll
  llvm/test/CodeGen/X86/ret-mmx.ll
  llvm/test/CodeGen/X86/sad.ll
  llvm/test/CodeGen/X86/srem-seteq-vec-nonsplat.ll
  llvm/test/CodeGen/X86/urem-seteq-vec-nonsplat.ll
  llvm/test/CodeGen/X86/vec_set-A.ll
  llvm/test/CodeGen/X86/vec_shift2.ll
  llvm/test/CodeGen/X86/vector-lzcnt-128.ll
  llvm/test/CodeGen/X86/vector-shuffle-256-v32.ll
  llvm/test/CodeGen/X86/vector-tzcnt-128.ll
  llvm/test/CodeGen/X86/vmovq.ll
