[all-commits] [llvm/llvm-project] fdbc30: [X86][DAGCombiner][SelectionDAG] - Fold Zext Build...

Tue May 6 00:25:32 PDT 2025

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: fdbc30a383973d89d738283e733ba0db98df6a77
      https://github.com/llvm/llvm-project/commit/fdbc30a383973d89d738283e733ba0db98df6a77
  Author: Rohit Aggarwal <44664450+rohitaggarwal007 at users.noreply.github.com>
  Date:   2025-05-06 (Tue, 06 May 2025)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/buildvec-widen-dotproduct.ll

  Log Message:
  -----------
  [X86][DAGCombiner][SelectionDAG] - Fold Zext Build Vector to Bitcast of widen Build Vector (#135010)

I am working on a problem in which a kernel is SLP vectorized and lead
to generation of insertelements followed by zext. On lowering, the
assembly looks like below:

    vmovd   %r9d, %xmm0
    vpinsrb $1, (%rdi,%rsi), %xmm0, %xmm0
    vpinsrb $2, (%rdi,%rsi,2), %xmm0, %xmm0
    vpinsrb $3, (%rdi,%rcx), %xmm0, %xmm0
    vpinsrb $4, (%rdi,%rsi,4), %xmm0, %xmm0
    vpinsrb $5, (%rdi,%rax), %xmm0, %xmm0
    vpinsrb $6, (%rdi,%rcx,2), %xmm0, %xmm0
    vpinsrb $7, (%rdi,%r8), %xmm0, %xmm0
vpmovzxbw %xmm0, %xmm0 # xmm0 =
xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
    vpmaddwd        (%rdx), %xmm0, %xmm0
After all the insrb, xmm0 looks like

xmm0=xmm0[0],xmm0[1],xmm0[2],xmm0[3],xmm0[4],xmm0[5],xmm0[6],xmm0[7],zero,zero,zero,zero,zero,zero,zero,zero
Here vpmovzxbw perform the extension of i8 to i16. But it is expensive
operation and I want to remove it.

Optimization
Place the value in correct location while inserting so that zext can be
avoid.
While lowering, we can write a custom lowerOperation for
zero_extend_vector_inreg opcode. We can override the current default
operation with my custom in the legalization step.

The changes proposed are state below:

    vpinsrb $2, (%rdi,%rsi), %xmm0, %xmm0
    vpinsrb $4, (%rdi,%rsi,2), %xmm0, %xmm0
    vpinsrb $6, (%rdi,%rcx), %xmm0, %xmm0
    vpinsrb $8, (%rdi,%rsi,4), %xmm0, %xmm0
    vpinsrb $a, (%rdi,%rax), %xmm0, %xmm0
    vpinsrb $c, (%rdi,%rcx,2), %xmm0, %xmm0
vpinsrb $e, (%rdi,%r8), %xmm0, %xmm0 # xmm0 =
xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
    vpmaddwd        (%rdx), %xmm0, %xmm0

More details in the discourse topic
[https://discourse.llvm.org/t/improve-the-gathering-of-the-elements-so-that-unwanted-ext-operations-can-be-avoided/85443](url)

---------

Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal at amd.com>

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications