[all-commits] [llvm/llvm-project] b2bf01: [X86] X86FixupInstTuning - prefer VPBLENDD to VPBL...

Mon Jun 16 02:31:46 PDT 2025

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: b2bf017acd0369fff89b933cf7c653f62b49f8d3
      https://github.com/llvm/llvm-project/commit/b2bf017acd0369fff89b933cf7c653f62b49f8d3
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2025-06-16 (Mon, 16 Jun 2025)

  Changed paths:
    M llvm/lib/Target/X86/X86FixupInstTuning.cpp
    M llvm/test/CodeGen/X86/combine-or-shuffle.ll
    M llvm/test/CodeGen/X86/dpbusd.ll
    M llvm/test/CodeGen/X86/dpbusd_const.ll
    M llvm/test/CodeGen/X86/shuffle-vs-trunc-256.ll
    M llvm/test/CodeGen/X86/vector-reduce-add-mask.ll
    M llvm/test/CodeGen/X86/vector-reduce-add-zext.ll
    M llvm/test/CodeGen/X86/vector-reduce-add.ll
    M llvm/test/CodeGen/X86/zero_extend_vector_inreg.ll
    M llvm/test/CodeGen/X86/zero_extend_vector_inreg_of_broadcast.ll

  Log Message:
  -----------
  [X86] X86FixupInstTuning - prefer VPBLENDD to VPBLENDW shuffles on AVX2+ targets (#144269)

On many Intel AVX2 targets (Haswell+), VPBLENDD has notably better throughput than VPBLENDW - and the remaining Intel/AMD targets have no preference.

This patch replaces VPBLENDW shuffles if the shuffle mask can be safely widened from vXi16 to vXi32 and that the scheduler model doesn't consider it a regression (I haven't found any target where this is true, but we should retain the model check).

Noticed while working on #142972 where VMOVSS nodes were regressing to VPBLENDW nodes during domain switching.

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications