[all-commits] [llvm/llvm-project] 834cc8: [X86] X86FixupVectorConstantsPass - attempt to rep...

Tue Jun 13 04:10:39 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 834cc88c5d08ca55664b7742590463de813d768f
      https://github.com/llvm/llvm-project/commit/834cc88c5d08ca55664b7742590463de813d768f
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2023-06-13 (Tue, 13 Jun 2023)

  Changed paths:
    M llvm/lib/Target/X86/X86FixupVectorConstants.cpp
    M llvm/test/CodeGen/X86/avx-basic.ll
    M llvm/test/CodeGen/X86/avx-vbroadcast.ll
    M llvm/test/CodeGen/X86/avx2-conversions.ll
    M llvm/test/CodeGen/X86/avx2-intrinsics-x86.ll
    M llvm/test/CodeGen/X86/avx2-vbroadcast.ll
    M llvm/test/CodeGen/X86/avx512-regcall-Mask.ll
    M llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll
    M llvm/test/CodeGen/X86/bitreverse.ll
    M llvm/test/CodeGen/X86/broadcast-elm-cross-splat-vec.ll
    M llvm/test/CodeGen/X86/cast-vsel.ll
    M llvm/test/CodeGen/X86/combine-and.ll
    M llvm/test/CodeGen/X86/combine-sdiv.ll
    M llvm/test/CodeGen/X86/combine-udiv.ll
    M llvm/test/CodeGen/X86/extractelement-load.ll
    M llvm/test/CodeGen/X86/fma-fneg-combine-2.ll
    M llvm/test/CodeGen/X86/fma-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/fma_patterns.ll
    M llvm/test/CodeGen/X86/fma_patterns_wide.ll
    M llvm/test/CodeGen/X86/fminimum-fmaximum.ll
    M llvm/test/CodeGen/X86/fold-vector-sext-zext.ll
    M llvm/test/CodeGen/X86/fold-vector-trunc-sitofp.ll
    M llvm/test/CodeGen/X86/fp-round.ll
    M llvm/test/CodeGen/X86/insert-into-constant-vector.ll
    M llvm/test/CodeGen/X86/known-bits-vector.ll
    M llvm/test/CodeGen/X86/masked_store_trunc.ll
    M llvm/test/CodeGen/X86/masked_store_trunc_usat.ll
    M llvm/test/CodeGen/X86/memset-nonzero.ll
    M llvm/test/CodeGen/X86/merge-store-constants.ll
    M llvm/test/CodeGen/X86/oddshuffles.ll
    M llvm/test/CodeGen/X86/paddus.ll
    M llvm/test/CodeGen/X86/pr30290.ll
    M llvm/test/CodeGen/X86/pr32368.ll
    M llvm/test/CodeGen/X86/pr38639.ll
    M llvm/test/CodeGen/X86/psubus.ll
    M llvm/test/CodeGen/X86/recip-fastmath.ll
    M llvm/test/CodeGen/X86/recip-fastmath2.ll
    M llvm/test/CodeGen/X86/sadd_sat_vec.ll
    M llvm/test/CodeGen/X86/sat-add.ll
    M llvm/test/CodeGen/X86/shuffle-vs-trunc-256.ll
    M llvm/test/CodeGen/X86/splat-const.ll
    M llvm/test/CodeGen/X86/sqrt-fastmath-tune.ll
    M llvm/test/CodeGen/X86/sqrt-fastmath.ll
    M llvm/test/CodeGen/X86/srem-seteq-vec-splat.ll
    M llvm/test/CodeGen/X86/sse2.ll
    M llvm/test/CodeGen/X86/sshl_sat_vec.ll
    M llvm/test/CodeGen/X86/ssub_sat_vec.ll
    M llvm/test/CodeGen/X86/urem-seteq-vec-splat.ll
    M llvm/test/CodeGen/X86/v8i1-masks.ll
    M llvm/test/CodeGen/X86/vec-strict-fptoint-128.ll
    M llvm/test/CodeGen/X86/vec-strict-fptoint-256.ll
    M llvm/test/CodeGen/X86/vec_anyext.ll
    M llvm/test/CodeGen/X86/vec_fabs.ll
    M llvm/test/CodeGen/X86/vec_fp_to_int.ll
    M llvm/test/CodeGen/X86/vec_int_to_fp.ll
    M llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
    M llvm/test/CodeGen/X86/vector-fshl-256.ll
    M llvm/test/CodeGen/X86/vector-fshr-256.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-3.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-4.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-7.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-3.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-5.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-6.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-7.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-8.ll
    M llvm/test/CodeGen/X86/vector-reduce-add-mask.ll
    M llvm/test/CodeGen/X86/vector-reduce-xor-bool.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v32.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll
    M llvm/test/CodeGen/X86/vector-shuffle-avx512.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-avx.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining.ll
    M llvm/test/CodeGen/X86/vector-trunc-math.ll
    M llvm/test/CodeGen/X86/vector-trunc-ssat.ll
    M llvm/test/CodeGen/X86/vector-trunc-usat.ll
    M llvm/test/CodeGen/X86/vector-trunc.ll
    M llvm/test/CodeGen/X86/vselect-avx.ll
    M llvm/test/CodeGen/X86/vselect-zero.ll
    M llvm/test/CodeGen/X86/win_cst_pool.ll

  Log Message:
  -----------
  [X86] X86FixupVectorConstantsPass - attempt to replace full width fp vector constant loads with broadcasts on AVX+ targets (REAPPLIED)

lowerBuildVectorAsBroadcast will not broadcast splat constants in all cases, resulting in a lot of situations where a full width vector load that has failed to fold but is loading splat constant values could use a broadcast load instruction just as cheaply, and save constant pool space.

NOTE: SSE3 targets can use MOVDDUP but not all SSE era CPUs can perform this as cheaply as a vector load, we will need to add scheduler model checks if we want to pursue this.

This is an updated commit of 98061013e01207444cfd3980cde17b5e75764fbe after being reverted at a279a09ab9524d1d74ef29b34618102d4b202e2f