[all-commits] [llvm/llvm-project] 426f5f: DAG: Handle load in SimplifyDemandedVectorElts

Sun Jan 12 23:10:05 PST 2025

  Branch: refs/heads/users/arsenm/dag/simplify-demanded-vector-elts-load
  Home:   https://github.com/llvm/llvm-project
  Commit: 426f5fa7ce301611c1c53936d8dc6f6b2b976a91
      https://github.com/llvm/llvm-project/commit/426f5fa7ce301611c1c53936d8dc6f6b2b976a91
  Author: Matt Arsenault <Matthew.Arsenault at amd.com>
  Date:   2025-01-13 (Mon, 13 Jan 2025)

  Changed paths:
    M llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
    M llvm/test/CodeGen/AArch64/arm64-big-endian-bitconverts.ll
    M llvm/test/CodeGen/AArch64/dag-ReplaceAllUsesOfValuesWith.ll
    M llvm/test/CodeGen/AArch64/fcmp.ll
    M llvm/test/CodeGen/AArch64/fmlal-loreg.ll
    M llvm/test/CodeGen/AArch64/icmp.ll
    M llvm/test/CodeGen/AArch64/sve-fixed-length-extract-vector-elt.ll
    M llvm/test/CodeGen/AArch64/sve-fixed-length-masked-gather.ll
    M llvm/test/CodeGen/AArch64/sve-fixed-length-masked-scatter.ll
    M llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-extract-vector-elt.ll
    M llvm/test/CodeGen/AMDGPU/fcopysign.f32.ll
    M llvm/test/CodeGen/AMDGPU/fcopysign.f64.ll
    M llvm/test/CodeGen/AMDGPU/greedy-reverse-local-assignment.ll
    M llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
    M llvm/test/CodeGen/AMDGPU/implicit-kernarg-backend-usage.ll
    M llvm/test/CodeGen/AMDGPU/shader-addr64-nonuniform.ll
    M llvm/test/CodeGen/AMDGPU/trunc.ll
    M llvm/test/CodeGen/AMDGPU/vector_rebroadcast.ll
    M llvm/test/CodeGen/AMDGPU/vector_shuffle.packed.ll
    M llvm/test/CodeGen/ARM/crash-on-pow2-shufflevector.ll
    M llvm/test/CodeGen/ARM/vector-promotion.ll
    M llvm/test/CodeGen/ARM/vext.ll
    M llvm/test/CodeGen/ARM/vuzp.ll
    M llvm/test/CodeGen/LoongArch/vector-fp-imm.ll
    M llvm/test/CodeGen/Mips/cconv/vector.ll
    M llvm/test/CodeGen/Mips/msa/basic_operations.ll
    M llvm/test/CodeGen/NVPTX/i128.ll
    M llvm/test/CodeGen/NVPTX/i8x4-instructions.ll
    M llvm/test/CodeGen/NVPTX/store-undef.ll
    M llvm/test/CodeGen/PowerPC/aix-vector-byval-callee.ll
    M llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
    M llvm/test/CodeGen/PowerPC/const-stov.ll
    M llvm/test/CodeGen/PowerPC/pr27078.ll
    M llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
    M llvm/test/CodeGen/PowerPC/v16i8_scalar_to_vector_shuffle.ll
    M llvm/test/CodeGen/PowerPC/v2i64_scalar_to_vector_shuffle.ll
    M llvm/test/CodeGen/PowerPC/v8i16_scalar_to_vector_shuffle.ll
    M llvm/test/CodeGen/PowerPC/vsx_shuffle_le.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int.ll
    M llvm/test/CodeGen/Thumb2/mve-extractstore.ll
    M llvm/test/CodeGen/Thumb2/mve-insertshuffleload.ll
    M llvm/test/CodeGen/X86/SwizzleShuff.ll
    M llvm/test/CodeGen/X86/avx-vbroadcast.ll
    M llvm/test/CodeGen/X86/avx.ll
    M llvm/test/CodeGen/X86/avx1-logical-load-folding.ll
    M llvm/test/CodeGen/X86/avx512-arith.ll
    M llvm/test/CodeGen/X86/avx512-broadcast-arith.ll
    M llvm/test/CodeGen/X86/avx512-broadcast-unfold.ll
    M llvm/test/CodeGen/X86/avx512-calling-conv.ll
    M llvm/test/CodeGen/X86/avx512-cmp.ll
    M llvm/test/CodeGen/X86/avx512-ext.ll
    M llvm/test/CodeGen/X86/avx512-extract-subvector-load-store.ll
    M llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx512-intrinsics.ll
    M llvm/test/CodeGen/X86/avx512-load-store.ll
    M llvm/test/CodeGen/X86/avx512-logic.ll
    M llvm/test/CodeGen/X86/avx512-select.ll
    M llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll
    M llvm/test/CodeGen/X86/avx512-shuffles/shuffle-interleave.ll
    M llvm/test/CodeGen/X86/avx512-shuffles/unpack.ll
    M llvm/test/CodeGen/X86/avx512fp16-mov.ll
    M llvm/test/CodeGen/X86/bitreverse.ll
    M llvm/test/CodeGen/X86/buildvec-insertvec.ll
    M llvm/test/CodeGen/X86/combine-fabs.ll
    M llvm/test/CodeGen/X86/combine-sdiv.ll
    M llvm/test/CodeGen/X86/combine-udiv.ll
    M llvm/test/CodeGen/X86/commute-blend-avx2.ll
    M llvm/test/CodeGen/X86/commute-blend-sse41.ll
    M llvm/test/CodeGen/X86/copysign-constant-magnitude.ll
    M llvm/test/CodeGen/X86/extract-concat.ll
    M llvm/test/CodeGen/X86/extractelement-fp.ll
    M llvm/test/CodeGen/X86/extractelement-load.ll
    M llvm/test/CodeGen/X86/fabs.ll
    M llvm/test/CodeGen/X86/fast-isel-fneg.ll
    M llvm/test/CodeGen/X86/fma-signed-zero.ll
    M llvm/test/CodeGen/X86/fp-fold.ll
    M llvm/test/CodeGen/X86/fp-intrinsics-fma.ll
    M llvm/test/CodeGen/X86/fp-logic.ll
    M llvm/test/CodeGen/X86/fp-round.ll
    M llvm/test/CodeGen/X86/fp128-cast.ll
    M llvm/test/CodeGen/X86/fp16-libcalls.ll
    M llvm/test/CodeGen/X86/freeze-vector.ll
    M llvm/test/CodeGen/X86/gfni-funnel-shifts.ll
    M llvm/test/CodeGen/X86/half.ll
    M llvm/test/CodeGen/X86/insert-into-constant-vector.ll
    M llvm/test/CodeGen/X86/insertps-combine.ll
    M llvm/test/CodeGen/X86/insertps-from-constantpool.ll
    M llvm/test/CodeGen/X86/insertps-unfold-load-bug.ll
    M llvm/test/CodeGen/X86/is_fpclass.ll
    M llvm/test/CodeGen/X86/isel-blendi-gettargetconstant.ll
    M llvm/test/CodeGen/X86/load-partial.ll
    M llvm/test/CodeGen/X86/masked_load.ll
    M llvm/test/CodeGen/X86/masked_store.ll
    M llvm/test/CodeGen/X86/mmx-arith.ll
    M llvm/test/CodeGen/X86/neg_fp.ll
    M llvm/test/CodeGen/X86/negative-sin.ll
    M llvm/test/CodeGen/X86/packus.ll
    M llvm/test/CodeGen/X86/peephole-fold-movsd.ll
    M llvm/test/CodeGen/X86/pr14161.ll
    M llvm/test/CodeGen/X86/pr30511.ll
    M llvm/test/CodeGen/X86/pr31956.ll
    M llvm/test/CodeGen/X86/pr34592.ll
    M llvm/test/CodeGen/X86/pr36553.ll
    M llvm/test/CodeGen/X86/pr40811.ll
    M llvm/test/CodeGen/X86/pr63091.ll
    M llvm/test/CodeGen/X86/sar_fold64.ll
    M llvm/test/CodeGen/X86/setcc-combine.ll
    M llvm/test/CodeGen/X86/setcc-non-simple-type.ll
    M llvm/test/CodeGen/X86/shrink_vmul.ll
    M llvm/test/CodeGen/X86/shuffle-vs-trunc-512.ll
    M llvm/test/CodeGen/X86/splat-for-size.ll
    M llvm/test/CodeGen/X86/sqrt-fastmath-tune.ll
    M llvm/test/CodeGen/X86/sqrt-fastmath-tunecpu-attr.ll
    M llvm/test/CodeGen/X86/sqrt-fastmath.ll
    M llvm/test/CodeGen/X86/srem-seteq-vec-nonsplat.ll
    M llvm/test/CodeGen/X86/sse-align-12.ll
    M llvm/test/CodeGen/X86/sse2.ll
    M llvm/test/CodeGen/X86/sse3.ll
    M llvm/test/CodeGen/X86/sse41.ll
    M llvm/test/CodeGen/X86/strict-fsub-combines.ll
    M llvm/test/CodeGen/X86/subvector-broadcast.ll
    M llvm/test/CodeGen/X86/test-shrink-bug.ll
    M llvm/test/CodeGen/X86/tuning-shuffle-unpckpd-avx512.ll
    M llvm/test/CodeGen/X86/tuning-shuffle-unpckpd.ll
    M llvm/test/CodeGen/X86/urem-seteq-vec-tautological.ll
    M llvm/test/CodeGen/X86/vec_insert-5.ll
    M llvm/test/CodeGen/X86/vec_int_to_fp.ll
    M llvm/test/CodeGen/X86/vec_shift5.ll
    M llvm/test/CodeGen/X86/vec_umulo.ll
    M llvm/test/CodeGen/X86/vector-bitreverse.ll
    M llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics-flags.ll
    M llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
    M llvm/test/CodeGen/X86/vector-fshl-256.ll
    M llvm/test/CodeGen/X86/vector-fshl-512.ll
    M llvm/test/CodeGen/X86/vector-fshr-256.ll
    M llvm/test/CodeGen/X86/vector-fshr-512.ll
    M llvm/test/CodeGen/X86/vector-idiv-sdiv-512.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmin.ll
    M llvm/test/CodeGen/X86/vector-rotate-128.ll
    M llvm/test/CodeGen/X86/vector-rotate-256.ll
    M llvm/test/CodeGen/X86/vector-rotate-512.ll
    M llvm/test/CodeGen/X86/vector-shift-ashr-128.ll
    M llvm/test/CodeGen/X86/vector-shift-ashr-256.ll
    M llvm/test/CodeGen/X86/vector-shift-ashr-512.ll
    M llvm/test/CodeGen/X86/vector-shift-lshr-128.ll
    M llvm/test/CodeGen/X86/vector-shift-lshr-256.ll
    M llvm/test/CodeGen/X86/vector-shift-lshr-512.ll
    M llvm/test/CodeGen/X86/vector-shift-shl-128.ll
    M llvm/test/CodeGen/X86/vector-shift-shl-256.ll
    M llvm/test/CodeGen/X86/vector-shift-shl-512.ll
    M llvm/test/CodeGen/X86/vector-shuffle-128-v2.ll
    M llvm/test/CodeGen/X86/vector-shuffle-128-v4.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-avx2.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-ssse3.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining.ll
    M llvm/test/CodeGen/X86/vector-shuffle-v1.ll
    M llvm/test/CodeGen/X86/vector-shuffle-v192.ll
    M llvm/test/CodeGen/X86/vector-shuffle-v48.ll
    M llvm/test/CodeGen/X86/vselect.ll
    M llvm/test/CodeGen/X86/widened-broadcast.ll
    M llvm/test/CodeGen/X86/x86-interleaved-access.ll
    M llvm/test/CodeGen/X86/xop-shifts.ll
    M llvm/test/CodeGen/X86/xor.ll

  Log Message:
  -----------
  DAG: Handle load in SimplifyDemandedVectorElts

This improves some AMDGPU cases and avoids future regressions.
The combiner likes to form shuffles for cases where an extract_vector_elt
would do perfectly well, and this recovers some of the regressions from
losing load narrowing.

AMDGPU, Arch64 and RISCV test changes look broadly better. Other targets have
some improvements, but mostly regressions. In particular X86 looks much
worse. I'm guessing this is because it's shouldReduceLoadWidth is wrong.

I mostly just regenerated the checks. I assume some set of them should
switch to use volatile loads to defeat the optimization.

To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications