[all-commits] [llvm/llvm-project] 68ca84: DAG: Handle load in SimplifyDemandedVectorElts

Matt Arsenault via All-commits all-commits at lists.llvm.org
Sun Jan 12 23:12:41 PST 2025


  Branch: refs/heads/users/arsenm/dag/simplify-demanded-vector-elts-load
  Home:   https://github.com/llvm/llvm-project
  Commit: 68ca84a39ddd9f1e08ee553be4220dc4c2050e99
      https://github.com/llvm/llvm-project/commit/68ca84a39ddd9f1e08ee553be4220dc4c2050e99
  Author: Matt Arsenault <Matthew.Arsenault at amd.com>
  Date:   2025-01-13 (Mon, 13 Jan 2025)

  Changed paths:
    M llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
    M llvm/test/CodeGen/AArch64/arm64-big-endian-bitconverts.ll
    M llvm/test/CodeGen/AArch64/dag-ReplaceAllUsesOfValuesWith.ll
    M llvm/test/CodeGen/AArch64/fcmp.ll
    M llvm/test/CodeGen/AArch64/fmlal-loreg.ll
    M llvm/test/CodeGen/AArch64/icmp.ll
    M llvm/test/CodeGen/AArch64/sve-fixed-length-extract-vector-elt.ll
    M llvm/test/CodeGen/AArch64/sve-fixed-length-masked-gather.ll
    M llvm/test/CodeGen/AArch64/sve-fixed-length-masked-scatter.ll
    M llvm/test/CodeGen/AArch64/sve-streaming-mode-fixed-length-extract-vector-elt.ll
    M llvm/test/CodeGen/AMDGPU/fcopysign.f32.ll
    M llvm/test/CodeGen/AMDGPU/fcopysign.f64.ll
    M llvm/test/CodeGen/AMDGPU/greedy-reverse-local-assignment.ll
    M llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll
    M llvm/test/CodeGen/AMDGPU/implicit-kernarg-backend-usage.ll
    M llvm/test/CodeGen/AMDGPU/shader-addr64-nonuniform.ll
    M llvm/test/CodeGen/AMDGPU/trunc.ll
    M llvm/test/CodeGen/AMDGPU/vector_rebroadcast.ll
    M llvm/test/CodeGen/AMDGPU/vector_shuffle.packed.ll
    M llvm/test/CodeGen/ARM/crash-on-pow2-shufflevector.ll
    M llvm/test/CodeGen/ARM/vector-promotion.ll
    M llvm/test/CodeGen/ARM/vext.ll
    M llvm/test/CodeGen/ARM/vuzp.ll
    M llvm/test/CodeGen/LoongArch/vector-fp-imm.ll
    M llvm/test/CodeGen/Mips/cconv/vector.ll
    M llvm/test/CodeGen/Mips/msa/basic_operations.ll
    M llvm/test/CodeGen/NVPTX/i128.ll
    M llvm/test/CodeGen/NVPTX/i8x4-instructions.ll
    M llvm/test/CodeGen/NVPTX/store-undef.ll
    M llvm/test/CodeGen/PowerPC/aix-vector-byval-callee.ll
    M llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
    M llvm/test/CodeGen/PowerPC/const-stov.ll
    M llvm/test/CodeGen/PowerPC/pr27078.ll
    M llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
    M llvm/test/CodeGen/PowerPC/v16i8_scalar_to_vector_shuffle.ll
    M llvm/test/CodeGen/PowerPC/v2i64_scalar_to_vector_shuffle.ll
    M llvm/test/CodeGen/PowerPC/v8i16_scalar_to_vector_shuffle.ll
    M llvm/test/CodeGen/PowerPC/vsx_shuffle_le.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-shuffles.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll
    M llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int.ll
    M llvm/test/CodeGen/Thumb2/mve-extractstore.ll
    M llvm/test/CodeGen/Thumb2/mve-insertshuffleload.ll
    M llvm/test/CodeGen/X86/SwizzleShuff.ll
    M llvm/test/CodeGen/X86/avx-vbroadcast.ll
    M llvm/test/CodeGen/X86/avx.ll
    M llvm/test/CodeGen/X86/avx1-logical-load-folding.ll
    M llvm/test/CodeGen/X86/avx512-arith.ll
    M llvm/test/CodeGen/X86/avx512-broadcast-arith.ll
    M llvm/test/CodeGen/X86/avx512-broadcast-unfold.ll
    M llvm/test/CodeGen/X86/avx512-calling-conv.ll
    M llvm/test/CodeGen/X86/avx512-cmp.ll
    M llvm/test/CodeGen/X86/avx512-ext.ll
    M llvm/test/CodeGen/X86/avx512-extract-subvector-load-store.ll
    M llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx512-intrinsics.ll
    M llvm/test/CodeGen/X86/avx512-load-store.ll
    M llvm/test/CodeGen/X86/avx512-logic.ll
    M llvm/test/CodeGen/X86/avx512-select.ll
    M llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll
    M llvm/test/CodeGen/X86/avx512-shuffles/shuffle-interleave.ll
    M llvm/test/CodeGen/X86/avx512-shuffles/unpack.ll
    M llvm/test/CodeGen/X86/avx512fp16-mov.ll
    M llvm/test/CodeGen/X86/bitreverse.ll
    M llvm/test/CodeGen/X86/buildvec-insertvec.ll
    M llvm/test/CodeGen/X86/combine-fabs.ll
    M llvm/test/CodeGen/X86/combine-sdiv.ll
    M llvm/test/CodeGen/X86/combine-udiv.ll
    M llvm/test/CodeGen/X86/commute-blend-avx2.ll
    M llvm/test/CodeGen/X86/commute-blend-sse41.ll
    M llvm/test/CodeGen/X86/copysign-constant-magnitude.ll
    M llvm/test/CodeGen/X86/extract-concat.ll
    M llvm/test/CodeGen/X86/extractelement-fp.ll
    M llvm/test/CodeGen/X86/extractelement-load.ll
    M llvm/test/CodeGen/X86/fabs.ll
    M llvm/test/CodeGen/X86/fast-isel-fneg.ll
    M llvm/test/CodeGen/X86/fma-signed-zero.ll
    M llvm/test/CodeGen/X86/fp-fold.ll
    M llvm/test/CodeGen/X86/fp-intrinsics-fma.ll
    M llvm/test/CodeGen/X86/fp-logic.ll
    M llvm/test/CodeGen/X86/fp-round.ll
    M llvm/test/CodeGen/X86/fp128-cast.ll
    M llvm/test/CodeGen/X86/fp16-libcalls.ll
    M llvm/test/CodeGen/X86/freeze-vector.ll
    M llvm/test/CodeGen/X86/gfni-funnel-shifts.ll
    M llvm/test/CodeGen/X86/half.ll
    M llvm/test/CodeGen/X86/insert-into-constant-vector.ll
    M llvm/test/CodeGen/X86/insertps-combine.ll
    M llvm/test/CodeGen/X86/insertps-from-constantpool.ll
    M llvm/test/CodeGen/X86/insertps-unfold-load-bug.ll
    M llvm/test/CodeGen/X86/is_fpclass.ll
    M llvm/test/CodeGen/X86/isel-blendi-gettargetconstant.ll
    M llvm/test/CodeGen/X86/load-partial.ll
    M llvm/test/CodeGen/X86/masked_load.ll
    M llvm/test/CodeGen/X86/masked_store.ll
    M llvm/test/CodeGen/X86/mmx-arith.ll
    M llvm/test/CodeGen/X86/neg_fp.ll
    M llvm/test/CodeGen/X86/negative-sin.ll
    M llvm/test/CodeGen/X86/packus.ll
    M llvm/test/CodeGen/X86/peephole-fold-movsd.ll
    M llvm/test/CodeGen/X86/pr14161.ll
    M llvm/test/CodeGen/X86/pr30511.ll
    M llvm/test/CodeGen/X86/pr31956.ll
    M llvm/test/CodeGen/X86/pr34592.ll
    M llvm/test/CodeGen/X86/pr36553.ll
    M llvm/test/CodeGen/X86/pr40811.ll
    M llvm/test/CodeGen/X86/pr63091.ll
    M llvm/test/CodeGen/X86/sar_fold64.ll
    M llvm/test/CodeGen/X86/setcc-combine.ll
    M llvm/test/CodeGen/X86/setcc-non-simple-type.ll
    M llvm/test/CodeGen/X86/shrink_vmul.ll
    M llvm/test/CodeGen/X86/shuffle-vs-trunc-512.ll
    M llvm/test/CodeGen/X86/splat-for-size.ll
    M llvm/test/CodeGen/X86/sqrt-fastmath-tune.ll
    M llvm/test/CodeGen/X86/sqrt-fastmath-tunecpu-attr.ll
    M llvm/test/CodeGen/X86/sqrt-fastmath.ll
    M llvm/test/CodeGen/X86/srem-seteq-vec-nonsplat.ll
    M llvm/test/CodeGen/X86/sse-align-12.ll
    M llvm/test/CodeGen/X86/sse2.ll
    M llvm/test/CodeGen/X86/sse3.ll
    M llvm/test/CodeGen/X86/sse41.ll
    M llvm/test/CodeGen/X86/strict-fsub-combines.ll
    M llvm/test/CodeGen/X86/subvector-broadcast.ll
    M llvm/test/CodeGen/X86/test-shrink-bug.ll
    M llvm/test/CodeGen/X86/tuning-shuffle-unpckpd-avx512.ll
    M llvm/test/CodeGen/X86/tuning-shuffle-unpckpd.ll
    M llvm/test/CodeGen/X86/urem-seteq-vec-tautological.ll
    M llvm/test/CodeGen/X86/vec_insert-5.ll
    M llvm/test/CodeGen/X86/vec_int_to_fp.ll
    M llvm/test/CodeGen/X86/vec_shift5.ll
    M llvm/test/CodeGen/X86/vec_umulo.ll
    M llvm/test/CodeGen/X86/vector-bitreverse.ll
    M llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics-flags.ll
    M llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
    M llvm/test/CodeGen/X86/vector-fshl-256.ll
    M llvm/test/CodeGen/X86/vector-fshl-512.ll
    M llvm/test/CodeGen/X86/vector-fshr-256.ll
    M llvm/test/CodeGen/X86/vector-fshr-512.ll
    M llvm/test/CodeGen/X86/vector-idiv-sdiv-512.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmin.ll
    M llvm/test/CodeGen/X86/vector-rotate-128.ll
    M llvm/test/CodeGen/X86/vector-rotate-256.ll
    M llvm/test/CodeGen/X86/vector-rotate-512.ll
    M llvm/test/CodeGen/X86/vector-shift-ashr-128.ll
    M llvm/test/CodeGen/X86/vector-shift-ashr-256.ll
    M llvm/test/CodeGen/X86/vector-shift-ashr-512.ll
    M llvm/test/CodeGen/X86/vector-shift-lshr-128.ll
    M llvm/test/CodeGen/X86/vector-shift-lshr-256.ll
    M llvm/test/CodeGen/X86/vector-shift-lshr-512.ll
    M llvm/test/CodeGen/X86/vector-shift-shl-128.ll
    M llvm/test/CodeGen/X86/vector-shift-shl-256.ll
    M llvm/test/CodeGen/X86/vector-shift-shl-512.ll
    M llvm/test/CodeGen/X86/vector-shuffle-128-v2.ll
    M llvm/test/CodeGen/X86/vector-shuffle-128-v4.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-avx2.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-ssse3.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining.ll
    M llvm/test/CodeGen/X86/vector-shuffle-v1.ll
    M llvm/test/CodeGen/X86/vector-shuffle-v192.ll
    M llvm/test/CodeGen/X86/vector-shuffle-v48.ll
    M llvm/test/CodeGen/X86/vselect.ll
    M llvm/test/CodeGen/X86/widened-broadcast.ll
    M llvm/test/CodeGen/X86/x86-interleaved-access.ll
    M llvm/test/CodeGen/X86/xop-shifts.ll
    M llvm/test/CodeGen/X86/xor.ll

  Log Message:
  -----------
  DAG: Handle load in SimplifyDemandedVectorElts

This improves some AMDGPU cases and avoids future regressions.
The combiner likes to form shuffles for cases where an extract_vector_elt
would do perfectly well, and this recovers some of the regressions from
losing load narrowing.

AMDGPU, AArch64 and RISCV test changes look broadly better. Other targets have
some improvements, but mostly regressions. In particular X86 looks much
worse. I'm guessing this is because its shouldReduceLoadWidth is wrong.

I mostly just regenerated the checks. I assume some set of them should
switch to use volatile loads to defeat the optimization.
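
For readers unfamiliar with the idea, the following is a minimal standalone
sketch of demanded-element load narrowing. It is not the code from this
commit and does not use any LLVM API: ToyVectorLoad, NarrowedLoad and
narrowLoad are hypothetical names invented for illustration. It assumes a
bitmask of demanded elements and shows two points from the message above:
a load can shrink when only some elements are used, and a volatile load is
left alone. The real patch also depends on target hooks such as
shouldReduceLoadWidth, which this toy does not model.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Toy stand-in for a vector load node (hypothetical, not an LLVM type).
struct ToyVectorLoad {
  unsigned NumElts;   // number of vector elements loaded (1..64 here)
  unsigned EltBytes;  // size of one element in bytes
  bool IsVolatile;    // volatile loads must keep their full width
};

// Describes a narrower load that still covers every demanded element.
struct NarrowedLoad {
  unsigned ByteOffset;  // offset from the original base address
  unsigned NumElts;     // number of elements in the narrowed load
};

// Hypothetical helper: bit i of DemandedElts is set when element i is used.
// Returns the smallest contiguous sub-load covering all demanded elements,
// or std::nullopt when the load must stay as it is.
std::optional<NarrowedLoad> narrowLoad(const ToyVectorLoad &Load,
                                       uint64_t DemandedElts) {
  // Never touch volatile loads; this is why a test can use one to
  // keep its original load in place.
  if (Load.IsVolatile)
    return std::nullopt;
  if (Load.NumElts == 0 || Load.NumElts > 64)
    return std::nullopt;

  const uint64_t AllMask =
      Load.NumElts == 64 ? ~0ULL : ((1ULL << Load.NumElts) - 1);
  DemandedElts &= AllMask;

  // Nothing demanded, or everything demanded: nothing to narrow here.
  if (DemandedElts == 0 || DemandedElts == AllMask)
    return std::nullopt;

  // Find the first and last demanded element.
  unsigned First = 0;
  while (((DemandedElts >> First) & 1) == 0)
    ++First;
  unsigned Last = Load.NumElts - 1;
  while (((DemandedElts >> Last) & 1) == 0)
    --Last;

  // If the covering range is the whole vector, narrowing gains nothing.
  const unsigned Count = Last - First + 1;
  if (Count == Load.NumElts)
    return std::nullopt;

  return NarrowedLoad{First * Load.EltBytes, Count};
}
```

Under these assumptions, a load of four 4-byte elements with only element 2
demanded would shrink to a one-element load at byte offset 8, while the
same request on a volatile load would be refused.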
