[all-commits] [llvm/llvm-project] e9f946: [X86] X86FixupInstTunings - add VPERMILPDri -> VSH...

Sun Apr 23 03:49:16 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: e9f9467da063875bd684e46660e2ff36ba4f55e2
      https://github.com/llvm/llvm-project/commit/e9f9467da063875bd684e46660e2ff36ba4f55e2
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2023-04-23 (Sun, 23 Apr 2023)

  Changed paths:
    M llvm/lib/Target/X86/X86FixupInstTuning.cpp
    M llvm/test/CodeGen/X86/avx-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll
    M llvm/test/CodeGen/X86/avx-intrinsics-x86.ll
    M llvm/test/CodeGen/X86/avx-vbroadcast.ll
    M llvm/test/CodeGen/X86/avx512-cvt.ll
    M llvm/test/CodeGen/X86/avx512-hadd-hsub.ll
    M llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll
    M llvm/test/CodeGen/X86/avx512-shuffles/in_lane_permute.ll
    M llvm/test/CodeGen/X86/avx512fp16-mov.ll
    M llvm/test/CodeGen/X86/avx512fp16-mscatter.ll
    M llvm/test/CodeGen/X86/avx512vl-intrinsics-upgrade.ll
    M llvm/test/CodeGen/X86/combine-and.ll
    M llvm/test/CodeGen/X86/complex-fastmath.ll
    M llvm/test/CodeGen/X86/copy-low-subvec-elt-to-high-subvec-elt.ll
    M llvm/test/CodeGen/X86/extract-concat.ll
    M llvm/test/CodeGen/X86/fmaddsub-combine.ll
    M llvm/test/CodeGen/X86/fmf-reduction.ll
    M llvm/test/CodeGen/X86/haddsub-2.ll
    M llvm/test/CodeGen/X86/haddsub-3.ll
    M llvm/test/CodeGen/X86/haddsub-broadcast.ll
    M llvm/test/CodeGen/X86/haddsub-shuf.ll
    M llvm/test/CodeGen/X86/haddsub-undef.ll
    M llvm/test/CodeGen/X86/haddsub.ll
    M llvm/test/CodeGen/X86/half.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-fadd.ll
    M llvm/test/CodeGen/X86/horizontal-sum.ll
    M llvm/test/CodeGen/X86/known-signbits-vector.ll
    M llvm/test/CodeGen/X86/load-partial-dot-product.ll
    M llvm/test/CodeGen/X86/matrix-multiply.ll
    M llvm/test/CodeGen/X86/oddshuffles.ll
    M llvm/test/CodeGen/X86/pr40730.ll
    M llvm/test/CodeGen/X86/scalar-int-to-fp.ll
    M llvm/test/CodeGen/X86/scalarize-fp.ll
    M llvm/test/CodeGen/X86/shuffle-of-splat-multiuses.ll
    M llvm/test/CodeGen/X86/sse-scalar-fp-arith.ll
    M llvm/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/sse3-avx-addsub-2.ll
    M llvm/test/CodeGen/X86/tuning-shuffle-permilpd-avx512.ll
    M llvm/test/CodeGen/X86/tuning-shuffle-permilpd.ll
    M llvm/test/CodeGen/X86/vec-strict-fptoint-128.ll
    M llvm/test/CodeGen/X86/vec-strict-fptoint-256.ll
    M llvm/test/CodeGen/X86/vec-strict-fptoint-512.ll
    M llvm/test/CodeGen/X86/vec_fp_to_int.ll
    M llvm/test/CodeGen/X86/vector-half-conversions.ll
    M llvm/test/CodeGen/X86/vector-interleave.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-5.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-3.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-4.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-5.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-3.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-7.ll
    M llvm/test/CodeGen/X86/vector-narrow-binop.ll
    M llvm/test/CodeGen/X86/vector-reduce-fadd-fast.ll
    M llvm/test/CodeGen/X86/vector-reduce-fadd.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmax-fmin-fast.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmax.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmin-nnan.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmin.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmul-fast.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmul.ll
    M llvm/test/CodeGen/X86/vector-shuffle-128-v2.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll
    M llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
    M llvm/test/CodeGen/X86/vector-shuffle-512-v8.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-avx.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-xop.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining.ll
    M llvm/test/CodeGen/X86/x86-interleaved-access.ll

  Log Message:
  -----------
  [X86] X86FixupInstTunings - add VPERMILPDri -> VSHUFPDrri mapping

Similar to the original VPERMILPSri -> VSHUFPSrri mapping added in D143787, replacing VPERMILPDri -> VSHUFPDrri should never be any slower and saves an encoding byte.

The sibling VPERMILPDmi -> VPSHUFDmi mapping is trickier as we need the same shuffle mask in every lane (and it needs to be adjusted) - I haven't attempted that yet but we can investigate it in the future if there's interest.

Fixes #61060

Differential Revision: https://reviews.llvm.org/D148999