[all-commits] [llvm/llvm-project] cecaf2: Adding tuning flags for int <-> fp domain switchin...

goldsteinn via All-commits all-commits at lists.llvm.org
Mon Feb 27 16:54:00 PST 2023


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: cecaf295898f6bb23b052892c1d06c27f2715b0d
      https://github.com/llvm/llvm-project/commit/cecaf295898f6bb23b052892c1d06c27f2715b0d
  Author: Noah Goldstein <goldstein.w.n at gmail.com>
  Date:   2023-02-27 (Mon, 27 Feb 2023)

  Changed paths:
    M llvm/lib/Target/X86/X86.td
    M llvm/lib/Target/X86/X86Subtarget.h
    M llvm/lib/Target/X86/X86TargetTransformInfo.h

  Log Message:
  -----------
  Adding tuning flags for int <-> fp domain switching penalties; NFC

Atom
    - No domain switching penalties
Nehalem+
    - No penalty on moves
Haswell+
    - No penalty on moves / shuffles
Skylake+
    - No penalty on moves / shuffles / blends
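
The tiers above can be read as a cumulative ordering: each later tier
removes the int <-> fp domain-crossing penalty for one more class of
instruction. The following is only a rough sketch of that reading. The
names `OpKind`, `DomainTier`, and `hasNoDomainDelay` are hypothetical
and are not the identifiers added to X86.td or X86Subtarget.h by this
commit.

```cpp
#include <cassert>

// Hypothetical classification of vector instructions (illustration only).
enum class OpKind { Move, Shuffle, Blend, Other };

// Hypothetical tiers mirroring the list in the log message.
// Enumerator order matters: each tier covers everything the previous one does.
enum class DomainTier {
  None,            // every domain crossing is assumed to cost something
  Mov,             // "Nehalem+": no penalty on moves
  MovShuffle,      // "Haswell+": no penalty on moves / shuffles
  MovShuffleBlend, // "Skylake+": no penalty on moves / shuffles / blends
  All              // "Atom": no domain switching penalties at all
};

// Returns true when an instruction of kind Op can be executed in the
// "wrong" domain without a modeled penalty on a CPU in tier T.
bool hasNoDomainDelay(DomainTier T, OpKind Op) {
  if (T == DomainTier::All)
    return true;
  switch (Op) {
  case OpKind::Move:
    return T >= DomainTier::Mov;
  case OpKind::Shuffle:
    return T >= DomainTier::MovShuffle;
  case OpKind::Blend:
    return T >= DomainTier::MovShuffleBlend;
  case OpKind::Other:
    return false;
  }
  return false;
}
```

Under this sketch, a later pass could ask `hasNoDomainDelay(Tier, OpKind::Shuffle)`
before swapping an fp-domain shuffle for an integer-domain one.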

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D143859


  Commit: e56ddae849317a7f41f97a8a9c41cf63ce40e4f8
      https://github.com/llvm/llvm-project/commit/e56ddae849317a7f41f97a8a9c41cf63ce40e4f8
  Author: Noah Goldstein <goldstein.w.n at gmail.com>
  Date:   2023-02-27 (Mon, 27 Feb 2023)

  Changed paths:
    A llvm/test/CodeGen/X86/tuning-shuffle-permilps-avx512.ll
    A llvm/test/CodeGen/X86/tuning-shuffle-permilps.ll

  Log Message:
  -----------
  Add tests for replacing `{v}permilps` -> `{v}shufps/{v}pshufd`; NFC

Differential Revision: https://reviews.llvm.org/D144779


  Commit: 69a322fed19b977d15be9500d8653496b73673e9
      https://github.com/llvm/llvm-project/commit/69a322fed19b977d15be9500d8653496b73673e9
  Author: Noah Goldstein <goldstein.w.n at gmail.com>
  Date:   2023-02-27 (Mon, 27 Feb 2023)

  Changed paths:
    M llvm/lib/Target/X86/CMakeLists.txt
    M llvm/lib/Target/X86/X86.h
    A llvm/lib/Target/X86/X86FixupInstTuning.cpp
    M llvm/lib/Target/X86/X86TargetMachine.cpp
    M llvm/test/CodeGen/X86/2012-01-12-extract-sv.ll
    M llvm/test/CodeGen/X86/SwizzleShuff.ll
    M llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast.ll
    M llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast_from_memory.ll
    M llvm/test/CodeGen/X86/avx-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll
    M llvm/test/CodeGen/X86/avx-splat.ll
    M llvm/test/CodeGen/X86/avx-vbroadcast.ll
    M llvm/test/CodeGen/X86/avx-vinsertf128.ll
    M llvm/test/CodeGen/X86/avx-vperm2x128.ll
    M llvm/test/CodeGen/X86/avx2-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx512-cvt.ll
    M llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx512-intrinsics-upgrade.ll
    M llvm/test/CodeGen/X86/avx512-shuffles/in_lane_permute.ll
    M llvm/test/CodeGen/X86/avx512-shuffles/shuffle.ll
    M llvm/test/CodeGen/X86/avx512-trunc.ll
    M llvm/test/CodeGen/X86/avx512-vec-cmp.ll
    M llvm/test/CodeGen/X86/avx512fp16-mov.ll
    M llvm/test/CodeGen/X86/avx512fp16-mscatter.ll
    M llvm/test/CodeGen/X86/avx512vl-intrinsics-upgrade.ll
    M llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-sext.ll
    M llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-zext.ll
    M llvm/test/CodeGen/X86/bitcast-int-to-vector-bool.ll
    M llvm/test/CodeGen/X86/buildvec-extract.ll
    M llvm/test/CodeGen/X86/combine-and.ll
    M llvm/test/CodeGen/X86/combine-concatvectors.ll
    M llvm/test/CodeGen/X86/copy-low-subvec-elt-to-high-subvec-elt.ll
    M llvm/test/CodeGen/X86/extract-concat.ll
    M llvm/test/CodeGen/X86/extract-store.ll
    M llvm/test/CodeGen/X86/fdiv-combine-vec.ll
    M llvm/test/CodeGen/X86/fmaddsub-combine.ll
    M llvm/test/CodeGen/X86/haddsub-2.ll
    M llvm/test/CodeGen/X86/haddsub-4.ll
    M llvm/test/CodeGen/X86/haddsub-undef.ll
    M llvm/test/CodeGen/X86/haddsub.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-smax.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-smin.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-umax.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-umin.ll
    M llvm/test/CodeGen/X86/horizontal-shuffle-2.ll
    M llvm/test/CodeGen/X86/horizontal-shuffle-3.ll
    M llvm/test/CodeGen/X86/horizontal-shuffle-4.ll
    M llvm/test/CodeGen/X86/horizontal-sum.ll
    M llvm/test/CodeGen/X86/i64-to-float.ll
    M llvm/test/CodeGen/X86/insertelement-var-index.ll
    M llvm/test/CodeGen/X86/known-bits-vector.ll
    M llvm/test/CodeGen/X86/known-signbits-vector.ll
    M llvm/test/CodeGen/X86/masked_store.ll
    M llvm/test/CodeGen/X86/masked_store_trunc.ll
    M llvm/test/CodeGen/X86/masked_store_trunc_ssat.ll
    M llvm/test/CodeGen/X86/masked_store_trunc_usat.ll
    M llvm/test/CodeGen/X86/matrix-multiply.ll
    M llvm/test/CodeGen/X86/oddshuffles.ll
    M llvm/test/CodeGen/X86/opt-pipeline.ll
    M llvm/test/CodeGen/X86/packss.ll
    M llvm/test/CodeGen/X86/palignr.ll
    M llvm/test/CodeGen/X86/pr31956.ll
    M llvm/test/CodeGen/X86/pr40730.ll
    M llvm/test/CodeGen/X86/pr40811.ll
    M llvm/test/CodeGen/X86/pr50609.ll
    M llvm/test/CodeGen/X86/rotate_vec.ll
    M llvm/test/CodeGen/X86/scalarize-fp.ll
    M llvm/test/CodeGen/X86/shuffle-of-shift.ll
    M llvm/test/CodeGen/X86/shuffle-of-splat-multiuses.ll
    M llvm/test/CodeGen/X86/sse-fsignum.ll
    M llvm/test/CodeGen/X86/sse-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll
    M llvm/test/CodeGen/X86/sse2.ll
    M llvm/test/CodeGen/X86/sse3-avx-addsub-2.ll
    M llvm/test/CodeGen/X86/sse41.ll
    M llvm/test/CodeGen/X86/swizzle-avx2.ll
    M llvm/test/CodeGen/X86/tuning-shuffle-permilps-avx512.ll
    M llvm/test/CodeGen/X86/tuning-shuffle-permilps.ll
    M llvm/test/CodeGen/X86/vec-strict-fptoint-256.ll
    M llvm/test/CodeGen/X86/vec-strict-fptoint-512.ll
    M llvm/test/CodeGen/X86/vec-strict-inttofp-128.ll
    M llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll
    M llvm/test/CodeGen/X86/vec-strict-inttofp-512.ll
    M llvm/test/CodeGen/X86/vec_fp_to_int.ll
    M llvm/test/CodeGen/X86/vec_int_to_fp.ll
    M llvm/test/CodeGen/X86/vec_umulo.ll
    M llvm/test/CodeGen/X86/vector-fshr-256.ll
    M llvm/test/CodeGen/X86/vector-half-conversions.ll
    M llvm/test/CodeGen/X86/vector-interleave.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-5.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-7.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-8.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-2.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-3.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-5.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-7.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-8.ll
    M llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-3.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-7.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i16-stride-8.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-2.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-3.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-4.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-5.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-6.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-7.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-8.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-3.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-5.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-7.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-6.ll
    M llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-8.ll
    M llvm/test/CodeGen/X86/vector-reduce-add-mask.ll
    M llvm/test/CodeGen/X86/vector-reduce-and-cmp.ll
    M llvm/test/CodeGen/X86/vector-reduce-and.ll
    M llvm/test/CodeGen/X86/vector-reduce-fadd.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmax.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmin.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmul.ll
    M llvm/test/CodeGen/X86/vector-reduce-or.ll
    M llvm/test/CodeGen/X86/vector-reduce-smax.ll
    M llvm/test/CodeGen/X86/vector-reduce-smin.ll
    M llvm/test/CodeGen/X86/vector-reduce-umax.ll
    M llvm/test/CodeGen/X86/vector-reduce-umin.ll
    M llvm/test/CodeGen/X86/vector-reduce-xor.ll
    M llvm/test/CodeGen/X86/vector-sext.ll
    M llvm/test/CodeGen/X86/vector-shift-lshr-128.ll
    M llvm/test/CodeGen/X86/vector-shift-lshr-256.ll
    M llvm/test/CodeGen/X86/vector-shift-shl-256.ll
    M llvm/test/CodeGen/X86/vector-shuffle-128-v2.ll
    M llvm/test/CodeGen/X86/vector-shuffle-128-v4.ll
    M llvm/test/CodeGen/X86/vector-shuffle-128-v8.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v16.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v32.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll
    M llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
    M llvm/test/CodeGen/X86/vector-shuffle-512-v8.ll
    M llvm/test/CodeGen/X86/vector-shuffle-avx512.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-avx.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-avx2.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-avx512f.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-ssse3.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining.ll
    M llvm/test/CodeGen/X86/vector-shuffle-concatenation.ll
    M llvm/test/CodeGen/X86/vector-trunc-ssat.ll
    M llvm/test/CodeGen/X86/vector-trunc-usat.ll
    M llvm/test/CodeGen/X86/vselect-avx.ll
    M llvm/test/CodeGen/X86/x86-interleaved-access.ll
    M llvm/test/CodeGen/X86/zero_extend_vector_inreg_of_broadcast.ll
    M llvm/test/CodeGen/X86/zero_extend_vector_inreg_of_broadcast_from_memory.ll
    M llvm/utils/gn/secondary/llvm/lib/Target/X86/BUILD.gn

  Log Message:
  -----------
  Add new pass `X86FixupInstTuning` for fixing up machine-instruction selection.

There are a variety of cases where we want more control over the exact
instruction emitted. This commit creates a new pass to fix up
instructions after the DAG has been lowered. The pass is only meant to
replace instructions that are guaranteed to be interchangeable, not to
do analysis for special cases.

Handling these instruction changes in X86ISelLowering or
X86ISelDAGToDAG isn't ideal, as it's liable to either break existing
patterns that expected a certain instruction or generate infinite
loops.

As well, operating at the MachineInstruction level allows us to access
scheduling/code size information for making the decisions.

Currently only implements `{v}permilps` -> `{v}shufps/{v}pshufd` but
more transforms can be added.
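
To illustrate why this replacement counts as "guaranteed to be
interchangeable", here is a scalar model of the 128-bit immediate forms.
It is a sketch, not code from X86FixupInstTuning.cpp: the function names
are illustrative, lanes are modeled as raw 32-bit patterns, and the
wider AVX/AVX-512 forms are assumed to repeat the same selection in each
128-bit lane.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<std::uint32_t, 4>; // four 32-bit lanes, as raw bits

// permilps (immediate form): result lane i takes the source lane selected
// by bits [2i+1:2i] of the immediate.
Vec4 permilps(const Vec4 &Src, std::uint8_t Imm) {
  return Vec4{Src[Imm & 3], Src[(Imm >> 2) & 3], Src[(Imm >> 4) & 3],
              Src[(Imm >> 6) & 3]};
}

// shufps: the low two result lanes select from A, the high two from B.
Vec4 shufps(const Vec4 &A, const Vec4 &B, std::uint8_t Imm) {
  return Vec4{A[Imm & 3], A[(Imm >> 2) & 3], B[(Imm >> 4) & 3],
              B[(Imm >> 6) & 3]};
}

// pshufd: same lane selection as permilps, but an integer-domain instruction.
Vec4 pshufd(const Vec4 &Src, std::uint8_t Imm) {
  return Vec4{Src[Imm & 3], Src[(Imm >> 2) & 3], Src[(Imm >> 4) & 3],
              Src[(Imm >> 6) & 3]};
}

// Exhaustively checks that, for every immediate, permilps(X, Imm) matches
// both shufps(X, X, Imm) and pshufd(X, Imm).
bool permilpsMatchesReplacements() {
  const Vec4 X = {10, 11, 12, 13};
  for (unsigned I = 0; I < 256; ++I) {
    const auto Imm = static_cast<std::uint8_t>(I);
    const Vec4 Expected = permilps(X, Imm);
    if (Expected != shufps(X, X, Imm) || Expected != pshufd(X, Imm))
      return false;
  }
  return true;
}
```

Because the three forms produce identical bits, choosing among them can
be left to tuning information (domain penalties, code size, scheduling)
rather than to correctness analysis.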

Differential Revision: https://reviews.llvm.org/D143787


  Commit: 6957a8cc6c92db5c1b6e9307b29bb78eb1beb718
      https://github.com/llvm/llvm-project/commit/6957a8cc6c92db5c1b6e9307b29bb78eb1beb718
  Author: Noah Goldstein <goldstein.w.n at gmail.com>
  Date:   2023-02-27 (Mon, 27 Feb 2023)

  Changed paths:
    A llvm/test/CodeGen/X86/tuning-shuffle-unpckpd-avx512.ll
    A llvm/test/CodeGen/X86/tuning-shuffle-unpckpd.ll

  Log Message:
  -----------
  Add tests for replacing `{v}unpck{l|h}pd` -> `{v}shufps`; NFC
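
As a rough illustration of the equivalence these tests exercise, the
scalar model below treats a 128-bit register as four 32-bit lanes and
shows one pair of `shufps` immediates (0x44 and 0xEE) that reproduces
`unpcklpd` and `unpckhpd`. The function names are illustrative, and the
immediates are derived from the instruction semantics rather than quoted
from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<std::uint32_t, 4>; // four 32-bit lanes, as raw bits

// shufps: the low two result lanes select from A, the high two from B.
Vec4 shufps(const Vec4 &A, const Vec4 &B, std::uint8_t Imm) {
  return Vec4{A[Imm & 3], A[(Imm >> 2) & 3], B[(Imm >> 4) & 3],
              B[(Imm >> 6) & 3]};
}

// unpcklpd: result = { low 64 bits of A, low 64 bits of B }.
// Each 64-bit element spans two adjacent 32-bit lanes.
Vec4 unpcklpd(const Vec4 &A, const Vec4 &B) {
  return Vec4{A[0], A[1], B[0], B[1]};
}

// unpckhpd: result = { high 64 bits of A, high 64 bits of B }.
Vec4 unpckhpd(const Vec4 &A, const Vec4 &B) {
  return Vec4{A[2], A[3], B[2], B[3]};
}

// Checks both replacements on distinct lane values so that any lane
// mix-up would be visible.
bool unpckpdMatchesShufps() {
  const Vec4 A = {0, 1, 2, 3};
  const Vec4 B = {4, 5, 6, 7};
  return unpcklpd(A, B) == shufps(A, B, 0x44) &&
         unpckhpd(A, B) == shufps(A, B, 0xEE);
}
```

The immediates follow from the lane picks: 0x44 encodes lanes (0, 1, 0, 1)
and 0xEE encodes lanes (2, 3, 2, 3), two bits per result lane.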

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D144442


Compare: https://github.com/llvm/llvm-project/compare/7198c87f42f6...6957a8cc6c92

