[all-commits] [llvm/llvm-project] e2f652: [X86] Improve inst tuning tests for X86FixupInstTu...

goldsteinn via All-commits all-commits at lists.llvm.org
Sun Apr 9 22:17:40 PDT 2023


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: e2f65276908e6e3ca0129df08a64c973e27bcc46
      https://github.com/llvm/llvm-project/commit/e2f65276908e6e3ca0129df08a64c973e27bcc46
  Author: Noah Goldstein <goldstein.w.n at gmail.com>
  Date:   2023-04-10 (Mon, 10 Apr 2023)

  Changed paths:
    M llvm/test/CodeGen/X86/tuning-shuffle-permilps-avx512.ll
    M llvm/test/CodeGen/X86/tuning-shuffle-permilps.ll
    M llvm/test/CodeGen/X86/tuning-shuffle-unpckpd-avx512.ll
    M llvm/test/CodeGen/X86/tuning-shuffle-unpckpd.ll
    A llvm/test/CodeGen/X86/tuning-shuffle-unpckps-avx512.ll
    A llvm/test/CodeGen/X86/tuning-shuffle-unpckps.ll

  Log Message:
  -----------
  [X86] Improve inst tuning tests for X86FixupInstTuning Pass; NFC

1) Add tests for `unpckps`.
2) Add explicit test for fast shuffles (ICX+) but WITH bypass delay.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D147726


  Commit: 2ce1698a343c599910bceed399ca7020816b230e
      https://github.com/llvm/llvm-project/commit/2ce1698a343c599910bceed399ca7020816b230e
  Author: Noah Goldstein <goldstein.w.n at gmail.com>
  Date:   2023-04-10 (Mon, 10 Apr 2023)

  Changed paths:
    M llvm/lib/Target/X86/X86FixupInstTuning.cpp
    M llvm/test/CodeGen/X86/tuning-shuffle-permilps-avx512.ll
    M llvm/test/CodeGen/X86/tuning-shuffle-permilps.ll

  Log Message:
  -----------
  [X86] Fix perf bug in `permilps` -> `shufd` in X86FixupInstTuning.

We shouldn't do the transformation if we either have bypass delay OR
the new opcode has worse performance. Previous code was incorrectly
using AND.
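
A minimal sketch of the logic error described above, for illustration only.
The function and parameter names are hypothetical and are not taken from
X86FixupInstTuning.cpp; the sketch only contrasts the AND form of the bail-out
condition with the OR form.

```cpp
// Illustrative only: hypothetical names, not the real pass code.

// Old (buggy) gate: the rewrite is skipped only when BOTH problems are
// present, so a target with bypass delay alone still gets rewritten.
constexpr bool skipRewriteBuggy(bool hasBypassDelay, bool newOpcodeIsWorse) {
  return hasBypassDelay && newOpcodeIsWorse;
}

// Fixed gate: either problem on its own is enough to skip the rewrite.
constexpr bool skipRewriteFixed(bool hasBypassDelay, bool newOpcodeIsWorse) {
  return hasBypassDelay || newOpcodeIsWorse;
}
```

The two versions disagree exactly when one condition holds and the other does
not, which is the case the AND form let through.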

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D147727


  Commit: c3f01f13b10d708b9b7ff45a6ccc2f0c3462b3af
      https://github.com/llvm/llvm-project/commit/c3f01f13b10d708b9b7ff45a6ccc2f0c3462b3af
  Author: Noah Goldstein <goldstein.w.n at gmail.com>
  Date:   2023-04-10 (Mon, 10 Apr 2023)

  Changed paths:
    M llvm/lib/Target/X86/X86FixupInstTuning.cpp
    M llvm/test/CodeGen/X86/tuning-shuffle-unpckpd-avx512.ll
    M llvm/test/CodeGen/X86/tuning-shuffle-unpckpd.ll

  Log Message:
  -----------
  [X86] Add inst fixup for `unpckpd` -> `unpckqdq`.

`unpckqdq` seems to be treated as a shuffle from a bypass-delay
perspective (which makes sense, as it appears to share shuffle units
on all micro-architectures).

`unpckqdq` is slightly preferable to `shufpd` as it saves 1 byte of
code size and can be used to replace the micro-fused `rm` version. So,
if the target has no bypass delay, we should do `unpckpd` ->
`unpckqdq` instead of `shufpd`.
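
The choice described above can be sketched as a small selector. This is a
hedged illustration with hypothetical names, not the pass's actual code. It
also assumes that on targets WITH bypass delay the FP-domain `shufpd` remains
the candidate replacement, which the log message implies but does not state.

```cpp
// Illustrative only: hypothetical enum and function names.
enum class UnpckpdReplacement { Shufpd, Unpckqdq };

constexpr UnpckpdReplacement pickUnpckpdReplacement(bool hasShuffleBypassDelay) {
  // Assumption: with a bypass delay, stay in the FP domain (`shufpd`).
  // Without one, prefer the integer-domain `unpckqdq`, which the log
  // message says is 1 byte shorter.
  return hasShuffleBypassDelay ? UnpckpdReplacement::Shufpd
                               : UnpckpdReplacement::Unpckqdq;
}
```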

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D147728


  Commit: d65720652dd644483530ecb547365a2239a97979
      https://github.com/llvm/llvm-project/commit/d65720652dd644483530ecb547365a2239a97979
  Author: Noah Goldstein <goldstein.w.n at gmail.com>
  Date:   2023-04-10 (Mon, 10 Apr 2023)

  Changed paths:
    M llvm/lib/Target/X86/X86FixupInstTuning.cpp
    M llvm/test/CodeGen/X86/tuning-shuffle-unpckps-avx512.ll
    M llvm/test/CodeGen/X86/tuning-shuffle-unpckps.ll

  Log Message:
  -----------
  [X86] Add inst fixup for `unpckps` -> `unpckdq`.

`unpckps` has the same performance as `unpckpd` (only port5) whereas
`unpckdq` can run on p15 on some newer architectures.

`unpckdq` is in the integer domain, so only do the transform if the
target has no bypass delay on shuffles (SKL+).
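
As a rough sketch of that gating (hypothetical names, not the real
X86FixupInstTuning.cpp code), the rewrite is attempted only when moving to the
integer domain costs nothing:

```cpp
// Illustrative only: hypothetical enum and function names.
enum class UnpckpsChoice { KeepUnpckps, UseUnpckdq };

constexpr UnpckpsChoice pickForUnpckps(bool hasShuffleBypassDelay) {
  // `unpckdq` is an integer-domain instruction, so a target that pays a
  // bypass delay on shuffles keeps the original FP-domain `unpckps`.
  return hasShuffleBypassDelay ? UnpckpsChoice::KeepUnpckps
                               : UnpckpsChoice::UseUnpckdq;
}
```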

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D147729


Compare: https://github.com/llvm/llvm-project/compare/bc257ff07b4d...d65720652dd6

