[all-commits] [llvm/llvm-project] 8e7f60: [X86] Atom SSE shift-by-variable take 2uops/3uops ...

Sat Oct 2 04:42:30 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 8e7f6039fa059268f9355bbcd093b284ee656f82
      https://github.com/llvm/llvm-project/commit/8e7f6039fa059268f9355bbcd093b284ee656f82
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2021-10-02 (Sat, 02 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86ScheduleAtom.td
    M llvm/test/tools/llvm-mca/X86/Atom/resources-mmx.s
    M llvm/test/tools/llvm-mca/X86/Atom/resources-sse2.s

  Log Message:
  -----------
  [X86] Atom SSE shift-by-variable take 2uops/3uops not 1uop

Based on the most recent llvm-exegesis captures (PR36895), and consistent with what Intel AoM / Agner / InstLatX64 report.

  Commit: bb42cc2090a2316d529b92e1126241c53b85021a
      https://github.com/llvm/llvm-project/commit/bb42cc2090a2316d529b92e1126241c53b85021a
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2021-10-02 (Sat, 02 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/vector-mul.ll

  Log Message:
  -----------
  [X86] decomposeMulByConstant - decompose legal vXi32 multiplies on SlowPMULLD targets and all vXi64 multiplies

X86's decomposeMulByConstant never permits mul decomposition to shift+add/sub if the vector multiply is legal.

Unfortunately this isn't great for SSE41+ targets, which have PMULLD for vXi32 multiplies but where the instruction is often quite slow. This patch allows decomposition if the target has the SlowPMULLD flag (i.e. Silvermont). We also always decompose legal vXi64 multiplies, since even the latest IceLake has really poor latencies for PMULLQ.
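
As a rough illustration of what "decomposition to shift+add/sub" means, the scalar sketch below rewrites a multiply by a constant of the form 2^k + 1 or 2^k - 1 as one shift plus one add or subtract, and otherwise falls back to a plain multiply. It is not the code from X86ISelLowering.cpp: the helper names (isPow2, log2u, mulByConstDecomposed) are hypothetical, it works on a single uint32_t lane rather than a vector, and it ignores the negated-constant forms and the target checks (such as SlowPMULLD) that the real hook applies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if v is a non-zero power of two.
static bool isPow2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Hypothetical helper: floor(log2(v)) for v != 0.
static unsigned log2u(uint32_t v) {
  unsigned n = 0;
  while (v >>= 1)
    ++n;
  return n;
}

// Sketch only: multiply x by the constant c using shift+add or shift+sub
// when c is 2^k + 1 or 2^k - 1; otherwise use an ordinary multiply.
// *decomposed reports which path was taken.
static uint32_t mulByConstDecomposed(uint32_t x, uint32_t c, bool *decomposed) {
  if (c > 1 && isPow2(c - 1)) {          // c = 2^k + 1  ->  (x << k) + x
    *decomposed = true;
    return (x << log2u(c - 1)) + x;
  }
  if (isPow2(c + 1)) {                   // c = 2^k - 1  ->  (x << k) - x
    *decomposed = true;
    return (x << log2u(c + 1)) - x;
  }
  *decomposed = false;                   // no cheap form: keep the multiply
  return x * c;
}
```

For example, a multiply by 9 becomes (x << 3) + x and a multiply by 7 becomes (x << 3) - x, while a multiply by 10 stays a multiply in this sketch. On a vector target the same idea maps each lane to a vector shift plus a vector add or sub, which is the trade the commit makes when PMULLD or PMULLQ is slow.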

Differential Revision: https://reviews.llvm.org/D110588

Compare: https://github.com/llvm/llvm-project/compare/acb459574afc...bb42cc2090a2