[all-commits] [llvm/llvm-project] 8e7f60: [X86] Atom SSE shift-by-variable take 2uops/3uops ...
Simon Pilgrim via All-commits
all-commits at lists.llvm.org
Sat Oct 2 04:42:30 PDT 2021
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 8e7f6039fa059268f9355bbcd093b284ee656f82
https://github.com/llvm/llvm-project/commit/8e7f6039fa059268f9355bbcd093b284ee656f82
Author: Simon Pilgrim <llvm-dev at redking.me.uk>
Date: 2021-10-02 (Sat, 02 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86ScheduleAtom.td
M llvm/test/tools/llvm-mca/X86/Atom/resources-mmx.s
M llvm/test/tools/llvm-mca/X86/Atom/resources-sse2.s
Log Message:
-----------
[X86] Atom SSE shift-by-variable take 2uops/3uops not 1uop
Based on the most recent llvm-exegesis captures (PR36895) and on what Intel AoM / Agner / InstLatX64 report as well.
Commit: bb42cc2090a2316d529b92e1126241c53b85021a
https://github.com/llvm/llvm-project/commit/bb42cc2090a2316d529b92e1126241c53b85021a
Author: Simon Pilgrim <llvm-dev at redking.me.uk>
Date: 2021-10-02 (Sat, 02 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86ISelLowering.cpp
M llvm/test/CodeGen/X86/vector-mul.ll
Log Message:
-----------
[X86] decomposeMulByConstant - decompose legal vXi32 multiplies on SlowPMULLD targets and all vXi64 multiplies
X86's decomposeMulByConstant never permits mul decomposition to shift+add/sub if the vector multiply is legal.
Unfortunately this isn't great for SSE41+ targets, which have PMULLD for vXi32 multiplies but where it is often quite slow. This patch proposes to allow decomposition if the target has the SlowPMULLD flag (i.e. Silvermont). We also always decompose legal vXi64 multiplies - even the latest IceLake has really poor latencies for PMULLQ.
Differential Revision: https://reviews.llvm.org/D110588
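
For readers unfamiliar with the transform, the following is a minimal scalar sketch of the general shift+add/sub idea described in the log message above. It is not the code in X86ISelLowering.cpp: the names MulDecomposition, decomposeMulSketch and applyDecomposition are hypothetical, and it only handles positive constants of the form 2^N+1 or 2^N-1 on a single 32-bit lane.

```cpp
#include <cstdint>

// Hypothetical record of how to rewrite x * C without a multiply.
struct MulDecomposition {
  unsigned Shift; // shift amount N
  bool IsAdd;     // true: (x << N) + x, false: (x << N) - x
};

static bool isPowerOf2(uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

static unsigned log2Floor(uint32_t V) {
  unsigned N = 0;
  while (V >>= 1)
    ++N;
  return N;
}

// Returns true and fills Out if C is 2^N + 1 or 2^N - 1 (with C >= 3).
// Assumption: powers of two and negative constants are left to other paths.
bool decomposeMulSketch(uint32_t C, MulDecomposition &Out) {
  if (C < 3)
    return false;
  if (isPowerOf2(C - 1)) { // C == 2^N + 1  ->  (x << N) + x
    Out.Shift = log2Floor(C - 1);
    Out.IsAdd = true;
    return true;
  }
  if (isPowerOf2(C + 1)) { // C == 2^N - 1  ->  (x << N) - x
    Out.Shift = log2Floor(C + 1);
    Out.IsAdd = false;
    return true;
  }
  return false;
}

// Applies the decomposition to one lane; arithmetic wraps modulo 2^32,
// matching the low 32 bits of the product that PMULLD would produce.
uint32_t applyDecomposition(uint32_t X, const MulDecomposition &D) {
  uint32_t Shifted = X << D.Shift;
  return D.IsAdd ? Shifted + X : Shifted - X;
}
```

In this sketch a multiply by 9 becomes (x << 3) + x and a multiply by 7 becomes (x << 3) - x, while a constant such as 10 is not decomposed. On a SlowPMULLD target the shift and add/sub pair is expected to be cheaper than the vector multiply, which is the motivation given in the log message.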
Compare: https://github.com/llvm/llvm-project/compare/acb459574afc...bb42cc2090a2