[PATCH] D118534: [X86] Introduce more common modern tunings into `generic`
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 3 11:32:30 PST 2022
spatel added inline comments.
================
Comment at: llvm/lib/Target/X86/X86.td:1222
TuningMacroFusion,
+ TuningFastScalarFSQRT,
+ TuningFast15ByteNOP,
----------------
Fast scalar SQRT (TuningFastScalarFSQRT) controls whether we emit a single sqrtss instruction for a single-precision square root or a reciprocal-estimate sequence of about 9 instructions (the latter only when fast-math allows it).
Based on Agner Fog's instruction timing tables and the flag description, this should be set for Zen1 (sqrtss latency of 9-10 cycles), but it's not as obviously good for Zen2/3, where sqrtss latency is 14 cycles. The flag is set for Intel CPUs since SandyBridge, which covers sqrtss latencies from 10 to 14 cycles.
I think this is ok to set, but if the assumption is that we're tuning for any mainstream CPU of the last N years, then shouldn't we add this flag to the later AMD models (Zen2/3) too, so that the output is less surprising? A possible side benefit is that we will also produce more accurate results, since sqrtss is correctly rounded and the estimate sequence is not.
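To make the trade-off concrete, here is a rough sketch of the two strategies written with SSE intrinsics. It assumes the estimate path is built on rsqrtss plus one Newton-Raphson refinement step, which is my reading of "reciprocal estimate sequence" rather than something stated in the patch. The function names sqrt_direct and sqrt_estimate are hypothetical, and the sketch omits the zero-input handling that real codegen needs. It is x86-only.

```cpp
#include <cassert>
#include <cmath>
#include <xmmintrin.h>  // SSE intrinsics (x86 only)

// Strategy 1: one sqrtss instruction. The result is correctly rounded.
static float sqrt_direct(float x) {
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x)));
}

// Strategy 2 (assumed shape of the estimate sequence): rsqrtss gives a
// low-precision estimate of 1/sqrt(x), one Newton-Raphson step refines it,
// and a final multiply by x turns 1/sqrt(x) into sqrt(x).
// This sketch is only valid for positive, finite, normal x: for x == 0 the
// estimate is +inf and the refinement would produce NaN.
static float sqrt_estimate(float x) {
    assert(x > 0.0f && std::isfinite(x));
    float e = _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));  // rough 1/sqrt(x)
    e = e * (1.5f - 0.5f * x * e * e);                     // Newton-Raphson step
    return x * e;                                          // sqrt(x) = x * (1/sqrt(x))
}
```

Calling both functions on the same input, e.g. sqrt_direct(2.0f) and sqrt_estimate(2.0f), should give values that agree closely but are not guaranteed to be bit-identical, which is the accuracy difference mentioned above.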
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D118534/new/
https://reviews.llvm.org/D118534