[PATCH] D118534: [X86] Introduce more common modern tunings into `generic`

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 3 11:32:30 PST 2022


spatel added inline comments.


================
Comment at: llvm/lib/Target/X86/X86.td:1222
                  TuningMacroFusion,
+                 TuningFastScalarFSQRT,
+                 TuningFast15ByteNOP,
----------------
The fast scalar SQRT tuning (TuningFastScalarFSQRT) controls whether we produce a single-precision sqrtss instruction or a reciprocal-estimate sequence of about 9 instructions (the latter only when fast-math allows it).

Based on Agner's timing docs and the flag description, this should be set for Zen1 (sqrtss latency of 9-10), but it's not as obviously good for Zen2/3, where sqrtss latency is 14. The flag has been set for Intel CPUs since SandyBridge, so it already covers sqrtss latencies between 10 and 14.

I think this is ok to set, but if the assumption is that we're tuning for any mainstream CPU of the last N years, then shouldn't we add this flag to the later AMD models too, so the output is less surprising? A possible side benefit is that we would produce more accurate results too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D118534/new/

https://reviews.llvm.org/D118534


