[PATCH] D119001: [x86] enable fast sqrtss tuning for AMD Zen cores

Sanjay Patel via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 4 09:05:36 PST 2022


spatel marked an inline comment as done.
spatel added inline comments.


================
Comment at: llvm/lib/Target/X86/X86.td:1172
                                      TuningBranchFusion,
+                                     TuningFastScalarFSQRT,
                                      TuningFastScalarShiftMasks,
----------------
lebedev.ri wrote:
> Please also add `TuningFastVectorFSQRT`; I don't see any difference between the scalar and vector variants, at least on znver3.
Good point. On Zen1, it looks like the 256-bit op would be split, but that is still a reciprocal throughput of 8 cycles, so it's better than the 10+ instructions/cycles of the estimate sequence.
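
For readers unfamiliar with the "estimate sequence": a rough, portable sketch of the two strategies being compared, not the actual X86 lowering. The names `coarse_rsqrt`, `sqrt_via_estimate`, and `sqrt_direct` are hypothetical, and `coarse_rsqrt` only imitates a low-precision hardware reciprocal-sqrt estimate by truncating mantissa bits (the assumed precision of about 12 significant bits is an illustration choice).

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a hardware reciprocal-sqrt estimate:
// compute 1/sqrt(x), then drop low mantissa bits to mimic a coarse result.
static float coarse_rsqrt(float x) {
    float r = 1.0f / std::sqrt(x);
    std::uint32_t bits;
    std::memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF000u;  // keep sign, exponent, and the top 11 mantissa bits
    std::memcpy(&r, &bits, sizeof r);
    return r;
}

// Estimate path: one Newton-Raphson step on the reciprocal-sqrt estimate e,
// folded into the final multiply: sqrt(x) ~= (x*e) * (1.5 - 0.5 * (x*e) * e).
static float sqrt_via_estimate(float x) {
    if (x == 0.0f) return 0.0f;  // the estimate is infinite at 0, so guard it
    float e = coarse_rsqrt(x);
    float s = x * e;             // first approximation of sqrt(x)
    return s * (1.5f - 0.5f * s * e);
}

// Direct path: a single square-root operation, which is what the
// fast-sqrt tuning flags tell the backend to prefer.
static float sqrt_direct(float x) {
    return std::sqrt(x);
}
```

Even in this simplified form, the estimate path needs the estimate itself, several multiplies, a subtract, and a zero-input guard, which is why a single sqrt instruction with good throughput can win.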


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119001/new/

https://reviews.llvm.org/D119001


