[PATCH] D119001: [x86] enable fast sqrtss tuning for AMD Zen cores
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 4 09:05:36 PST 2022
spatel marked an inline comment as done.
spatel added inline comments.
================
Comment at: llvm/lib/Target/X86/X86.td:1172
TuningBranchFusion,
+ TuningFastScalarFSQRT,
TuningFastScalarShiftMasks,
----------------
lebedev.ri wrote:
> Please also add `TuningFastVectorFSQRT`; I don't see any difference between the scalar and vector variants, at least on znver3.
Good point. On Zen1, it looks like a 256-bit sqrt would be split into two 128-bit ops, but that is still a reciprocal throughput of 8 cycles, so it beats the 10+ instructions/cycles of the estimate sequence.
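
For readers unfamiliar with the "estimate sequence" being compared against, here is a rough scalar sketch of the idea: a low-precision reciprocal-sqrt estimate, one Newton-Raphson refinement step, then a multiply by `x`. This is not the code LLVM emits. `rsqrt_estimate` is a hypothetical portable stand-in for the hardware estimate instruction (it just truncates the exact result to about 12 mantissa bits), and the function names are made up for illustration. Zero and denormal inputs need extra handling that is omitted here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a hardware reciprocal-sqrt estimate:
// compute 1/sqrt(x), then keep only the top 12 of 23 mantissa bits.
// Assumption for illustration only; real hardware results differ.
static float rsqrt_estimate(float x) {
  assert(x > 0.0f && "sketch only handles positive, normal inputs");
  float r = 1.0f / std::sqrt(x);
  std::uint32_t bits;
  std::memcpy(&bits, &r, sizeof bits);
  bits &= 0xFFFFF800u; // clear the low 11 mantissa bits
  std::memcpy(&r, &bits, sizeof r);
  return r;
}

// sqrt(x) via estimate plus one Newton-Raphson step on 1/sqrt(x):
//   r1 = r0 * (1.5 - 0.5 * x * r0 * r0),  sqrt(x) ~= x * r1
static float sqrt_via_estimate(float x) {
  float r0 = rsqrt_estimate(x);
  float r1 = r0 * (1.5f - 0.5f * x * r0 * r0);
  return x * r1;
}

// The alternative the tuning flag selects: a single sqrt operation.
static float sqrt_direct(float x) { return std::sqrt(x); }
```

The refinement path takes several dependent multiplies and a subtract on top of the estimate itself, which is why a sufficiently fast hardware sqrt can win over it.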
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D119001/new/
https://reviews.llvm.org/D119001