[all-commits] [llvm/llvm-project] fff3e1: [x86] enable fast sqrtss/sqrtps tuning for AMD Zen...
Sanjay Patel via All-commits
all-commits at lists.llvm.org
Fri Feb 4 10:59:38 PST 2022
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: fff3e1dbaa9ee2d91dc15b39defa88346f03a4c2
https://github.com/llvm/llvm-project/commit/fff3e1dbaa9ee2d91dc15b39defa88346f03a4c2
Author: Sanjay Patel <spatel at rotateright.com>
Date: 2022-02-04 (Fri, 04 Feb 2022)
Changed paths:
M llvm/lib/Target/X86/X86.td
M llvm/test/CodeGen/X86/sqrt-fastmath-tune.ll
Log Message:
-----------
[x86] enable fast sqrtss/sqrtps tuning for AMD Zen cores
As discussed in D118534, all recent AMD CPUs have
relatively fast (<14-cycle latency) "sqrtss" and "sqrtps"
instructions:
https://uops.info/table.html?search=sqrtps&cb_lat=on&cb_tp=on&cb_SNB=on&cb_SKL=on&cb_ZENp=on&cb_ZEN2=on&cb_ZEN3=on&cb_measurements=on&cb_avx=on&cb_sse=on
So we should set this tuning flag to alter codegen of plain
"sqrt(X)" expansion (as opposed to reciprocal-sqrt; other
tests already cover that pattern). The software expansion
(a reciprocal-sqrt estimate refined by Newton-Raphson) is
both slower and less accurate than the hardware instruction.
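To illustrate the accuracy half of that claim, here is a minimal C++ sketch of an estimate-plus-Newton-Raphson sqrt expansion compared against a correctly rounded sqrt. It is not LLVM's DAG code. Assumptions: the hardware "rsqrtss" estimate is modeled by the hypothetical helper rsqrt_estimate (exact 1/sqrt(x) truncated to 12 fraction bits), one Newton-Raphson step is used, and zero is special-cased. The names rsqrt_estimate, sqrt_expanded and count_mismatches are all illustrative. The sketch says nothing about latency.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the hardware RSQRTSS estimate: the exact
// 1/sqrt(x) with only 12 of its 23 fraction bits kept, so its relative
// error is below 2^-12 (the real instruction is specified at ~1.5 * 2^-12).
float rsqrt_estimate(float x) {
    float r = 1.0f / std::sqrt(x);
    std::uint32_t bits;
    std::memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFF800u;  // clear the low 11 fraction bits
    std::memcpy(&r, &bits, sizeof r);
    return r;
}

// Sketch of a software sqrt(x) expansion: refine the reciprocal-sqrt
// estimate with one Newton-Raphson step, then multiply by x.
float sqrt_expanded(float x) {
    if (x == 0.0f) {
        return 0.0f;  // 0 * rsqrt(0) would be 0 * inf = NaN
    }
    float y0 = rsqrt_estimate(x);
    float y1 = y0 * (1.5f - 0.5f * x * y0 * y0);  // y1 = y0 * (3 - x*y0^2) / 2
    return x * y1;
}

// Counts sampled inputs in [lo, hi) where the expansion differs from
// std::sqrt, which IEEE 754 requires to be correctly rounded (on x86 it
// typically compiles to SQRTSS).
int count_mismatches(float lo, float hi, float step) {
    int mismatches = 0;
    for (float x = lo; x < hi; x += step) {
        if (sqrt_expanded(x) != std::sqrt(x)) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

For example, count_mismatches(1.0f, 4.0f, 0.001f) returns a nonzero count: the expansion stays within a few ulps of the true result but does not always match the correctly rounded value that a single hardware sqrt produces.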
Differential Revision: https://reviews.llvm.org/D119001