[all-commits] [llvm/llvm-project] fff3e1: [x86] enable fast sqrtss/sqrtps tuning for AMD Zen...

Sanjay Patel via All-commits all-commits at lists.llvm.org
Fri Feb 4 10:59:38 PST 2022


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: fff3e1dbaa9ee2d91dc15b39defa88346f03a4c2
      https://github.com/llvm/llvm-project/commit/fff3e1dbaa9ee2d91dc15b39defa88346f03a4c2
  Author: Sanjay Patel <spatel at rotateright.com>
  Date:   2022-02-04 (Fri, 04 Feb 2022)

  Changed paths:
    M llvm/lib/Target/X86/X86.td
    M llvm/test/CodeGen/X86/sqrt-fastmath-tune.ll

  Log Message:
  -----------
  [x86] enable fast sqrtss/sqrtps tuning for AMD Zen cores

As discussed in D118534, all of the recent AMD CPUs have
relatively fast (<14 cycle latency) "sqrtss" and "sqrtps"
instructions:
https://uops.info/table.html?search=sqrtps&cb_lat=on&cb_tp=on&cb_SNB=on&cb_SKL=on&cb_ZENp=on&cb_ZEN2=on&cb_ZEN3=on&cb_measurements=on&cb_avx=on&cb_sse=on

So we should set this tuning flag to change codegen for the plain
"sqrt(X)" expansion (as opposed to reciprocal-sqrt; there is other
test coverage for that pattern). The expansion is both slower and
less accurate than the hardware instruction.
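
For illustration only, here is a minimal Python sketch of the trade-off
described above. It assumes the expansion takes the usual form of a
reciprocal-sqrt estimate refined by one Newton-Raphson step and then
multiplied by X. The helper names (to_f32, rsqrt_estimate,
sqrt_via_expansion, sqrt_direct) and the 12-bit truncation used to model
the hardware estimate are hypothetical. They are not taken from the patch
and do not reproduce the exact sequence LLVM emits.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def rsqrt_estimate(x: float) -> float:
    """Hypothetical model of a coarse hardware 1/sqrt(x) estimate.

    Assumption: the estimate is modeled by truncating the exact result to
    12 mantissa bits. Real hardware uses its own lookup scheme.
    """
    exact = 1.0 / math.sqrt(x)
    mantissa, exponent = math.frexp(exact)      # exact == mantissa * 2**exponent
    mantissa = math.floor(mantissa * 4096) / 4096  # keep 12 mantissa bits
    return to_f32(math.ldexp(mantissa, exponent))


def sqrt_via_expansion(x: float) -> float:
    """sqrt(x) as x * rsqrt(x), with one Newton-Raphson refinement step.

    Assumes x is finite and non-negative.
    """
    if x == 0.0:
        # Guard: without it, 0 * (1/sqrt(0)) would be 0 * inf = NaN.
        return 0.0
    est = rsqrt_estimate(x)
    # Newton-Raphson step for y = 1/sqrt(x): y' = y * (1.5 - 0.5 * x * y * y)
    refined = to_f32(est * to_f32(1.5 - to_f32(0.5 * x * est * est)))
    return to_f32(x * refined)


def sqrt_direct(x: float) -> float:
    """Stand-in for the hardware sqrt instruction: correctly rounded result."""
    return to_f32(math.sqrt(x))


if __name__ == "__main__":
    for x in (0.25, 2.0, 3.0, 10.0, 12345.678):
        ref = math.sqrt(x)
        err_expansion = abs(sqrt_via_expansion(x) - ref) / ref
        err_direct = abs(sqrt_direct(x) - ref) / ref
        print(f"x={x}: expansion rel. err={err_expansion:.3e}, "
              f"direct rel. err={err_direct:.3e}")
```

Under this model the direct path is limited only by single-precision
rounding, while the expansion also carries the residual error of the
refined estimate and needs several dependent multiplies plus a zero guard.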

Differential Revision: https://reviews.llvm.org/D119001



