[all-commits] [llvm/llvm-project] 5dbd51: [libc][math] Improve tanhf performance.

lntue via All-commits all-commits at lists.llvm.org
Tue Jun 20 06:25:26 PDT 2023


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 5dbd5118ec5435269602de30ebd4982dd8b273dc
      https://github.com/llvm/llvm-project/commit/5dbd5118ec5435269602de30ebd4982dd8b273dc
  Author: Tue Ly <lntue at google.com>
  Date:   2023-06-20 (Tue, 20 Jun 2023)

  Changed paths:
    M libc/src/__support/macros/properties/CMakeLists.txt
    M libc/src/__support/macros/properties/cpu_features.h
    M libc/src/math/generic/CMakeLists.txt
    M libc/src/math/generic/tanhf.cpp
    M libc/test/src/math/tanhf_test.cpp
    M utils/bazel/llvm-project-overlay/libc/BUILD.bazel

  Log Message:
  -----------
  [libc][math] Improve tanhf performance.

Re-order exceptional branches and slightly adjust the evaluation.

Performance tested with the CORE-MATH project on AMD EPYC 7B12 (clocks/op)

Reciprocal throughputs:
```
--- BEFORE ---

$ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %  (with -mavx2 -mfma)
Ntrial = 20 ; Min = 7.794 + 0.102 clc/call; Median-Min = 0.066 clc/call; Max = 8.267 clc/call;
[####################] 100 %. (with -msse4.2)
Ntrial = 20 ; Min = 10.783 + 0.172 clc/call; Median-Min = 0.144 clc/call; Max = 11.446 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 18.926 + 0.381 clc/call; Median-Min = 0.342 clc/call; Max = 19.623 clc/call;

--- AFTER ---

$ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %  (with -mavx2 -mfma)
Ntrial = 20 ; Min = 6.598 + 0.085 clc/call; Median-Min = 0.052 clc/call; Max = 6.868 clc/call;
[####################] 100 %  (with -msse4.2)
Ntrial = 20 ; Min = 9.245 + 0.304 clc/call; Median-Min = 0.248 clc/call; Max = 10.675 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 11.724 + 0.440 clc/call; Median-Min = 0.444 clc/call; Max = 12.262 clc/call;
```

Latency:
```
--- BEFORE ---

$ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %  (with -mavx2 -mfma)
Ntrial = 20 ; Min = 38.821 + 0.157 clc/call; Median-Min = 0.122 clc/call; Max = 39.539 clc/call;
[####################] 100 %. (with -msse4.2)
Ntrial = 20 ; Min = 44.767 + 0.766 clc/call; Median-Min = 0.681 clc/call; Max = 45.951 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 55.055 + 1.512 clc/call; Median-Min = 1.571 clc/call; Max = 57.039 clc/call;

--- AFTER ---

$ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %  (with -mavx2 -mfma)
Ntrial = 20 ; Min = 36.147 + 0.194 clc/call; Median-Min = 0.181 clc/call; Max = 36.536 clc/call;
[####################] 100 %  (with -msse4.2)
Ntrial = 20 ; Min = 40.904 + 0.728 clc/call; Median-Min = 0.557 clc/call; Max = 42.231 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 55.776 + 0.557 clc/call; Median-Min = 0.542 clc/call; Max = 56.551 clc/call;
```

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D153026




More information about the All-commits mailing list