[all-commits] [llvm/llvm-project] 5dbd51: [libc][math] Improve tanhf performance.
lntue via All-commits
all-commits at lists.llvm.org
Tue Jun 20 06:25:26 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 5dbd5118ec5435269602de30ebd4982dd8b273dc
https://github.com/llvm/llvm-project/commit/5dbd5118ec5435269602de30ebd4982dd8b273dc
Author: Tue Ly <lntue at google.com>
Date: 2023-06-20 (Tue, 20 Jun 2023)
Changed paths:
M libc/src/__support/macros/properties/CMakeLists.txt
M libc/src/__support/macros/properties/cpu_features.h
M libc/src/math/generic/CMakeLists.txt
M libc/src/math/generic/tanhf.cpp
M libc/test/src/math/tanhf_test.cpp
M utils/bazel/llvm-project-overlay/libc/BUILD.bazel
Log Message:
-----------
[libc][math] Improve tanhf performance.
Re-order exceptional branches and slightly adjust the evaluation.
Performance was measured with the CORE-MATH project's perf.sh script on an AMD EPYC 7B12 (units: clocks/op).
Reciprocal throughputs:
```
--- BEFORE ---
$ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 % (with -mavx2 -mfma)
Ntrial = 20 ; Min = 7.794 + 0.102 clc/call; Median-Min = 0.066 clc/call; Max = 8.267 clc/call;
[####################] 100 %. (with -msse4.2)
Ntrial = 20 ; Min = 10.783 + 0.172 clc/call; Median-Min = 0.144 clc/call; Max = 11.446 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 18.926 + 0.381 clc/call; Median-Min = 0.342 clc/call; Max = 19.623 clc/call;
--- AFTER ---
$ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 % (with -mavx2 -mfma)
Ntrial = 20 ; Min = 6.598 + 0.085 clc/call; Median-Min = 0.052 clc/call; Max = 6.868 clc/call;
[####################] 100 % (with -msse4.2)
Ntrial = 20 ; Min = 9.245 + 0.304 clc/call; Median-Min = 0.248 clc/call; Max = 10.675 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 11.724 + 0.440 clc/call; Median-Min = 0.444 clc/call; Max = 12.262 clc/call;
```
Latency:
```
--- BEFORE ---
$ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 % (with -mavx2 -mfma)
Ntrial = 20 ; Min = 38.821 + 0.157 clc/call; Median-Min = 0.122 clc/call; Max = 39.539 clc/call;
[####################] 100 %. (with -msse4.2)
Ntrial = 20 ; Min = 44.767 + 0.766 clc/call; Median-Min = 0.681 clc/call; Max = 45.951 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 55.055 + 1.512 clc/call; Median-Min = 1.571 clc/call; Max = 57.039 clc/call;
--- AFTER ---
$ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 % (with -mavx2 -mfma)
Ntrial = 20 ; Min = 36.147 + 0.194 clc/call; Median-Min = 0.181 clc/call; Max = 36.536 clc/call;
[####################] 100 % (with -msse4.2)
Ntrial = 20 ; Min = 40.904 + 0.728 clc/call; Median-Min = 0.557 clc/call; Max = 42.231 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 55.776 + 0.557 clc/call; Median-Min = 0.542 clc/call; Max = 56.551 clc/call;
```
Reviewed By: michaelrj
Differential Revision: https://reviews.llvm.org/D153026