[all-commits] [llvm/llvm-project] 1c89ae: [libc][math] Improve sinhf and coshf performance.
lntue via All-commits
all-commits at lists.llvm.org
Thu Sep 15 06:21:08 PDT 2022
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 1c89ae71ea69a9203b35a9d96328f1c7ca54994c
https://github.com/llvm/llvm-project/commit/1c89ae71ea69a9203b35a9d96328f1c7ca54994c
Author: Tue Ly <lntue at google.com>
Date: 2022-09-15 (Thu, 15 Sep 2022)
Changed paths:
M libc/docs/math.rst
M libc/src/math/generic/CMakeLists.txt
M libc/src/math/generic/coshf.cpp
M libc/src/math/generic/exp2f.cpp
M libc/src/math/generic/explogxf.h
M libc/src/math/generic/sinhf.cpp
Log Message:
-----------
[libc][math] Improve sinhf and coshf performance.
Optimize `sinhf` and `coshf` by computing exp(x) and exp(-x) simultaneously.
Currently `sinhf` and `coshf` are implemented using the following formulas:
```
sinh(x) = 0.5 * (exp(x) - 1) - 0.5 * (exp(-x) - 1)
cosh(x) = 0.5 * exp(x) + 0.5 * exp(-x)
```
where `exp(x)` and `exp(-x)` are calculated separately using the formula:
```
exp(x) ~ 2^hi * 2^mid * exp(dx)
       ~ 2^hi * 2^mid * P(dx)
```
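A minimal sketch of this decomposition, for illustration only. The names `ExpParts`, `reduce`, and `exp_from_parts` are hypothetical, the 1/32 step for `mid` is an assumption, and `std::exp2` / `std::exp` stand in for the table lookup and the polynomial `P(dx)`; this is not the code in `explogxf.h`.

```cpp
#include <cmath>

// Hypothetical helper type: the three pieces of x after range reduction.
struct ExpParts {
  int hi;     // integer part of the base-2 exponent
  double mid; // fractional part, a multiple of 1/32 in [0, 1) (assumed step)
  double dx;  // small remainder, roughly |dx| <= ln(2)/64
};

// Split x so that x = (hi + mid) * ln(2) + dx.
inline ExpParts reduce(double x) {
  constexpr double LN2 = 0.69314718055994530942;
  double k = std::nearbyint(x * (32.0 / LN2)); // x ~ k * ln(2)/32
  double hi = std::floor(k / 32.0);
  double mid = (k - 32.0 * hi) / 32.0;
  double dx = x - k * (LN2 / 32.0);
  return {static_cast<int>(hi), mid, dx};
}

// Recombine: exp(x) ~ 2^hi * 2^mid * exp(dx).
// std::exp2 and std::exp are stand-ins for a lookup table and P(dx).
inline double exp_from_parts(const ExpParts &p) {
  return std::ldexp(std::exp2(p.mid) * std::exp(p.dx), p.hi);
}
```

For example, `reduce(3.7)` gives `hi = 5`, `mid = 11/32`, and a `dx` of about `-0.004`.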
By splitting the polynomial `P(dx)` into even and odd parts:
```
P(dx) = P_even(dx) + dx * P_odd(dx)
```
we can see that the computations of `exp(x)` and `exp(-x)` share most of their
intermediate results, namely:
```
exp(x) ~ 2^(hi + mid) * (P_even(dx) + dx * P_odd(dx))
exp(-x) ~ 2^(-(hi + mid)) * (P_even(dx) - dx * P_odd(dx))
```
Expanding `sinh(x)` and `cosh(x)` with respect to the above formulas, we can compute
these two functions as follows to maximize the shared parts:
```
sinh(x) = (e^x - e^(-x)) / 2
        ~ 0.5 * (P_even * (2^(hi + mid) - 2^(-(hi + mid))) +
                 dx * P_odd * (2^(hi + mid) + 2^(-(hi + mid))))
cosh(x) = (e^x + e^(-x)) / 2
        ~ 0.5 * (P_even * (2^(hi + mid) + 2^(-(hi + mid))) +
                 dx * P_odd * (2^(hi + mid) - 2^(-(hi + mid))))
```
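A rough double-precision sketch of these two formulas, under stated assumptions: the name `sinh_cosh_shared`, the 1/32 reduction step, and the degree-5 Taylor coefficients are illustrative choices, and `std::exp2` stands in for the table lookup of `2^(hi + mid)`. Special cases such as overflow, NaN, and tiny inputs are not handled, and this is not the code in `sinhf.cpp` or `coshf.cpp`.

```cpp
#include <cmath>

// Hypothetical result type holding both values.
struct SinhCosh {
  double sinh_x;
  double cosh_x;
};

inline SinhCosh sinh_cosh_shared(double x) {
  constexpr double LN2 = 0.69314718055994530942;

  // Range reduction: x = k * ln(2)/32 + dx, so hi + mid = k / 32.
  double k = std::nearbyint(x * (32.0 / LN2));
  double dx = x - k * (LN2 / 32.0);

  // Shared scale factors 2^(hi + mid) and 2^(-(hi + mid)).
  double up = std::exp2(k / 32.0);
  double down = 1.0 / up;

  // Shared polynomial parts: P_even and dx * P_odd.
  double t = dx * dx;
  double even = 1.0 + t * (1.0 / 2 + t * (1.0 / 24));
  double odd = dx * (1.0 + t * (1.0 / 6 + t * (1.0 / 120)));

  // Combine as in the formulas above.
  double sum = up + down;
  double diff = up - down;
  return {0.5 * (even * diff + odd * sum), 0.5 * (even * sum + odd * diff)};
}
```

Each call does a single range reduction and evaluates the polynomial parts once, where the earlier approach evaluated `exp(x)` and `exp(-x)` separately.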
So in this patch, we perform the following optimizations for `sinhf` and `coshf`:
# Use the above formulas to maximize sharing of intermediate results,
# Apply optimizations similar to those in https://reviews.llvm.org/D133870.
Performance benchmarks using the `perf` tool from the CORE-MATH project on a Ryzen 1700:
For `sinhf`:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinhf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput : 16.718
System LIBC reciprocal throughput : 63.151
BEFORE:
LIBC reciprocal throughput : 90.116
LIBC reciprocal throughput : 28.554 (with `-msse4.2` flag)
LIBC reciprocal throughput : 22.577 (with `-mfma` flag)
AFTER:
LIBC reciprocal throughput : 36.482
LIBC reciprocal throughput : 16.955 (with `-msse4.2` flag)
LIBC reciprocal throughput : 13.943 (with `-mfma` flag)
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh sinhf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency : 48.821
System LIBC latency : 137.019
BEFORE:
LIBC latency : 97.122
LIBC latency : 84.214 (with `-msse4.2` flag)
LIBC latency : 71.611 (with `-mfma` flag)
AFTER:
LIBC latency : 54.555
LIBC latency : 50.865 (with `-msse4.2` flag)
LIBC latency : 48.700 (with `-mfma` flag)
```
For `coshf`:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh coshf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput : 16.939
System LIBC reciprocal throughput : 19.695
BEFORE:
LIBC reciprocal throughput : 52.845
LIBC reciprocal throughput : 29.174 (with `-msse4.2` flag)
LIBC reciprocal throughput : 22.553 (with `-mfma` flag)
AFTER:
LIBC reciprocal throughput : 37.169
LIBC reciprocal throughput : 17.805 (with `-msse4.2` flag)
LIBC reciprocal throughput : 14.691 (with `-mfma` flag)
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh coshf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency : 48.478
System LIBC latency : 48.044
BEFORE:
LIBC latency : 99.123
LIBC latency : 85.595 (with `-msse4.2` flag)
LIBC latency : 72.776 (with `-mfma` flag)
AFTER:
LIBC latency : 57.760
LIBC latency : 53.967 (with `-msse4.2` flag)
LIBC latency : 50.987 (with `-mfma` flag)
```
Reviewed By: orex, zimmermann6
Differential Revision: https://reviews.llvm.org/D133913