[all-commits] [llvm/llvm-project] e6226e: [libc][math] Improve exp2f performance.
lntue via All-commits
all-commits at lists.llvm.org
Wed Sep 14 11:45:09 PDT 2022
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: e6226e6b7234e5fce5a928861541cec3a519f379
https://github.com/llvm/llvm-project/commit/e6226e6b7234e5fce5a928861541cec3a519f379
Author: Tue Ly <lntue at google.com>
Date: 2022-09-14 (Wed, 14 Sep 2022)
Changed paths:
M libc/docs/math.rst
M libc/src/math/generic/CMakeLists.txt
M libc/src/math/generic/exp2f.cpp
M libc/src/math/generic/explogxf.h
M libc/test/src/math/exhaustive/exp2f_test.cpp
M libc/test/src/math/explogxf_test.cpp
Log Message:
-----------
[libc][math] Improve exp2f performance.
Reduce the number of subintervals that need lookup table and optimize
the evaluation steps.
Currently, `exp2f` is computed by reducing to `2^hi * 2^mid * 2^lo` where
`-16/32 <= mid <= 15/32` and `-1/64 <= lo <= 1/64`, and `2^lo` is then
approximated by a degree 6 polynomial.
Experiment with Sollya showed that by using a degree 6 polynomial, we
can approximate `2^lo` for a bigger range with reasonable errors:
```
> P = fpminimax((2^x - 1)/x, 5, [|D...|], [-1/64, 1/64]);
> dirtyinfnorm(2^x - 1 - x*P, [-1/64, 1/64]);
0x1.e18a1bc09114def49eb851655e2e5c4dd08075ac2p-63
> P = fpminimax((2^x - 1)/x, 5, [|D...|], [-1/32, 1/32]);
> dirtyinfnorm(2^x - 1 - x*P, [-1/32, 1/32]);
0x1.05627b6ed48ca417fe53e3495f7df4baf84a05e2ap-56
```
So we can optimize the implementation a bit with:
# Reduce the range to `mid = i/16` for `i = 0..15` and `-1/32 <= lo <= 1/32`
# Store the table `2^mid` in bits, and add `hi` directly to its exponent field to compute `2^hi * 2^mid`
# Rearrange the order of evaluating the polynomial approximating `2^lo`.
Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp2f
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput : 9.534
System LIBC reciprocal throughput : 6.229
BEFORE:
LIBC reciprocal throughput : 21.405
LIBC reciprocal throughput : 15.241 (with `-msse4.2` flag)
LIBC reciprocal throughput : 11.111 (with `-mfma` flag)
AFTER:
LIBC reciprocal throughput : 18.617
LIBC reciprocal throughput : 12.852 (with `-msse4.2` flag)
LIBC reciprocal throughput : 9.253 (with `-mfma` flag)
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp2f --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency : 40.869
System LIBC latency : 30.580
BEFORE
LIBC latency : 64.888
LIBC latency : 61.027 (with `-msse4.2` flag)
LIBC latency : 48.778 (with `-mfma` flag)
AFTER
LIBC latency : 48.803
LIBC latency : 45.047 (with `-msse4.2` flag)
LIBC latency : 37.487 (with `-mfma` flag)
```
Reviewed By: sivachandra, orex
Differential Revision: https://reviews.llvm.org/D133870
More information about the All-commits
mailing list