[libc-commits] [libc] e6226e6 - [libc][math] Improve exp2f performance.
Tue Ly via libc-commits
libc-commits at lists.llvm.org
Wed Sep 14 11:45:02 PDT 2022
Author: Tue Ly
Date: 2022-09-14T14:44:25-04:00
New Revision: e6226e6b7234e5fce5a928861541cec3a519f379
URL: https://github.com/llvm/llvm-project/commit/e6226e6b7234e5fce5a928861541cec3a519f379
DIFF: https://github.com/llvm/llvm-project/commit/e6226e6b7234e5fce5a928861541cec3a519f379.diff
LOG: [libc][math] Improve exp2f performance.
Reduce the number of subintervals that need lookup table and optimize
the evaluation steps.
Currently, `exp2f` is computed by reducing to `2^hi * 2^mid * 2^lo` where
`-16/32 <= mid <= 15/32` and `-1/64 <= lo <= 1/64`, and `2^lo` is then
approximated by a degree 6 polynomial.
Experiment with Sollya showed that by using a degree 6 polynomial, we
can approximate `2^lo` for a bigger range with reasonable errors:
```
> P = fpminimax((2^x - 1)/x, 5, [|D...|], [-1/64, 1/64]);
> dirtyinfnorm(2^x - 1 - x*P, [-1/64, 1/64]);
0x1.e18a1bc09114def49eb851655e2e5c4dd08075ac2p-63
> P = fpminimax((2^x - 1)/x, 5, [|D...|], [-1/32, 1/32]);
> dirtyinfnorm(2^x - 1 - x*P, [-1/32, 1/32]);
0x1.05627b6ed48ca417fe53e3495f7df4baf84a05e2ap-56
```
So we can optimize the implementation a bit with:
# Reduce the range to `mid = i/16` for `i = 0..15` and `-1/32 <= lo <= 1/32`
# Store the table `2^mid` in bits, and add `hi` directly to its exponent field to compute `2^hi * 2^mid`
# Rearrange the order of evaluating the polynomial approximating `2^lo`.
Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp2f
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput : 9.534
System LIBC reciprocal throughput : 6.229
BEFORE:
LIBC reciprocal throughput : 21.405
LIBC reciprocal throughput : 15.241 (with `-msse4.2` flag)
LIBC reciprocal throughput : 11.111 (with `-mfma` flag)
AFTER:
LIBC reciprocal throughput : 18.617
LIBC reciprocal throughput : 12.852 (with `-msse4.2` flag)
LIBC reciprocal throughput : 9.253 (with `-mfma` flag)
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp2f --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency : 40.869
System LIBC latency : 30.580
BEFORE
LIBC latency : 64.888
LIBC latency : 61.027 (with `-msse4.2` flag)
LIBC latency : 48.778 (with `-mfma` flag)
AFTER
LIBC latency : 48.803
LIBC latency : 45.047 (with `-msse4.2` flag)
LIBC latency : 37.487 (with `-mfma` flag)
```
Reviewed By: sivachandra, orex
Differential Revision: https://reviews.llvm.org/D133870
Added:
Modified:
libc/docs/math.rst
libc/src/math/generic/CMakeLists.txt
libc/src/math/generic/exp2f.cpp
libc/src/math/generic/explogxf.h
libc/test/src/math/exhaustive/exp2f_test.cpp
libc/test/src/math/explogxf_test.cpp
Removed:
################################################################################
diff --git a/libc/docs/math.rst b/libc/docs/math.rst
index 55c8748b20b57..3171f01aabc33 100644
--- a/libc/docs/math.rst
+++ b/libc/docs/math.rst
@@ -219,9 +219,7 @@ Performance
+--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+
| expf | 9 | 7 | 44 | 38 | :math:`[-10, 10]` | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA |
+--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+
-| exp2f | 8 | 6 | 35 | 23 | :math:`[-10, 10]` | i5-1135G7 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA |
-+ +-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+
-| | 11 | 6 | 49 | 31 | :math:`[-10, 10]` | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA |
+| exp2f | 9 | 6 | 37 | 31 | :math:`[-10, 10]` | Ryzen 1700 | Ubuntu 22.04 LTS x86_64 | Clang 14.0.0 | FMA |
+--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+
| expm1f | 9 | 44 | 42 | 121 | :math:`[-10, 10]` | Ryzen 1700 | Ubuntu 20.04 LTS x86_64 | Clang 12.0.0 | FMA |
+--------------+-----------+-------------------+-----------+-------------------+-------------------------------------+------------+-------------------------+--------------+---------------+
diff --git a/libc/src/math/generic/CMakeLists.txt b/libc/src/math/generic/CMakeLists.txt
index ffc06c8239445..77aaa3dd3e2a4 100644
--- a/libc/src/math/generic/CMakeLists.txt
+++ b/libc/src/math/generic/CMakeLists.txt
@@ -549,8 +549,6 @@ add_entrypoint_object(
HDRS
../exp2f.h
DEPENDS
- .common_constants
- .explogxf
libc.src.__support.FPUtil.fenv_impl
libc.src.__support.FPUtil.fp_bits
libc.src.__support.FPUtil.multiply_add
diff --git a/libc/src/math/generic/exp2f.cpp b/libc/src/math/generic/exp2f.cpp
index 5d6b6bea1f22b..913e6b743f679 100644
--- a/libc/src/math/generic/exp2f.cpp
+++ b/libc/src/math/generic/exp2f.cpp
@@ -7,8 +7,6 @@
//===----------------------------------------------------------------------===//
#include "src/math/exp2f.h"
-#include "common_constants.h"
-#include "explogxf.h"
#include "src/__support/FPUtil/FEnvImpl.h"
#include "src/__support/FPUtil/FPBits.h"
#include "src/__support/FPUtil/PolyEval.h"
@@ -24,6 +22,17 @@ constexpr uint32_t exval1 = 0x3b42'9d37U;
constexpr uint32_t exval2 = 0xbcf3'a937U;
constexpr uint32_t exval_mask = exval1 & exval2;
+// Look up table for bit fields of 2^(i/16) for i = 0..15, generated by Sollya
+// with:
+// > for i from 0 to 15 do printdouble(round(2^(i/16), D, RN));
+constexpr int64_t EXP_2_M[16] = {
+ 0x3ff0000000000000, 0x3ff0b5586cf9890f, 0x3ff172b83c7d517b,
+ 0x3ff2387a6e756238, 0x3ff306fe0a31b715, 0x3ff3dea64c123422,
+ 0x3ff4bfdad5362a27, 0x3ff5ab07dd485429, 0x3ff6a09e667f3bcd,
+ 0x3ff7a11473eb0187, 0x3ff8ace5422aa0db, 0x3ff9c49182a3f090,
+ 0x3ffae89f995ad3ad, 0x3ffc199bdd85529c, 0x3ffd5818dcfba487,
+ 0x3ffea4afa2a490da};
+
LLVM_LIBC_FUNCTION(float, exp2f, (float x)) {
using FPBits = typename fputil::FPBits<float>;
FPBits xbits(x);
@@ -31,12 +40,14 @@ LLVM_LIBC_FUNCTION(float, exp2f, (float x)) {
uint32_t x_u = xbits.uintval();
uint32_t x_abs = x_u & 0x7fff'ffffU;
- // // When |x| >= 128, |x| < 2^-25, or x is nan
- if (unlikely(x_abs >= 0x4300'0000U || x_abs <= 0x3280'0000U)) {
- // |x| < 2^-25
- if (x_abs <= 0x3280'0000U) {
- return 1.0f + x;
- }
+ // |x| < 2^-25
+ if (unlikely(x_abs <= 0x3280'0000U)) {
+ return 1.0f + x;
+ }
+
+ // // When |x| >= 128, or x is nan
+ if (unlikely(x_abs >= 0x4300'0000U)) {
+
// x >= 128
if (!xbits.get_sign()) {
// x is finite
@@ -50,7 +61,7 @@ LLVM_LIBC_FUNCTION(float, exp2f, (float x)) {
// x is +inf or nan
return x + FPBits::inf().get_val();
}
- // x < -150
+ // x <= -150
if (x_u >= 0xc316'0000U) {
// exp(-Inf) = 0
if (xbits.is_inf())
@@ -66,6 +77,7 @@ LLVM_LIBC_FUNCTION(float, exp2f, (float x)) {
}
}
+ // Check exceptional values.
if (unlikely(x_u & exval_mask) == exval_mask) {
if (unlikely(x_u == exval1)) { // x = 0x1.853a6ep-9f
if (fputil::get_round() == FE_TONEAREST)
@@ -76,7 +88,52 @@ LLVM_LIBC_FUNCTION(float, exp2f, (float x)) {
}
}
- return exp2_eval(x);
+ // For -150 < x < 128, to compute 2^x, we perform the following range
+ // reduction: find hi, mid, lo such that:
+ // x = hi + mid + lo, in which
+ // hi is an integer,
+ // 0 <= mid * 2^4 < 16 is an integer
+ // -2^(-5) <= lo <= 2^-5.
+ // In particular,
+ // hi + mid = round(x * 2^4) * 2^(-4).
+ // Then,
+ // 2^x = 2^(hi + mid + lo) = 2^hi * 2^mid * 2^lo.
+ // 2^mid is stored in the lookup table EXP_2_M of 16 elements.
+ // 2^lo is computed using a degree-6 minimax polynomial
+ // generated by Sollya.
+ // We perform 2^hi * 2^lo by simply add hi to the exponent field
+ // of 2^mid.
+
+ // kf = (hi + mid) * 2^4 = round(x * 2^4)
+ float kf = fputil::nearest_integer(x * 16.0f);
+ // dx = lo = x - (hi + mid) = x - kf * 2^(-4)
+ double dx = fputil::multiply_add(-0x1.0p-4f, kf, x);
+
+ int k = static_cast<int>(kf);
+ // hi = floor(kf * 2^(-4))
+ // exp_hi = shift hi to the exponent field of double precision.
+ int64_t exp_hi = static_cast<int64_t>(k >> 4)
+ << fputil::FloatProperties<double>::MANTISSA_WIDTH;
+ // mh = 2^hi * 2^mid
+ // mh_bits = bit field of mh
+ int64_t mh_bits = EXP_2_M[k & 15] + exp_hi;
+ double mh = fputil::FPBits<double>(uint64_t(mh_bits)).get_val();
+
+ // Degree-5 polynomial approximating (2^x - 1)/x generating by Sollya with:
+ // > P = fpminimax((2^x - 1)/x, 5, [|D...|], [-1/32. 1/32]);
+ constexpr double COEFFS[6] = {0x1.62e42fefa39f3p-1, 0x1.ebfbdff82c57bp-3,
+ 0x1.c6b08d6f2d7aap-5, 0x1.3b2ab6fc92f5dp-7,
+ 0x1.5d897cfe27125p-10, 0x1.43090e61e6af1p-13};
+ double dx_sq = dx * dx;
+ double c1 = fputil::multiply_add(dx, COEFFS[1], COEFFS[0]);
+ double c2 = fputil::multiply_add(dx, COEFFS[3], COEFFS[2]);
+ double c3 = fputil::multiply_add(dx, COEFFS[5], COEFFS[4]);
+ double p = fputil::polyeval(dx_sq, c1, c2, c3);
+ // 2^x = 2^(hi + mid + lo)
+ // = 2^(hi + mid) * 2^lo
+ // ~ mh * (1 + lo * P(lo))
+ // = mh + (mh*lo) * P(lo)
+ return fputil::multiply_add(p, dx * mh, mh);
}
} // namespace __llvm_libc
diff --git a/libc/src/math/generic/explogxf.h b/libc/src/math/generic/explogxf.h
index 0e3cf9eecbdaa..1688dde1ef749 100644
--- a/libc/src/math/generic/explogxf.h
+++ b/libc/src/math/generic/explogxf.h
@@ -98,27 +98,6 @@ inline static exe_eval_result_t exp_eval(double x) {
return {mult_e1, r};
}
-inline static double exp2_eval(double x) {
- double kf = fputil::nearest_integer(x * mlp);
- double dx = fputil::multiply_add(mmld, kf, x);
- double mult_f, ml;
- {
- uint32_t ps = static_cast<int>(kf) + (1 << (EXP_bits_p - 1)) +
- (fputil::FPBits<double>::EXPONENT_BIAS << EXP_bits_p);
- fputil::FPBits<double> bs;
- bs.set_unbiased_exponent(ps >> EXP_bits_p);
- ml = 1.0 + EXP_2_POW[ps & (EXP_num_p - 1)];
- mult_f = bs.get_val();
- }
-
- // Taylor series coefficients for 2^x
- double pe = fputil::polyeval(
- dx, 1.0, 0x1.62e42fefa39efp-1, 0x1.ebfbdff82c58fp-3, 0x1.c6b08d704a0c0p-5,
- 0x1.3b2ab6fba4e77p-7, 0x1.5d87fe78a6731p-10, 0x1.430912f86c787p-13);
-
- return mult_f * ml * pe;
-}
-
// x should be positive, normal finite value
inline static double log2_eval(double x) {
using FPB = fputil::FPBits<double>;
diff --git a/libc/test/src/math/exhaustive/exp2f_test.cpp b/libc/test/src/math/exhaustive/exp2f_test.cpp
index dfb9eef89b808..da4a2dfa085b3 100644
--- a/libc/test/src/math/exhaustive/exp2f_test.cpp
+++ b/libc/test/src/math/exhaustive/exp2f_test.cpp
@@ -32,9 +32,9 @@ struct LlvmLibcExp2fExhaustiveTest : public LlvmLibcExhaustiveTest<uint32_t> {
}
};
-// Range: [0, 128];
+// Range: [0, +Inf];
static constexpr uint32_t POS_START = 0x0000'0000U;
-static constexpr uint32_t POS_STOP = 0x4300'0000U;
+static constexpr uint32_t POS_STOP = 0x7f80'0000U;
TEST_F(LlvmLibcExp2fExhaustiveTest, PostiveRangeRoundNearestTieToEven) {
test_full_range(POS_START, POS_STOP, mpfr::RoundingMode::Nearest);
@@ -52,9 +52,9 @@ TEST_F(LlvmLibcExp2fExhaustiveTest, PostiveRangeRoundTowardZero) {
test_full_range(POS_START, POS_STOP, mpfr::RoundingMode::TowardZero);
}
-// Range: [-150, 0];
+// Range: [-Inf, 0];
static constexpr uint32_t NEG_START = 0x8000'0000U;
-static constexpr uint32_t NEG_STOP = 0xc316'0000U;
+static constexpr uint32_t NEG_STOP = 0xff80'0000U;
TEST_F(LlvmLibcExp2fExhaustiveTest, NegativeRangeRoundNearestTieToEven) {
test_full_range(NEG_START, NEG_STOP, mpfr::RoundingMode::Nearest);
diff --git a/libc/test/src/math/explogxf_test.cpp b/libc/test/src/math/explogxf_test.cpp
index cceb3d5158b4a..6aa9cb3ad0b87 100644
--- a/libc/test/src/math/explogxf_test.cpp
+++ b/libc/test/src/math/explogxf_test.cpp
@@ -39,15 +39,6 @@ TEST(LlvmLibcExpxfTest, InFloatRange) {
def_prec);
}
-TEST(LlvmLibcExp2xfTest, InFloatRange) {
- auto f_check = [](float x) -> bool {
- return !(
- (isnan(x) || isinf(x) || x < -130 || x > 130 || fabsf(x) < 0x1.0p-10));
- };
- CHECK_DATA(0.0f, neg_inf, mpfr::Operation::Exp2, __llvm_libc::exp2_eval,
- f_check, def_count, def_prec);
-}
-
TEST(LlvmLibcLog2xfTest, InFloatRange) {
CHECK_DATA(0.0f, inf, mpfr::Operation::Log2, __llvm_libc::log2_eval, f_normal,
def_count, def_prec);
More information about the libc-commits
mailing list