[all-commits] [llvm/llvm-project] 82d6e7: [libc] Implement tanf function correctly rounded f...

Fri Aug 12 06:21:27 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 82d6e7704899ee4f9265680e38465f6e4da9c223
      https://github.com/llvm/llvm-project/commit/82d6e7704899ee4f9265680e38465f6e4da9c223
  Author: Tue Ly <lntue at google.com>
  Date:   2022-08-12 (Fri, 12 Aug 2022)

  Changed paths:
    M libc/config/darwin/arm/entrypoints.txt
    M libc/config/linux/aarch64/entrypoints.txt
    M libc/config/linux/x86_64/entrypoints.txt
    M libc/docs/math.rst
    M libc/src/math/CMakeLists.txt
    M libc/src/math/generic/CMakeLists.txt
    M libc/src/math/generic/sincosf_utils.h
    A libc/src/math/generic/tanf.cpp
    A libc/src/math/tanf.h
    M libc/test/src/math/CMakeLists.txt
    M libc/test/src/math/exhaustive/CMakeLists.txt
    A libc/test/src/math/exhaustive/tanf_test.cpp
    A libc/test/src/math/tanf_test.cpp

  Log Message:
  -----------
  [libc] Implement tanf function correctly rounded for all rounding modes.

Implement tanf function correctly rounded for all rounding modes.

We use the range reduction that is shared with `sinf`, `cosf`, and `sincosf`:
```
  k = round(x * 32/pi) and y = x * (32/pi) - k.
```
Then we use the tangent of sum formula:
```
  tan(x) = tan((k + y)* pi/32) = tan((k mod 32) * pi / 32 + y * pi/32)
         = (tan((k mod 32) * pi/32) + tan(y * pi/32)) / (1 - tan((k mod 32) * pi/32) * tan(y * pi/32))
```
We need to make a further reduction when `k mod 32 >= 16` due to the pole at `pi/2` of `tan(x)` function:
```
  if (k mod 32 >= 16): k = k - 31, y = y - 1.0
```
And to compute the final result, we store `tan(k * pi/32)` for `k = -15..15` in a table of 32 double values,
and evaluate `tan(y * pi/32)` with a degree-11 minimax odd polynomial generated by Sollya with:
```
>  P = fpminimax(tan(y * pi/32)/y, [|0, 2, 4, 6, 8, 10|], [|D...|], [0, 1.5]);
```

Performance benchmark using `perf` tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanf
CORE-MATH reciprocal throughput   : 18.586
System LIBC reciprocal throughput : 50.068

LIBC reciprocal throughput        : 33.823
LIBC reciprocal throughput        : 25.161     (with `-msse4.2` flag)
LIBC reciprocal throughput        : 19.157     (with `-mfma` flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh tanf --latency
GNU libc version: 2.31
GNU libc release: stable
CORE-MATH latency   : 55.630
System LIBC latency : 106.264

LIBC latency        : 96.060
LIBC latency        : 90.727    (with `-msse4.2` flag)
LIBC latency        : 82.361    (with `-mfma` flag)
```

Reviewed By: orex

Differential Revision: https://reviews.llvm.org/D131715