[all-commits] [llvm/llvm-project] b91e78: [libc][math] Implement double precision log1p corr...
lntue via All-commits
all-commits at lists.llvm.org
Tue May 23 08:04:30 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: b91e78da37800c03542141135a6628f320e974ed
https://github.com/llvm/llvm-project/commit/b91e78da37800c03542141135a6628f320e974ed
Author: Tue Ly <lntue at google.com>
Date: 2023-05-23 (Tue, 23 May 2023)
Changed paths:
M libc/config/darwin/arm/entrypoints.txt
M libc/config/linux/aarch64/entrypoints.txt
M libc/config/linux/x86_64/entrypoints.txt
M libc/config/windows/entrypoints.txt
M libc/spec/stdc.td
M libc/src/math/CMakeLists.txt
M libc/src/math/generic/CMakeLists.txt
A libc/src/math/generic/log1p.cpp
A libc/src/math/log1p.h
M libc/test/src/math/CMakeLists.txt
A libc/test/src/math/log1p_test.cpp
Log Message:
-----------
[libc][math] Implement double precision log1p correctly rounded to all rounding modes.
Implement double precision log1p function correctly rounded to all
rounding modes.
**Performance**
- For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%.
- Benchmarks with `./perf.sh` tool from the CORE-MATH project, unit is (CPU clocks / call).
- Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log1p
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 39.792 + 1.011 clc/call; Median-Min = 0.940 clc/call; Max = 41.373 clc/call;
-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 87.285 + 1.135 clc/call; Median-Min = 1.299 clc/call; Max = 89.715 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 20.666 + 0.123 clc/call; Median-Min = 0.125 clc/call; Max = 20.828 clc/call;
-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.928 + 0.771 clc/call; Median-Min = 0.725 clc/call; Max = 22.767 clc/call;
-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 31.461 + 0.528 clc/call; Median-Min = 0.602 clc/call; Max = 36.809 clc/call;
```
- Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log1p --latency
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 77.875 + 0.062 clc/call; Median-Min = 0.051 clc/call; Max = 78.003 clc/call;
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 101.958 + 1.202 clc/call; Median-Min = 1.325 clc/call; Max = 104.452 clc/call;
-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 60.581 + 1.443 clc/call; Median-Min = 1.611 clc/call; Max = 62.285 clc/call;
-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.817 + 1.108 clc/call; Median-Min = 1.300 clc/call; Max = 50.282 clc/call;
-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 61.121 + 0.599 clc/call; Median-Min = 0.761 clc/call; Max = 62.020 clc/call;
```
- Accurate pass latency:
```
$ ./perf.sh log1p --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
760.444
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
827.880
-- LIBC latency -- with FMA
711.837
-- LIBC latency -- without FMA
764.317
```
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D151049
More information about the All-commits
mailing list