[all-commits] [llvm/llvm-project] 111d27: [libc][math] Implement double precision log2 funct...
lntue via All-commits
all-commits at lists.llvm.org
Tue May 23 07:49:51 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 111d27484132c0692c214880576dc4a37fd6d645
https://github.com/llvm/llvm-project/commit/111d27484132c0692c214880576dc4a37fd6d645
Author: Tue Ly <lntue at google.com>
Date: 2023-05-23 (Tue, 23 May 2023)
Changed paths:
M libc/config/darwin/arm/entrypoints.txt
M libc/config/linux/aarch64/entrypoints.txt
M libc/config/linux/x86_64/entrypoints.txt
M libc/config/windows/entrypoints.txt
M libc/spec/stdc.td
M libc/src/math/CMakeLists.txt
M libc/src/math/generic/CMakeLists.txt
A libc/src/math/generic/log2.cpp
A libc/src/math/log2.h
M libc/src/string/memory_utils/utils.h
M libc/test/src/math/CMakeLists.txt
A libc/test/src/math/log2_test.cpp
M libc/test/src/string/memory_utils/utils_test.cpp
Log Message:
-----------
[libc][math] Implement double precision log2 function correctly rounded to all rounding modes.
Implement double precision log2 function correctly rounded to all
rounding modes.
See https://reviews.llvm.org/D150014 for a more detail description of the algorithm.
**Performance**
- For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.91%.
- Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 15.458 + 0.204 clc/call; Median-Min = 0.224 clc/call; Max = 15.867 clc/call;
-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 23.711 + 0.524 clc/call; Median-Min = 0.443 clc/call; Max = 25.307 clc/call;
-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 14.807 + 0.199 clc/call; Median-Min = 0.211 clc/call; Max = 15.137 clc/call;
-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.666 + 0.274 clc/call; Median-Min = 0.298 clc/call; Max = 18.531 clc/call;
-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 26.534 + 0.418 clc/call; Median-Min = 0.462 clc/call; Max = 27.327 clc/call;
```
- Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2 --latency
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 46.048 + 1.643 clc/call; Median-Min = 1.694 clc/call; Max = 48.018 clc/call;
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 62.333 + 0.138 clc/call; Median-Min = 0.119 clc/call; Max = 62.583 clc/call;
-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 45.206 + 1.503 clc/call; Median-Min = 1.467 clc/call; Max = 47.229 clc/call;
-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 43.042 + 0.454 clc/call; Median-Min = 0.484 clc/call; Max = 43.912 clc/call;
-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 57.016 + 1.636 clc/call; Median-Min = 1.655 clc/call; Max = 58.816 clc/call;
```
- Accurate pass latency:
```
$ ./perf.sh log2 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency -- with FMA
177.632
-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
231.332
-- LIBC latency -- with FMA
459.751
-- LIBC latency -- without FMA
463.850
```
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D150374
More information about the All-commits
mailing list