[libc-commits] [libc] [libc] Improve GPU benchmarking (PR #153512)
Leandro Lacerda via libc-commits
libc-commits at lists.llvm.org
Wed Aug 13 17:33:16 PDT 2025
leandrolcampos wrote:
**Preliminary Results** (NVIDIA GeForce RTX 4070 Laptop GPU)
```text
[1/4] Running hermetic test libc.benchmarks.gpu.src.ctype.isalnum_benchmark
Running Suite: LlvmLibcIsAlNumGpuBenchmark
Benchmark | Cycles | Min | Max | Iterations | Time / Iteration | Stddev | Threads |
--------------------------------------------------------------------------------------------------------------
IsAlnum | 53 | 53 | 53 | 156 | 3 us | 0 | 64 |
IsAlnumSingleThread | 53 | 53 | 53 | 157 | 3 us | 0 | 1 |
IsAlnumSingleWave | 53 | 53 | 53 | 155 | 3 us | 0 | 32 |
IsAlnumCapital | 53 | 53 | 53 | 157 | 3 us | 0 | 64 |
IsAlnumNotAlnum | 43 | 43 | 43 | 163 | 3 us | 0 | 64 |
[2/4] Running hermetic test libc.benchmarks.gpu.src.ctype.isalpha_benchmark
Running Suite: LlvmLibcIsAlphaGpuBenchmark
Benchmark | Cycles | Min | Max | Iterations | Time / Iteration | Stddev | Threads |
--------------------------------------------------------------------------------------------------------------
IsAlpha | 53 | 53 | 53 | 156 | 3 us | 0 | 1 |
[3/4] Running hermetic test libc.benchmarks.gpu.src.math.sin_benchmark
Running Suite: LlvmLibcSinGpuBenchmark
Benchmark | Cycles | Min | Max | Iterations | Time / Iteration | Stddev | Threads |
--------------------------------------------------------------------------------------------------------------
Sin_1 | 3087 | 2946 | 3637 | 202 | 17 us | 159 | 32 |
Sin_128 | 362 | 354 | 372 | 26 | 64 us | 5 | 32 |
Sin_1024 | 352 | 348 | 358 | 23 | 405 us | 2 | 32 |
Sin_4096 | 359 | 358 | 361 | 7 | 1 ms | 1 | 32 |
SinTwoPi_1 | 2205 | 2186 | 2506 | 29 | 17 us | 56 | 32 |
SinTwoPi_128 | 262 | 259 | 267 | 10 | 52 us | 2 | 32 |
SinTwoPi_1024 | 271 | 271 | 275 | 16 | 319 us | 0 | 32 |
SinTwoPi_4096 | 280 | 280 | 281 | 9 | 1 ms | 0 | 32 |
SinTwoPow30_1 | 3104 | 3086 | 3174 | 28 | 18 us | 16 | 32 |
SinTwoPow30_128 | 348 | 345 | 352 | 9 | 60 us | 1 | 32 |
SinTwoPow30_1024 | 358 | 357 | 359 | 7 | 380 us | 0 | 32 |
SinTwoPow30_4096 | 366 | 366 | 367 | 6 | 1 ms | 0 | 32 |
SinVeryLarge_1 | 2827 | 2788 | 3069 | 29 | 17 us | 46 | 32 |
SinVeryLarge_128 | 316 | 313 | 318 | 14 | 57 us | 1 | 32 |
SinVeryLarge_1024 | 316 | 315 | 320 | 16 | 348 us | 1 | 32 |
SinVeryLarge_4096 | 324 | 323 | 325 | 15 | 1 ms | 0 | 32 |
NvSin_1 | 2507 | 2262 | 2890 | 39 | 15 us | 95 | 32 |
NvSin_128 | 1862 | 1858 | 1870 | 5 | 145 us | 4 | 32 |
NvSin_1024 | 2066 | 2066 | 2068 | 5 | 1 ms | 0 | 32 |
NvSin_4096 | 2085 | 2085 | 2085 | 4 | 4 ms | 0 | 32 |
NvSinTwoPi_1 | 1103 | 1102 | 1105 | 35 | 14 us | 0 | 32 |
NvSinTwoPi_128 | 925 | 925 | 927 | 7 | 82 us | 0 | 32 |
NvSinTwoPi_1024 | 1134 | 1134 | 1134 | 4 | 665 us | 0 | 32 |
NvSinTwoPi_4096 | 1153 | 1153 | 1153 | 4 | 2 ms | 0 | 32 |
NvSinTwoPow30_1 | 1103 | 1102 | 1104 | 35 | 14 us | 0 | 32 |
NvSinTwoPow30_128 | 925 | 925 | 925 | 7 | 82 us | 0 | 32 |
NvSinTwoPow30_1024 | 1134 | 1134 | 1134 | 4 | 668 us | 0 | 32 |
NvSinTwoPow30_4096 | 1153 | 1153 | 1153 | 4 | 2 ms | 0 | 32 |
NvSinVeryLarge_1 | 2493 | 2470 | 2795 | 38 | 15 us | 50 | 32 |
NvSinVeryLarge_128 | 1827 | 1827 | 1829 | 5 | 141 us | 0 | 32 |
NvSinVeryLarge_1024 | 2033 | 2033 | 2034 | 5 | 1 ms | 0 | 32 |
NvSinVeryLarge_4096 | 2050 | 2050 | 2050 | 4 | 4 ms | 0 | 32 |
Sinf_1 | 2190 | 1524 | 2396 | 527 | 14 us | 174 | 32 |
Sinf_128 | 239 | 229 | 247 | 26 | 40 us | 4 | 32 |
Sinf_1024 | 241 | 236 | 249 | 8 | 233 us | 3 | 32 |
Sinf_4096 | 259 | 258 | 261 | 8 | 905 us | 1 | 32 |
SinfTwoPi_1 | 1447 | 1430 | 1753 | 39 | 14 us | 49 | 32 |
SinfTwoPi_128 | 147 | 146 | 149 | 19 | 34 us | 0 | 32 |
SinfTwoPi_1024 | 146 | 145 | 148 | 13 | 183 us | 0 | 32 |
SinfTwoPi_4096 | 165 | 165 | 167 | 23 | 704 us | 0 | 32 |
SinfTwoPow30_1 | 1084 | 1078 | 1163 | 35 | 14 us | 13 | 32 |
SinfTwoPow30_128 | 102 | 101 | 104 | 32 | 32 us | 0 | 32 |
SinfTwoPow30_1024 | 102 | 102 | 103 | 25 | 164 us | 0 | 32 |
SinfTwoPow30_4096 | 121 | 121 | 123 | 17 | 645 us | 0 | 32 |
SinfVeryLarge_1 | 1930 | 1870 | 2268 | 34 | 15 us | 59 | 32 |
SinfVeryLarge_128 | 205 | 205 | 207 | 18 | 38 us | 0 | 32 |
SinfVeryLarge_1024 | 205 | 205 | 207 | 10 | 218 us | 0 | 32 |
SinfVeryLarge_4096 | 224 | 224 | 226 | 14 | 845 us | 0 | 32 |
NvSinf_1 | 1020 | 1016 | 1032 | 37 | 13 us | 5 | 32 |
NvSinf_128 | 786 | 786 | 788 | 7 | 76 us | 0 | 32 |
NvSinf_1024 | 974 | 969 | 976 | 17 | 588 us | 2 | 32 |
NvSinf_4096 | 1008 | 1008 | 1009 | 4 | 2 ms | 0 | 32 |
NvSinfTwoPi_1 | 164 | 162 | 505 | 145 | 13 us | 28 | 32 |
NvSinfTwoPi_128 | 141 | 141 | 143 | 15 | 33 us | 0 | 32 |
NvSinfTwoPi_1024 | 330 | 330 | 331 | 7 | 272 us | 0 | 32 |
NvSinfTwoPi_4096 | 364 | 364 | 365 | 6 | 1 ms | 0 | 32 |
NvSinfTwoPow30_1 | 1024 | 1016 | 1272 | 64 | 14 us | 31 | 32 |
NvSinfTwoPow30_128 | 776 | 776 | 776 | 7 | 73 us | 0 | 32 |
NvSinfTwoPow30_1024 | 968 | 966 | 969 | 7 | 504 us | 1 | 32 |
NvSinfTwoPow30_4096 | 1002 | 1002 | 1002 | 4 | 1 ms | 0 | 32 |
NvSinfVeryLarge_1 | 1003 | 1001 | 1026 | 39 | 13 us | 3 | 32 |
NvSinfVeryLarge_128 | 758 | 758 | 758 | 9 | 60 us | 0 | 32 |
NvSinfVeryLarge_1024 | 950 | 950 | 951 | 4 | 478 us | 0 | 32 |
NvSinfVeryLarge_4096 | 983 | 983 | 984 | 4 | 1 ms | 0 | 32 |
[4/4] Running hermetic test libc.benchmarks.gpu.src.math.atan2_benchmark
Running Suite: LlvmLibcAtan2GpuBenchmark
Benchmark | Cycles | Min | Max | Iterations | Time / Iteration | Stddev | Threads |
--------------------------------------------------------------------------------------------------------------
Atan2_1 | 4082 | 1894 | 5241 | 723 | 14 us | 953 | 32 |
Atan2_128 | 2520 | 2454 | 2580 | 21 | 165 us | 33 | 32 |
Atan2_1024 | 2745 | 2723 | 2768 | 11 | 1 ms | 13 | 32 |
Atan2_4096 | 2750 | 2739 | 2761 | 11 | 5 ms | 6 | 32 |
Atan2TwoPi_1 | 2749 | 2731 | 3160 | 36 | 14 us | 69 | 32 |
Atan2TwoPi_128 | 1072 | 1065 | 1097 | 10 | 82 us | 8 | 32 |
Atan2TwoPi_1024 | 1302 | 1301 | 1304 | 4 | 668 us | 1 | 32 |
Atan2TwoPi_4096 | 1303 | 1303 | 1303 | 4 | 2 ms | 0 | 32 |
Atan2TwoPow30_1 | 2744 | 2729 | 3177 | 39 | 13 us | 70 | 32 |
Atan2TwoPow30_128 | 1075 | 1069 | 1101 | 10 | 84 us | 8 | 32 |
Atan2TwoPow30_1024 | 1302 | 1302 | 1304 | 4 | 677 us | 0 | 32 |
Atan2TwoPow30_4096 | 1303 | 1303 | 1304 | 4 | 2 ms | 0 | 32 |
Atan2Large_1 | 3577 | 1125 | 3888 | 142 | 14 us | 361 | 32 |
Atan2Large_128 | 1810 | 1770 | 1841 | 12 | 124 us | 17 | 32 |
Atan2Large_1024 | 2053 | 2050 | 2057 | 5 | 973 us | 2 | 32 |
Atan2Large_4096 | 2051 | 2047 | 2054 | 8 | 3 ms | 2 | 32 |
NvAtan2_1 | 2911 | 2866 | 3324 | 56 | 14 us | 64 | 32 |
NvAtan2_128 | 2838 | 2834 | 2849 | 6 | 180 us | 5 | 32 |
NvAtan2_1024 | 3075 | 3075 | 3077 | 4 | 1 ms | 0 | 32 |
NvAtan2_4096 | 3076 | 3076 | 3076 | 4 | 5 ms | 0 | 32 |
NvAtan2TwoPi_1 | 2040 | 2032 | 2382 | 42 | 13 us | 53 | 32 |
NvAtan2TwoPi_128 | 1980 | 1979 | 1993 | 9 | 130 us | 4 | 32 |
NvAtan2TwoPi_1024 | 2219 | 2219 | 2219 | 4 | 1 ms | 0 | 32 |
NvAtan2TwoPi_4096 | 2219 | 2219 | 2219 | 4 | 4 ms | 0 | 32 |
NvAtan2TwoPow30_1 | 2035 | 2032 | 2183 | 38 | 13 us | 24 | 32 |
NvAtan2TwoPow30_128 | 1980 | 1979 | 1993 | 9 | 132 us | 4 | 32 |
NvAtan2TwoPow30_1024 | 2218 | 2218 | 2219 | 5 | 1 ms | 0 | 32 |
NvAtan2TwoPow30_4096 | 2219 | 2219 | 2219 | 4 | 4 ms | 0 | 32 |
NvAtan2Large_1 | 2039 | 2032 | 2356 | 41 | 13 us | 49 | 32 |
NvAtan2Large_128 | 1980 | 1979 | 1998 | 11 | 132 us | 5 | 32 |
NvAtan2Large_1024 | 2218 | 2218 | 2219 | 4 | 1 ms | 0 | 32 |
NvAtan2Large_4096 | 2219 | 2219 | 2220 | 4 | 4 ms | 0 | 32 |
```
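To read the tables above side by side, the rows can be parsed and each LLVM libc entry compared with its `Nv`-prefixed counterpart. The following is a minimal sketch and not part of the PR: the helper names `parse_cycles` and `nv_ratios` are hypothetical, the sample rows are copied from the output above, and it assumes the `Nv*` benchmarks measure the vendor-provided implementation of the same function.

```python
# Hypothetical helper, not part of the PR: parse benchmark rows and compare
# LLVM libc entries against their "Nv"-prefixed counterparts.

# A few rows copied verbatim from the benchmark output above.
SAMPLE = """\
Benchmark | Cycles | Min | Max | Iterations | Time / Iteration | Stddev | Threads |
Sin_128 | 362 | 354 | 372 | 26 | 64 us | 5 | 32 |
NvSin_128 | 1862 | 1858 | 1870 | 5 | 145 us | 4 | 32 |
Sinf_128 | 239 | 229 | 247 | 26 | 40 us | 4 | 32 |
NvSinf_128 | 786 | 786 | 788 | 7 | 76 us | 0 | 32 |
Atan2_128 | 2520 | 2454 | 2580 | 21 | 165 us | 33 | 32 |
NvAtan2_128 | 2838 | 2834 | 2849 | 6 | 180 us | 5 | 32 |
"""


def parse_cycles(text):
    """Map benchmark name -> value of the Cycles column."""
    cycles = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        # Data rows have a name followed by a numeric Cycles field;
        # header, separator, and "Running ..." lines are skipped.
        if len(fields) >= 2 and fields[1].isdigit():
            cycles[fields[0]] = int(fields[1])
    return cycles


def nv_ratios(cycles):
    """For each X with an NvX counterpart, return NvX cycles / X cycles."""
    return {
        name: cycles["Nv" + name] / value
        for name, value in cycles.items()
        if not name.startswith("Nv") and "Nv" + name in cycles
    }


if __name__ == "__main__":
    for name, ratio in nv_ratios(parse_cycles(SAMPLE)).items():
        print(f"{name}: Nv/libc cycle ratio = {ratio:.2f}")
```

A ratio above 1 means the `Nv*` entry reported more cycles than the LLVM libc entry for that input size. Since these are preliminary numbers from a single GPU, the ratios are only a rough guide.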
https://github.com/llvm/llvm-project/pull/153512