[PATCH] D48879: [XRay][test-suite] Benchmarks for profiling mode implementation
Dean Michael Berris via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 12 00:30:03 PDT 2018
dberris added a comment.
For reference, here's the benchmark run against the current state of head (deep-call-bench):
Run on (48 X 3500 MHz CPU s)
2018-07-12 17:21:44
---------------------------------------------------------------------------------------------
Benchmark                                                    Time          CPU  Iterations
---------------------------------------------------------------------------------------------
BM_XRayProfilingDeepCallStack/1/real_time/threads:1        196 ns       196 ns     3548155
BM_XRayProfilingDeepCallStack/1/real_time/threads:2        148 ns       295 ns     5404008
BM_XRayProfilingDeepCallStack/1/real_time/threads:4         76 ns       305 ns     8427916
BM_XRayProfilingDeepCallStack/1/real_time/threads:8         45 ns       357 ns    11881512
BM_XRayProfilingDeepCallStack/1/real_time/threads:16       119 ns      1907 ns     7859872
BM_XRayProfilingDeepCallStack/1/real_time/threads:32      2446 ns     78267 ns      304480
BM_XRayProfilingDeepCallStack/2/real_time/threads:1        288 ns       288 ns     2294615
BM_XRayProfilingDeepCallStack/2/real_time/threads:2        196 ns       391 ns     3854258
BM_XRayProfilingDeepCallStack/2/real_time/threads:4        101 ns       405 ns     6306804
BM_XRayProfilingDeepCallStack/2/real_time/threads:8         77 ns       616 ns     9896592
BM_XRayProfilingDeepCallStack/2/real_time/threads:16        53 ns       855 ns    13341120
BM_XRayProfilingDeepCallStack/2/real_time/threads:32      3000 ns     95820 ns      241472
BM_XRayProfilingDeepCallStack/4/real_time/threads:1        479 ns       479 ns     1369221
BM_XRayProfilingDeepCallStack/4/real_time/threads:2        283 ns       566 ns     2439036
BM_XRayProfilingDeepCallStack/4/real_time/threads:4        158 ns       630 ns     4261736
BM_XRayProfilingDeepCallStack/4/real_time/threads:8        125 ns       998 ns     7218600
BM_XRayProfilingDeepCallStack/4/real_time/threads:16        74 ns      1190 ns     6930480
BM_XRayProfilingDeepCallStack/4/real_time/threads:32      3271 ns     94646 ns      165088
BM_XRayProfilingDeepCallStack/8/real_time/threads:1        847 ns       847 ns      801600
BM_XRayProfilingDeepCallStack/8/real_time/threads:2        500 ns      1000 ns     1309944
BM_XRayProfilingDeepCallStack/8/real_time/threads:4        261 ns      1043 ns     2147344
BM_XRayProfilingDeepCallStack/8/real_time/threads:8        147 ns      1174 ns     4556248
BM_XRayProfilingDeepCallStack/8/real_time/threads:16       102 ns      1634 ns     5224688
BM_XRayProfilingDeepCallStack/8/real_time/threads:32      5324 ns    155831 ns      101920
BM_XRayProfilingDeepCallStack/16/real_time/threads:1      1588 ns      1588 ns      434210
BM_XRayProfilingDeepCallStack/16/real_time/threads:2       894 ns      1787 ns      715548
BM_XRayProfilingDeepCallStack/16/real_time/threads:4       490 ns      1959 ns     1230536
BM_XRayProfilingDeepCallStack/16/real_time/threads:8       277 ns      2213 ns     2229616
BM_XRayProfilingDeepCallStack/16/real_time/threads:16      585 ns      9366 ns      923344
BM_XRayProfilingDeepCallStack/16/real_time/threads:32    16064 ns    513978 ns       59392
BM_XRayProfilingDeepCallStack/32/real_time/threads:1      3247 ns      3247 ns      215805
BM_XRayProfilingDeepCallStack/32/real_time/threads:2      1692 ns      3384 ns      409380
BM_XRayProfilingDeepCallStack/32/real_time/threads:4       925 ns      3700 ns      677716
BM_XRayProfilingDeepCallStack/32/real_time/threads:8       512 ns      4095 ns     1240360
BM_XRayProfilingDeepCallStack/32/real_time/threads:16      513 ns      8206 ns     1681168
BM_XRayProfilingDeepCallStack/32/real_time/threads:32    59676 ns   1909406 ns       15136
BM_XRayProfilingDeepCallStack/64/real_time/threads:1      6374 ns      6373 ns      105030
BM_XRayProfilingDeepCallStack/64/real_time/threads:2      3341 ns      6681 ns      174912
BM_XRayProfilingDeepCallStack/64/real_time/threads:4      1851 ns      7404 ns      278224
BM_XRayProfilingDeepCallStack/64/real_time/threads:8      1003 ns      8026 ns      590256
BM_XRayProfilingDeepCallStack/64/real_time/threads:16      796 ns     12726 ns      701920
BM_XRayProfilingDeepCallStack/64/real_time/threads:32    38176 ns   1152947 ns       16608
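A note on reading these tables: in every row (here and in the D49217 run), the CPU column is close to the Time column multiplied by the thread count (for example, 148 ns times 2 is about 295 ns, and 2446 ns times 32 is about 78267 ns), within roughly 10%, with the largest gaps in the 32-thread rows on head. Every iteration count is also a multiple of its thread count. Both observations are consistent with Google Benchmark summing iterations across all threads, so that Time reflects the throughput of the whole thread group while CPU approximates what each thread spends per iteration; that is a reading of the harness, not something stated in this review. The Python sketch below is not part of D48879 (the `LINE` regex and `parse` helper are illustrative names); it parses a few rows copied from the table above and prints CPU/Time next to the thread count.

```python
import re

# Illustrative only: four rows copied verbatim from the head run above.
ROWS = """\
BM_XRayProfilingDeepCallStack/1/real_time/threads:2 148 ns 295 ns 5404008
BM_XRayProfilingDeepCallStack/1/real_time/threads:32 2446 ns 78267 ns 304480
BM_XRayProfilingDeepCallStack/32/real_time/threads:16 513 ns 8206 ns 1681168
BM_XRayProfilingDeepCallStack/64/real_time/threads:32 38176 ns 1152947 ns 16608
"""

# <name>/<arg>/real_time/threads:<n>  <time> ns  <cpu> ns  <iterations>
LINE = re.compile(
    r"^(?P<name>\S+?)/(?P<arg>\d+)/real_time/threads:(?P<threads>\d+)\s+"
    r"(?P<time>\d+) ns\s+(?P<cpu>\d+) ns\s+(?P<iters>\d+)$"
)


def parse(text):
    """Return one dict per benchmark line; every field except the name is an int."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            fields = m.groupdict()
            rows.append({k: v if k == "name" else int(v) for k, v in fields.items()})
    return rows


for row in parse(ROWS):
    # If each thread is busy for the whole run, CPU/Time is close to the thread count.
    print(f"threads={row['threads']:>2}  cpu/time={row['cpu'] / row['time']:6.2f}")
```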
With the changes in https://reviews.llvm.org/D49217, we get:
Run on (48 X 3500 MHz CPU s)
2018-07-12 17:01:35
---------------------------------------------------------------------------------------------
Benchmark                                                    Time          CPU  Iterations
---------------------------------------------------------------------------------------------
BM_XRayProfilingDeepCallStack/1/real_time/threads:1        202 ns       202 ns     3477313
BM_XRayProfilingDeepCallStack/1/real_time/threads:2        179 ns       357 ns     4581178
BM_XRayProfilingDeepCallStack/1/real_time/threads:4        144 ns       577 ns     5875828
BM_XRayProfilingDeepCallStack/1/real_time/threads:8        125 ns      1003 ns     8311456
BM_XRayProfilingDeepCallStack/1/real_time/threads:16       174 ns      2792 ns     9522368
BM_XRayProfilingDeepCallStack/1/real_time/threads:32       146 ns      4687 ns     6358400
BM_XRayProfilingDeepCallStack/2/real_time/threads:1        295 ns       295 ns     2359216
BM_XRayProfilingDeepCallStack/2/real_time/threads:2        239 ns       478 ns     2888910
BM_XRayProfilingDeepCallStack/2/real_time/threads:4        160 ns       638 ns     5410336
BM_XRayProfilingDeepCallStack/2/real_time/threads:8        125 ns       999 ns     7721696
BM_XRayProfilingDeepCallStack/2/real_time/threads:16       103 ns      1647 ns     5126384
BM_XRayProfilingDeepCallStack/2/real_time/threads:32       131 ns      4153 ns     5427136
BM_XRayProfilingDeepCallStack/4/real_time/threads:1        490 ns       490 ns     1326060
BM_XRayProfilingDeepCallStack/4/real_time/threads:2        367 ns       734 ns     2276550
BM_XRayProfilingDeepCallStack/4/real_time/threads:4        249 ns       994 ns     3981604
BM_XRayProfilingDeepCallStack/4/real_time/threads:8        174 ns      1394 ns     5467368
BM_XRayProfilingDeepCallStack/4/real_time/threads:16       129 ns      2057 ns     4399568
BM_XRayProfilingDeepCallStack/4/real_time/threads:32       148 ns      4718 ns     4695104
BM_XRayProfilingDeepCallStack/8/real_time/threads:1        873 ns       873 ns      788744
BM_XRayProfilingDeepCallStack/8/real_time/threads:2        535 ns      1071 ns     1177912
BM_XRayProfilingDeepCallStack/8/real_time/threads:4        339 ns      1354 ns     2235540
BM_XRayProfilingDeepCallStack/8/real_time/threads:8        256 ns      2051 ns     3818424
BM_XRayProfilingDeepCallStack/8/real_time/threads:16       208 ns      3323 ns     4687040
BM_XRayProfilingDeepCallStack/8/real_time/threads:32       211 ns      6751 ns     3579136
BM_XRayProfilingDeepCallStack/16/real_time/threads:1      1652 ns      1652 ns      414737
BM_XRayProfilingDeepCallStack/16/real_time/threads:2       975 ns      1950 ns      785698
BM_XRayProfilingDeepCallStack/16/real_time/threads:4       601 ns      2402 ns     1400136
BM_XRayProfilingDeepCallStack/16/real_time/threads:8       365 ns      2918 ns     2308440
BM_XRayProfilingDeepCallStack/16/real_time/threads:16      313 ns      5003 ns     1600000
BM_XRayProfilingDeepCallStack/16/real_time/threads:32      256 ns      8177 ns     3033056
BM_XRayProfilingDeepCallStack/32/real_time/threads:1      3419 ns      3418 ns      209959
BM_XRayProfilingDeepCallStack/32/real_time/threads:2      1858 ns      3716 ns      405304
BM_XRayProfilingDeepCallStack/32/real_time/threads:4      1051 ns      4204 ns      690604
BM_XRayProfilingDeepCallStack/32/real_time/threads:8       611 ns      4890 ns     1233168
BM_XRayProfilingDeepCallStack/32/real_time/threads:16      425 ns      6798 ns     1634992
BM_XRayProfilingDeepCallStack/32/real_time/threads:32      336 ns     10737 ns     1958368
BM_XRayProfilingDeepCallStack/64/real_time/threads:1      6438 ns      6438 ns      105337
BM_XRayProfilingDeepCallStack/64/real_time/threads:2      3432 ns      6864 ns      197488
BM_XRayProfilingDeepCallStack/64/real_time/threads:4      2477 ns      9906 ns      376460
BM_XRayProfilingDeepCallStack/64/real_time/threads:8      1069 ns      8547 ns      578224
BM_XRayProfilingDeepCallStack/64/real_time/threads:16      684 ns     10949 ns     1079040
BM_XRayProfilingDeepCallStack/64/real_time/threads:32      482 ns     15417 ns     1298176
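One thing to notice here is the non-linear (super-linear?) degradation at high thread counts on head: going from 16 to 32 threads increases the real time per iteration by roughly 20x to 116x (e.g. 119 ns to 2446 ns for /1, and 513 ns to 59676 ns for /32), and the iteration counts drop by an order of magnitude or more. With the changes in https://reviews.llvm.org/D49217, the 32-thread runs stay between 131 ns and 482 ns of real time across all arguments. This is good evidence to support the changes in D49217, and gives us more confidence in quantifying the costs/overheads of the profiling mode implementation.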
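To put numbers on the scaling difference between the two runs, here is a small Python sketch (not part of D48879; the `HEAD`, `D49217`, and `slowdown` names are illustrative). It takes the real-time column at 16 and 32 threads from both tables, keyed by the benchmark argument, and prints time(32 threads) / time(16 threads). On head the ratio ranges from about 20x to 116x, while with D49217 it stays between roughly 0.7x and 1.3x.

```python
# Real time per iteration (ns) at 16 and 32 threads, copied from the two tables
# above and keyed by the benchmark argument (/1 ... /64).
HEAD = {1: (119, 2446), 2: (53, 3000), 4: (74, 3271), 8: (102, 5324),
        16: (585, 16064), 32: (513, 59676), 64: (796, 38176)}
D49217 = {1: (174, 146), 2: (103, 131), 4: (129, 148), 8: (208, 211),
          16: (313, 256), 32: (425, 336), 64: (684, 482)}


def slowdown(table):
    """Map each argument to time(32 threads) / time(16 threads); > 1 means slower."""
    return {arg: t32 / t16 for arg, (t16, t32) in table.items()}


head, patched = slowdown(HEAD), slowdown(D49217)
for arg in sorted(HEAD):
    print(f"/{arg:<3} head {head[arg]:6.1f}x   D49217 {patched[arg]:4.2f}x")
```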
Repository:
rT test-suite
https://reviews.llvm.org/D48879