[PATCH] D48879: [XRay][test-suite] Benchmarks for profiling mode implementation

Dean Michael Berris via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 12 00:30:03 PDT 2018


dberris added a comment.

For reference, here's the benchmark run against the current state of head (deep-call-bench):

  Run on (48 X 3500 MHz CPU s)
  2018-07-12 17:21:44
  ---------------------------------------------------------------------------------------------
  Benchmark                                                      Time           CPU Iterations
  ---------------------------------------------------------------------------------------------
  BM_XRayProfilingDeepCallStack/1/real_time/threads:1          196 ns        196 ns    3548155
  BM_XRayProfilingDeepCallStack/1/real_time/threads:2          148 ns        295 ns    5404008
  BM_XRayProfilingDeepCallStack/1/real_time/threads:4           76 ns        305 ns    8427916
  BM_XRayProfilingDeepCallStack/1/real_time/threads:8           45 ns        357 ns   11881512
  BM_XRayProfilingDeepCallStack/1/real_time/threads:16         119 ns       1907 ns    7859872
  BM_XRayProfilingDeepCallStack/1/real_time/threads:32        2446 ns      78267 ns     304480
  BM_XRayProfilingDeepCallStack/2/real_time/threads:1          288 ns        288 ns    2294615
  BM_XRayProfilingDeepCallStack/2/real_time/threads:2          196 ns        391 ns    3854258
  BM_XRayProfilingDeepCallStack/2/real_time/threads:4          101 ns        405 ns    6306804
  BM_XRayProfilingDeepCallStack/2/real_time/threads:8           77 ns        616 ns    9896592
  BM_XRayProfilingDeepCallStack/2/real_time/threads:16          53 ns        855 ns   13341120
  BM_XRayProfilingDeepCallStack/2/real_time/threads:32        3000 ns      95820 ns     241472
  BM_XRayProfilingDeepCallStack/4/real_time/threads:1          479 ns        479 ns    1369221
  BM_XRayProfilingDeepCallStack/4/real_time/threads:2          283 ns        566 ns    2439036
  BM_XRayProfilingDeepCallStack/4/real_time/threads:4          158 ns        630 ns    4261736
  BM_XRayProfilingDeepCallStack/4/real_time/threads:8          125 ns        998 ns    7218600
  BM_XRayProfilingDeepCallStack/4/real_time/threads:16          74 ns       1190 ns    6930480
  BM_XRayProfilingDeepCallStack/4/real_time/threads:32        3271 ns      94646 ns     165088
  BM_XRayProfilingDeepCallStack/8/real_time/threads:1          847 ns        847 ns     801600
  BM_XRayProfilingDeepCallStack/8/real_time/threads:2          500 ns       1000 ns    1309944
  BM_XRayProfilingDeepCallStack/8/real_time/threads:4          261 ns       1043 ns    2147344
  BM_XRayProfilingDeepCallStack/8/real_time/threads:8          147 ns       1174 ns    4556248
  BM_XRayProfilingDeepCallStack/8/real_time/threads:16         102 ns       1634 ns    5224688
  BM_XRayProfilingDeepCallStack/8/real_time/threads:32        5324 ns     155831 ns     101920
  BM_XRayProfilingDeepCallStack/16/real_time/threads:1        1588 ns       1588 ns     434210
  BM_XRayProfilingDeepCallStack/16/real_time/threads:2         894 ns       1787 ns     715548
  BM_XRayProfilingDeepCallStack/16/real_time/threads:4         490 ns       1959 ns    1230536
  BM_XRayProfilingDeepCallStack/16/real_time/threads:8         277 ns       2213 ns    2229616
  BM_XRayProfilingDeepCallStack/16/real_time/threads:16        585 ns       9366 ns     923344
  BM_XRayProfilingDeepCallStack/16/real_time/threads:32      16064 ns     513978 ns      59392
  BM_XRayProfilingDeepCallStack/32/real_time/threads:1        3247 ns       3247 ns     215805
  BM_XRayProfilingDeepCallStack/32/real_time/threads:2        1692 ns       3384 ns     409380
  BM_XRayProfilingDeepCallStack/32/real_time/threads:4         925 ns       3700 ns     677716
  BM_XRayProfilingDeepCallStack/32/real_time/threads:8         512 ns       4095 ns    1240360
  BM_XRayProfilingDeepCallStack/32/real_time/threads:16        513 ns       8206 ns    1681168
  BM_XRayProfilingDeepCallStack/32/real_time/threads:32      59676 ns    1909406 ns      15136
  BM_XRayProfilingDeepCallStack/64/real_time/threads:1        6374 ns       6373 ns     105030
  BM_XRayProfilingDeepCallStack/64/real_time/threads:2        3341 ns       6681 ns     174912
  BM_XRayProfilingDeepCallStack/64/real_time/threads:4        1851 ns       7404 ns     278224
  BM_XRayProfilingDeepCallStack/64/real_time/threads:8        1003 ns       8026 ns     590256
  BM_XRayProfilingDeepCallStack/64/real_time/threads:16        796 ns      12726 ns     701920
  BM_XRayProfilingDeepCallStack/64/real_time/threads:32      38176 ns    1152947 ns      16608
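
A note on reading the table above: in the `/1/` rows, the CPU column looks like roughly Time multiplied by the thread count, which suggests CPU is accumulated across threads while Time is wall-clock time per iteration. The sketch below only checks that observation against the numbers copied from the table. The helper name `relative_gap` is made up for illustration, and nothing here is taken from the benchmark's source.

```python
# (threads, Time ns, CPU ns) for BM_XRayProfilingDeepCallStack/1 at head,
# copied from the table above.
rows = [
    (1, 196, 196),
    (2, 148, 295),
    (4, 76, 305),
    (8, 45, 357),
    (16, 119, 1907),
    (32, 2446, 78267),
]


def relative_gap(threads, time_ns, cpu_ns):
    """Relative difference between the CPU column and threads * Time."""
    expected = threads * time_ns
    return abs(cpu_ns - expected) / expected


# Map each thread count to how far CPU is from threads * Time.
gaps = {threads: relative_gap(threads, time_ns, cpu_ns)
        for threads, time_ns, cpu_ns in rows}

if __name__ == "__main__":
    for threads, gap in gaps.items():
        print(f"threads={threads:>2}  gap={gap:.2%}")
```

The relationship is looser in some of the other threads:32 rows at head, so treat it as a reading aid and not as a property of the tool.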

With the changes in https://reviews.llvm.org/D49217, we get:

  Run on (48 X 3500 MHz CPU s)
  2018-07-12 17:01:35
  ---------------------------------------------------------------------------------------------
  Benchmark                                                      Time           CPU Iterations
  ---------------------------------------------------------------------------------------------
  BM_XRayProfilingDeepCallStack/1/real_time/threads:1          202 ns        202 ns    3477313
  BM_XRayProfilingDeepCallStack/1/real_time/threads:2          179 ns        357 ns    4581178
  BM_XRayProfilingDeepCallStack/1/real_time/threads:4          144 ns        577 ns    5875828
  BM_XRayProfilingDeepCallStack/1/real_time/threads:8          125 ns       1003 ns    8311456
  BM_XRayProfilingDeepCallStack/1/real_time/threads:16         174 ns       2792 ns    9522368
  BM_XRayProfilingDeepCallStack/1/real_time/threads:32         146 ns       4687 ns    6358400
  BM_XRayProfilingDeepCallStack/2/real_time/threads:1          295 ns        295 ns    2359216
  BM_XRayProfilingDeepCallStack/2/real_time/threads:2          239 ns        478 ns    2888910
  BM_XRayProfilingDeepCallStack/2/real_time/threads:4          160 ns        638 ns    5410336
  BM_XRayProfilingDeepCallStack/2/real_time/threads:8          125 ns        999 ns    7721696
  BM_XRayProfilingDeepCallStack/2/real_time/threads:16         103 ns       1647 ns    5126384
  BM_XRayProfilingDeepCallStack/2/real_time/threads:32         131 ns       4153 ns    5427136
  BM_XRayProfilingDeepCallStack/4/real_time/threads:1          490 ns        490 ns    1326060
  BM_XRayProfilingDeepCallStack/4/real_time/threads:2          367 ns        734 ns    2276550
  BM_XRayProfilingDeepCallStack/4/real_time/threads:4          249 ns        994 ns    3981604
  BM_XRayProfilingDeepCallStack/4/real_time/threads:8          174 ns       1394 ns    5467368
  BM_XRayProfilingDeepCallStack/4/real_time/threads:16         129 ns       2057 ns    4399568
  BM_XRayProfilingDeepCallStack/4/real_time/threads:32         148 ns       4718 ns    4695104
  BM_XRayProfilingDeepCallStack/8/real_time/threads:1          873 ns        873 ns     788744
  BM_XRayProfilingDeepCallStack/8/real_time/threads:2          535 ns       1071 ns    1177912
  BM_XRayProfilingDeepCallStack/8/real_time/threads:4          339 ns       1354 ns    2235540
  BM_XRayProfilingDeepCallStack/8/real_time/threads:8          256 ns       2051 ns    3818424
  BM_XRayProfilingDeepCallStack/8/real_time/threads:16         208 ns       3323 ns    4687040
  BM_XRayProfilingDeepCallStack/8/real_time/threads:32         211 ns       6751 ns    3579136
  BM_XRayProfilingDeepCallStack/16/real_time/threads:1        1652 ns       1652 ns     414737
  BM_XRayProfilingDeepCallStack/16/real_time/threads:2         975 ns       1950 ns     785698
  BM_XRayProfilingDeepCallStack/16/real_time/threads:4         601 ns       2402 ns    1400136
  BM_XRayProfilingDeepCallStack/16/real_time/threads:8         365 ns       2918 ns    2308440
  BM_XRayProfilingDeepCallStack/16/real_time/threads:16        313 ns       5003 ns    1600000
  BM_XRayProfilingDeepCallStack/16/real_time/threads:32        256 ns       8177 ns    3033056
  BM_XRayProfilingDeepCallStack/32/real_time/threads:1        3419 ns       3418 ns     209959
  BM_XRayProfilingDeepCallStack/32/real_time/threads:2        1858 ns       3716 ns     405304
  BM_XRayProfilingDeepCallStack/32/real_time/threads:4        1051 ns       4204 ns     690604
  BM_XRayProfilingDeepCallStack/32/real_time/threads:8         611 ns       4890 ns    1233168
  BM_XRayProfilingDeepCallStack/32/real_time/threads:16        425 ns       6798 ns    1634992
  BM_XRayProfilingDeepCallStack/32/real_time/threads:32        336 ns      10737 ns    1958368
  BM_XRayProfilingDeepCallStack/64/real_time/threads:1        6438 ns       6438 ns     105337
  BM_XRayProfilingDeepCallStack/64/real_time/threads:2        3432 ns       6864 ns     197488
  BM_XRayProfilingDeepCallStack/64/real_time/threads:4        2477 ns       9906 ns     376460
  BM_XRayProfilingDeepCallStack/64/real_time/threads:8        1069 ns       8547 ns     578224
  BM_XRayProfilingDeepCallStack/64/real_time/threads:16        684 ns      10949 ns    1079040
  BM_XRayProfilingDeepCallStack/64/real_time/threads:32        482 ns      15417 ns    1298176

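One thing to notice here is the non-linear (super-linear?) growth in overhead as the thread count goes up in the run against head: at 32 threads the real time jumps into the thousands of nanoseconds (e.g. 2446 ns for BM_XRayProfilingDeepCallStack/1, against 119 ns at 16 threads), and the iteration counts drop accordingly. With https://reviews.llvm.org/D49217 applied, the same case runs in 146 ns at 32 threads. This is good evidence to support the changes in https://reviews.llvm.org/D49217, and allows us to be more confident in qualifying the costs/overheads of the profiling mode implementation.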
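
To put a number on the difference at 32 threads, here is a small sketch that divides the Time column at head by the Time column with D49217 for each benchmark argument (the `/N/` component of the name, presumably the call depth). The values are copied from the two tables above. The function name `slowdown_at_head` is made up for illustration.

```python
# Time column (ns) at threads:32, copied from the two tables above,
# keyed by the benchmark argument (the /N/ in the benchmark name).
head_ns = {1: 2446, 2: 3000, 4: 3271, 8: 5324, 16: 16064, 32: 59676, 64: 38176}
patched_ns = {1: 146, 2: 131, 4: 148, 8: 211, 16: 256, 32: 336, 64: 482}


def slowdown_at_head(head, patched):
    """Return {arg: head_time / patched_time} for args present in both runs."""
    return {arg: head[arg] / patched[arg] for arg in sorted(head) if arg in patched}


ratios = slowdown_at_head(head_ns, patched_ns)

if __name__ == "__main__":
    for arg, ratio in ratios.items():
        print(f"arg={arg:>2}  head={head_ns[arg]:>6} ns  "
              f"D49217={patched_ns[arg]:>4} ns  ratio={ratio:.1f}x")
```

These are single runs on one machine, so the ratios are indicative only. They say nothing about the lower thread counts, where head is often the faster of the two in the tables.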


Repository:
  rT test-suite

https://reviews.llvm.org/D48879




