[PATCH] D63713: WIP: DataExtractor error handling

Tue Jul 2 06:42:07 PDT 2019

labath added a comment.

Ok, I got some numbers now. I went for a micro-benchmark-like setup to show some kind of a worst case scenario. The test setup is as follows:

I took the largest single .o file I had around (Registry.cpp.o in the clangDynamicASTMatchers library). I then linked it to remove any relocations (otherwise, most of the time is spent applying those). Then I modified llvm-dwarfdump to parse the debug_loc section without dumping anything (to avoid measuring the time spent in printfs). Both llvm-dwarfdump and the Registry.cpp.o I was dumping were built with -O3 -g, with asserts disabled (no FDO, LTO or other fancy stuff). This resulted in about 4.5 megabytes of debug_loc for parsing in Registry.cpp.o. Then I used the Linux perf command to run llvm-dwarfdump -debug-loc 1000 times and dump the stats.

The baseline stats are:

    27.285986      task-clock:u (msec)       #    0.986 CPUs utilized            ( +-  0.11% )
            0      context-switches:u        #    0.000 K/sec                  
            0      cpu-migrations:u          #    0.000 K/sec                  
        2,813      page-faults:u             #    0.103 M/sec                    ( +-  0.24% )
   58,831,163      cycles:u                  #    2.156 GHz                      ( +-  0.07% )
      606,986      stalled-cycles-frontend:u #    1.03% frontend cycles idle     ( +-  3.76% )
    7,924,778      stalled-cycles-backend:u  #   13.47% backend cycles idle      ( +-  0.33% )
  146,588,727      instructions:u            #    2.49  insn per cycle         
                                             #    0.05  stalled cycles per insn  ( +-  0.00% )
   29,545,620      branches:u                # 1082.813 M/sec                    ( +-  0.00% )
      222,276      branch-misses:u           #    0.75% of all branches          ( +-  0.15% )

  0.027663381 seconds time elapsed                                          ( +-  0.11% )

The stats with this patch applied are:

    27.397390      task-clock:u (msec)       #    0.987 CPUs utilized            ( +-  0.10% )
            0      context-switches:u        #    0.000 K/sec                  
            0      cpu-migrations:u          #    0.000 K/sec                  
        2,833      page-faults:u             #    0.103 M/sec                    ( +-  0.24% )
   60,160,571      cycles:u                  #    2.196 GHz                      ( +-  0.07% )
      584,825      stalled-cycles-frontend:u #    0.97% frontend cycles idle     ( +-  3.37% )
   10,729,974      stalled-cycles-backend:u  #   17.84% backend cycles idle      ( +-  0.26% )
  156,141,836      instructions:u            #    2.60  insn per cycle         
                                             #    0.07  stalled cycles per insn  ( +-  0.00% )
   31,599,940      branches:u                # 1153.392 M/sec                    ( +-  0.00% )
      221,247      branch-misses:u           #    0.70% of all branches          ( +-  0.06% )

  0.027771865 seconds time elapsed                                          ( +-  0.10% )

The stats for a version of this patch which additionally checks for the error flag before attempting the parse (as discussed in the inline comment) are:

    27.808349      task-clock:u (msec)       #    0.986 CPUs utilized            ( +-  0.10% )
            0      context-switches:u        #    0.000 K/sec                  
            0      cpu-migrations:u          #    0.000 K/sec                  
        2,839      page-faults:u             #    0.102 M/sec                    ( +-  0.24% )
   62,887,388      cycles:u                  #    2.261 GHz                      ( +-  0.06% )
      575,264      stalled-cycles-frontend:u #    0.91% frontend cycles idle     ( +-  3.18% )
   14,757,888      stalled-cycles-backend:u  #   23.47% backend cycles idle      ( +-  0.23% )
  167,562,307      instructions:u            #    2.66  insn per cycle         
                                             #    0.09  stalled cycles per insn  ( +-  0.00% )
   33,414,152      branches:u                # 1201.587 M/sec                    ( +-  0.00% )
      221,454      branch-misses:u           #    0.66% of all branches          ( +-  0.12% )

  0.028201319 seconds time elapsed                                          ( +-  0.10% )

As can be seen, this patch increases the parsing time (task-clock) by about 0.4%, from 27.29 to 27.40 msec. With a run-to-run variance of about 0.1%, this is enough to be statistically significant in a benchmark like this, but probably not world-shattering (some slowdown is unavoidable with a change like this).

If we additionally enable the early returns, we get a further 1.5% slowdown (1.9% above baseline in total). Still not bad for a "micro-benchmark", but it does make one wonder whether it is really worth it. My feeling would be that it isn't...
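To make the two measured variants concrete, here is a rough sketch of the difference. It is not the code from the patch: SketchExtractor, getU32 and getU32EarlyReturn are hypothetical names, and the real DataExtractor interface differs. The only point is where the extra branch sits.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for an extractor with a sticky error flag.
// This is NOT llvm::DataExtractor; it only illustrates the two variants.
class SketchExtractor {
  const uint8_t *Data;
  size_t Size;
  bool Failed = false;

public:
  SketchExtractor(const uint8_t *Data, size_t Size) : Data(Data), Size(Size) {}

  bool failed() const { return Failed; }

  // Variant A: bounds check only. On a short read the error flag is set,
  // the offset is left untouched and 0 is returned. Bytes are read in
  // host byte order to keep the sketch short.
  uint32_t getU32(size_t *Offset) {
    if (*Offset > Size || Size - *Offset < sizeof(uint32_t)) {
      Failed = true;
      return 0;
    }
    uint32_t Value;
    std::memcpy(&Value, Data + *Offset, sizeof(Value));
    *Offset += sizeof(Value);
    return Value;
  }

  // Variant B: additionally test the error flag before attempting the
  // parse, so nothing is read once an earlier read has failed. This is
  // one more branch on every call, including the common success path.
  uint32_t getU32EarlyReturn(size_t *Offset) {
    if (Failed)
      return 0;
    return getU32(Offset);
  }
};
```

In this sketch the extra check only changes behaviour after a failure has already happened, while its cost is paid on every successful read, which is the trade-off the numbers above are trying to quantify.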

Repository:
  rL LLVM

https://reviews.llvm.org/D63713