[PATCH] D113460: [llvm-dwarfdump] Add support for filtering by DIE tags

Tue Nov 9 18:09:34 PST 2021

woodruffw added a comment.

In D113460#3120137 <https://reviews.llvm.org/D113460#3120137>, @dblaikie wrote:

> I was thinking of something simpler/less invasive (no need to add a new data structure to collect all the DIEs, etc).
>
> ie: What if "filterByName" functions were renamed to "filter" and each took an extra argument, `Tags`, & did all the same stuff but also checked Tags?
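The following is a minimal sketch of the suggestion above, for comparison. It makes several assumptions:

- A hypothetical, simplified `Die` struct stands in for `DWARFDie`.
- Names match by exact string only, with no `--regex` or `--ignore-case` handling.
- An empty `Names` or `Tags` set means "no restriction" on that criterion.

The real `filterByName` signatures in llvm-dwarfdump are not reproduced here. The sketch only shows the shape of threading an extra `Tags` argument through the existing walk.

```cpp
#include <cstdint>
#include <ostream>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a DIE: only a name and a DW_TAG_* value.
struct Die {
  std::string Name;
  uint16_t Tag; // e.g. 0x13 is DW_TAG_structure_type, 0x2e is DW_TAG_subprogram
};

// One predicate that checks both criteria. Treating an empty set as
// "no restriction" is an assumption of this sketch.
bool filter(const std::set<std::string> &Names, const std::set<uint16_t> &Tags,
            const Die &D) {
  bool NameMatches = Names.empty() || Names.count(D.Name) != 0;
  bool TagMatches = Tags.empty() || Tags.count(D.Tag) != 0;
  return NameMatches && TagMatches;
}

// Same shape as a filterByName-style walk, with the extra Tags argument
// threaded through; printing the name stands in for Die.dump().
void filter(const std::set<std::string> &Names, const std::set<uint16_t> &Tags,
            const std::vector<Die> &Dies, std::ostream &OS) {
  for (const Die &D : Dies)
    if (filter(Names, Tags, D))
      OS << D.Name << '\n';
}
```

The appeal of this approach is that it changes no control flow. Each existing entry point gains one parameter and one extra check.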

I can do this, but I want to offer a bit of justification for the new structure first :-)

Maintainability-wise, I think the new approach is slightly easier to extend with additional filters (e.g. a `--type` filter for matching on `DW_AT_type`): they can either be added to `dieFilter` or, with my previous changeset, composed inside a lambda. The control flow is also a little easier to follow, since it uses `make_filter_range` rather than calling through an overloaded `filterByName` with duplicated call sites for `Die.dump`.
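The pre-collection design described above can be sketched roughly as follows, in plain C++17 rather than LLVM ADT. The `Die` struct and the `nameIn`, `tagIn`, `allOf`, and `selectDies` helpers are hypothetical illustrations, not the patch's code. `selectDies` uses an explicit loop where the patch uses `make_filter_range`.

```cpp
#include <cstdint>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DIE: only a name and a DW_TAG_* value.
struct Die {
  std::string Name;
  uint16_t Tag;
};

using DieFilter = std::function<bool(const Die &)>;

// Individual filters are small, independent predicates.
DieFilter nameIn(std::set<std::string> Names) {
  return [Names = std::move(Names)](const Die &D) {
    return Names.count(D.Name) != 0;
  };
}

DieFilter tagIn(std::set<uint16_t> Tags) {
  return [Tags = std::move(Tags)](const Die &D) {
    return Tags.count(D.Tag) != 0;
  };
}

// Compose filters with logical AND; a new option (say, a hypothetical
// --type filter) only needs one more predicate appended to the list.
DieFilter allOf(std::vector<DieFilter> Filters) {
  return [Filters = std::move(Filters)](const Die &D) {
    for (const DieFilter &F : Filters)
      if (!F(D))
        return false;
    return true;
  };
}

// Walk the pre-collected DIEs once. In LLVM this would roughly correspond
// to a range-for over llvm::make_filter_range(Dies, Keep), with a single
// Die.dump() call site in the loop body.
std::vector<const Die *> selectDies(const std::vector<Die> &Dies,
                                    const DieFilter &Keep) {
  std::vector<const Die *> Out;
  for (const Die &D : Dies)
    if (Keep(D))
      Out.push_back(&D);
  return Out;
}
```

With this shape, adding a filter means writing one more `DieFilter` and appending it to the `allOf` list. The traversal and dump code stay untouched.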

Performance-wise, pre-collecting all of the DIEs is no slower than the previous approach. I've pasted some local benchmarks below; the test file is a debug build of `llvm-dwarfdump`, containing approximately 150MB of debug info. My changes are *slightly* faster, but that's probably noise, or possibly because the pre-collection reduces the number of instructions needed to fetch each DIE.

Before this changeset:

  Performance counter stats for './bin/llvm-dwarfdump --name Vector --regex /home/william/tmp/llvm-dwarfdump' (100 runs):

           6,587.02 msec task-clock                #    0.999 CPUs utilized            ( +-  0.06% )
                936      context-switches          #    0.142 K/sec                    ( +-  3.60% )
                 61      cpu-migrations            #    0.009 K/sec                    ( +-  2.65% )
             89,214      page-faults               #    0.014 M/sec                    ( +-  0.00% )
     26,887,760,726      cycles                    #    4.082 GHz                      ( +-  0.04% )  (62.46%)
        475,358,555      stalled-cycles-frontend   #    1.77% frontend cycles idle     ( +-  0.59% )  (62.48%)
      6,463,934,156      stalled-cycles-backend    #   24.04% backend cycles idle      ( +-  0.16% )  (62.50%)
     79,870,809,617      instructions              #    2.97  insn per cycle
                                                   #    0.08  stalled cycles per insn  ( +-  0.01% )  (62.52%)
     16,098,031,769      branches                  # 2443.901 M/sec                    ( +-  0.01% )  (62.54%)
         53,414,087      branch-misses             #    0.33% of all branches          ( +-  0.24% )  (62.54%)
     26,475,961,150      L1-dcache-loads           # 4019.413 M/sec                    ( +-  0.01% )  (62.50%)
         99,919,546      L1-dcache-load-misses     #    0.38% of all L1-dcache accesses  ( +-  3.81% )  (62.47%)
    <not supported>      LLC-loads
    <not supported>      LLC-load-misses

            6.59507 +- 0.00363 seconds time elapsed  ( +-  0.05% )

and with DIE pre-collection:

  Performance counter stats for './bin/llvm-dwarfdump --name Vector --regex /home/william/tmp/llvm-dwarfdump' (100 runs):

           6,585.08 msec task-clock                #    0.999 CPUs utilized            ( +-  0.06% )
                842      context-switches          #    0.128 K/sec                    ( +-  2.58% )
                 69      cpu-migrations            #    0.010 K/sec                    ( +-  2.59% )
            128,934      page-faults               #    0.020 M/sec                    ( +-  0.00% )
     26,807,844,451      cycles                    #    4.071 GHz                      ( +-  0.05% )  (62.46%)
        472,800,149      stalled-cycles-frontend   #    1.76% frontend cycles idle     ( +-  0.70% )  (62.49%)
      4,915,697,574      stalled-cycles-backend    #   18.34% backend cycles idle      ( +-  0.18% )  (62.50%)
     79,482,489,382      instructions              #    2.96  insn per cycle
                                                   #    0.06  stalled cycles per insn  ( +-  0.01% )  (62.52%)
     16,069,809,100      branches                  # 2440.337 M/sec                    ( +-  0.01% )  (62.54%)
         50,892,766      branch-misses             #    0.32% of all branches          ( +-  0.05% )  (62.53%)
     26,035,545,689      L1-dcache-loads           # 3953.718 M/sec                    ( +-  0.01% )  (62.49%)
        110,040,468      L1-dcache-load-misses     #    0.42% of all L1-dcache accesses  ( +-  3.55% )  (62.46%)
    <not supported>      LLC-loads
    <not supported>      LLC-load-misses

            6.59287 +- 0.00390 seconds time elapsed  ( +-  0.06% )
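The following worked comparison uses only the figures perf reported in the two runs above:

```latex
\begin{aligned}
\Delta t &= 6.59507\,\text{s} - 6.59287\,\text{s} = 0.00220\,\text{s},
  \qquad \frac{0.00220}{6.59507} \approx 0.033\% \;(< \pm 0.05\%,\ \pm 0.06\%) \\
\Delta \text{insns} &= 79{,}870{,}809{,}617 - 79{,}482{,}489{,}382 = 388{,}320{,}235,
  \qquad \frac{388{,}320{,}235}{79{,}870{,}809{,}617} \approx 0.49\% \\
\frac{\text{page-faults}_{\text{after}}}{\text{page-faults}_{\text{before}}}
  &= \frac{128{,}934}{89{,}214} \approx 1.445
\end{aligned}
```

The wall-clock difference is smaller than either run's reported variation, consistent with "no slower". The instruction count drops by about half a percent. Page faults rise by roughly 45%, plausibly reflecting the memory held by the pre-collected DIEs; that is an inference from these counters, not a separate measurement.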

Overall, I think the absence of a performance regression and easier extensibility make the more invasive changes worth it. But I'm happy to roll back to a version that uses the nested `filter` functions instead, if you think it's not worth the churn.

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D113460