[PATCH] D113460: [llvm-dwarfdump] Add support for filtering by DIE tags

David Blaikie via Phabricator via llvm-commits <llvm-commits at lists.llvm.org>
Wed Nov 10 18:53:18 PST 2021


dblaikie added a comment.

In D113460#3120271 <https://reviews.llvm.org/D113460#3120271>, @woodruffw wrote:

> In D113460#3120137 <https://reviews.llvm.org/D113460#3120137>, @dblaikie wrote:
>
>> I was thinking of something simpler/less invasive (no need to add a new data structure to collect all the DIEs, etc).
>>
>> ie: What if "filterByName" functions were renamed to "filter" and each took an extra argument, `Tags`, & did all the same stuff but also checked Tags?
>
> I can do this, but I want to offer a bit of justification for the new structure first :-)
>
> Maintainability-wise, I think the new approach is slightly easier to extend with additional filters (e.g. a `--type` filter for matching `DW_TAG_type`): they can either be added to `dieFilter` or, with my previous changeset, composed inside a lambda. The control flow is also a little easier to follow and makes use of `make_filter_range`, vs. calling through an overloaded `filterByName` with duplicate callsites for `Die.dump`.
>
> Performance-wise, pre-collecting all of the DIEs is no slower than the previous approach. I've pasted some local benchmarks below; the test file is a debug build of `llvm-dwarfdump`, containing approximately 150MB of debug info. My changes are *slightly* faster, but that's probably noise, or because the pre-collection simplifies the instructions needed to fetch each DIE.
>
> Before this changeset:
>
>   Performance counter stats for './bin/llvm-dwarfdump --name Vector --regex /home/william/tmp/llvm-dwarfdump' (100 runs):
>   
>            6,587.02 msec task-clock                #    0.999 CPUs utilized            ( +-  0.06% )
>                 936      context-switches          #    0.142 K/sec                    ( +-  3.60% )
>                  61      cpu-migrations            #    0.009 K/sec                    ( +-  2.65% )
>              89,214      page-faults               #    0.014 M/sec                    ( +-  0.00% )
>      26,887,760,726      cycles                    #    4.082 GHz                      ( +-  0.04% )  (62.46%)
>         475,358,555      stalled-cycles-frontend   #    1.77% frontend cycles idle     ( +-  0.59% )  (62.48%)
>       6,463,934,156      stalled-cycles-backend    #   24.04% backend cycles idle      ( +-  0.16% )  (62.50%)
>      79,870,809,617      instructions              #    2.97  insn per cycle
>                                                    #    0.08  stalled cycles per insn  ( +-  0.01% )  (62.52%)
>      16,098,031,769      branches                  # 2443.901 M/sec                    ( +-  0.01% )  (62.54%)
>          53,414,087      branch-misses             #    0.33% of all branches          ( +-  0.24% )  (62.54%)
>      26,475,961,150      L1-dcache-loads           # 4019.413 M/sec                    ( +-  0.01% )  (62.50%)
>          99,919,546      L1-dcache-load-misses     #    0.38% of all L1-dcache accesses  ( +-  3.81% )  (62.47%)
>     <not supported>      LLC-loads
>     <not supported>      LLC-load-misses
>   
>             6.59507 +- 0.00363 seconds time elapsed  ( +-  0.05% )
>
> and with DIE pre-collection:
>
>   Performance counter stats for './bin/llvm-dwarfdump --name Vector --regex /home/william/tmp/llvm-dwarfdump' (100 runs):
>   
>            6,585.08 msec task-clock                #    0.999 CPUs utilized            ( +-  0.06% )
>                 842      context-switches          #    0.128 K/sec                    ( +-  2.58% )
>                  69      cpu-migrations            #    0.010 K/sec                    ( +-  2.59% )
>             128,934      page-faults               #    0.020 M/sec                    ( +-  0.00% )
>      26,807,844,451      cycles                    #    4.071 GHz                      ( +-  0.05% )  (62.46%)
>         472,800,149      stalled-cycles-frontend   #    1.76% frontend cycles idle     ( +-  0.70% )  (62.49%)
>       4,915,697,574      stalled-cycles-backend    #   18.34% backend cycles idle      ( +-  0.18% )  (62.50%)
>      79,482,489,382      instructions              #    2.96  insn per cycle
>                                                    #    0.06  stalled cycles per insn  ( +-  0.01% )  (62.52%)
>      16,069,809,100      branches                  # 2440.337 M/sec                    ( +-  0.01% )  (62.54%)
>          50,892,766      branch-misses             #    0.32% of all branches          ( +-  0.05% )  (62.53%)
>      26,035,545,689      L1-dcache-loads           # 3953.718 M/sec                    ( +-  0.01% )  (62.49%)
>         110,040,468      L1-dcache-load-misses     #    0.42% of all L1-dcache accesses  ( +-  3.55% )  (62.46%)
>     <not supported>      LLC-loads
>     <not supported>      LLC-load-misses
>   
>             6.59287 +- 0.00390 seconds time elapsed  ( +-  0.06% )
>
> Overall, I think the absence of a performance regression and easier extensibility make the more invasive changes worth it. But I'm happy to roll back to a version that uses the nested `filter` functions instead, if you think it's not worth the churn.
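
If I understand right, the composition being described is roughly the sketch below (helper names like `matchesName` are made up here, just to show the shape):

  // Illustrative sketch only - `matchesName` is a stand-in for whatever name
  // check ends up being used; the point is composing predicates over DWARFDie
  // and handing the combined predicate to make_filter_range.
  auto NameFilter = [&](DWARFDie Die) { return matchesName(Names, Die); };
  auto TagFilter = [&](DWARFDie Die) {
    return Tags.empty() || Tags.count(Die.getTag());
  };
  auto Filter = [&](DWARFDie Die) { return NameFilter(Die) && TagFilter(Die); };
  for (DWARFDie Die : make_filter_range(Dies, Filter))
    Die.dump(OS, 0, getDumpOpts(DICtx));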

Generally it's best to separate refactoring from semantic changes - do the refactoring either before or after the semantic change. I'm not sure this code is worth a separate change to improve its extensibility.

But if I were going to go there, I think I'd end up with something more like this:

  dumpFiltered(dieFilter, DICtx.normal_units(), OS);
  dumpFiltered(dieFilter, DICtx.dwo_units(), OS);
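
(Roughly, such a helper could look like the sketch below - the signature, the DIE walk via getNumDIEs()/getDIEAtIndex(), and the way the dump options get threaded through are all illustrative, and in practice the filter argument would probably be a lambda capturing Names/Tags:)

  // Sketch only: dump every DIE in the given units that the predicate accepts.
  template <typename FilterFn>
  static void dumpFiltered(FilterFn Filter,
                           DWARFContext::unit_iterator_range Units,
                           DIDumpOptions DumpOpts, raw_ostream &OS) {
    for (const auto &U : Units)
      for (unsigned I = 0, N = U->getNumDIEs(); I != N; ++I) {
        DWARFDie Die = U->getDIEAtIndex(I);
        if (Filter(Die))
          Die.dump(OS, 0, DumpOpts);
      }
  }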



================
Comment at: llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:459-465
+  if (const char *Name = Die.getName(DINameKind::ShortName))
+    if (filterByName(Names, Name))
+      return true;
+  if (const char *Name = Die.getName(DINameKind::LinkageName))
+    if (filterByName(Names, Name))
+      return true;
+  return Tags.count(Die.getTag());
----------------
Looks like this has different behavior from the original - this'll print if the name or the tag is satisfied, whereas the original printed only if both the name and the tag were satisfied, I think? From the patch description, it sounds like the original was the desired behavior, and this new behavior is unintended.

(speaking of: this needs test coverage)
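
(Not suggesting exact code, but something and-shaped along these lines is what I'd expect - this sketch assumes an empty `Tags` set means "no tag filter" and that `filterByName(Names, Name)` is the same bool-returning helper already called above:)

  // Sketch only: require both filters to pass; a name match alone shouldn't
  // be enough once --tag is given, and an empty tag set shouldn't veto a
  // name-only query.
  static bool dieFilter(const StringSet<> &Names,
                        const SmallSet<unsigned, 8> &Tags, DWARFDie Die) {
    bool NameMatches = Names.empty();
    if (const char *Name = Die.getName(DINameKind::ShortName))
      NameMatches |= filterByName(Names, Name);
    if (const char *Name = Die.getName(DINameKind::LinkageName))
      NameMatches |= filterByName(Names, Name);
    bool TagMatches = Tags.empty() || Tags.count(Die.getTag());
    return NameMatches && TagMatches;
  }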


================
Comment at: llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:487
+    SmallSet<unsigned, 8> Tags;
+    for (auto tag : Tag)
+      Tags.insert(dwarf::getTag(StringRef("DW_TAG_" + StringRef(tag).lower())));
----------------
Guessing this should be const ref, rather than by value - I think the element type is std::string, so taking it by value will copy the strings unnecessarily? (might be worth actually naming std::string here to avoid confusion, even)
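
i.e., assuming `Tag` is a `cl::list<std::string>`, something like:

  SmallSet<unsigned, 8> Tags;
  for (const std::string &T : Tag) // const ref: avoids copying each string
    Tags.insert(dwarf::getTag(StringRef("DW_TAG_" + StringRef(T).lower())));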


================
Comment at: llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp:495
+    auto Filter = [&](DWARFDie Die) { return dieFilter(Names, Tags, Die); };
+    for (const auto Die : make_filter_range(Dies, Filter))
+      Die.dump(OS, 0, getDumpOpts(DICtx));
----------------
Either `const auto&` or non-const `auto` here? (or explicitly `DWARFDie`) - we don't usually use top-level const on local values in the LLVM codebase.
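
i.e. either of:

  for (DWARFDie Die : make_filter_range(Dies, Filter))

or

  for (const auto &Die : make_filter_range(Dies, Filter))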


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113460/new/

https://reviews.llvm.org/D113460


