[PATCH] D36627: dwarfdump: Add an option to collect debug info quality metrics

Sat Aug 12 23:13:01 PDT 2017

dblaikie added inline comments.

================
Comment at: lib/DebugInfo/DWARF/DWARFDebugLoc.cpp:77-78
+DWARFDebugLoc::getLocationList(unsigned Offset) const {
+  DWARFDebugLoc::LocationList Searched;
+  Searched.Offset = Offset;
+  const LocationList *List =
----------------
I believe C++11 lower_bound supports heterogenous lookup, so you don't need to create a LocationList to query - you can write an asymmetric comparison (unsigned < LocationList) & use that.

I believe we use that to find a CU at an offset - the function's in DWARFContext, I think (just search for lower_bound in lib/DebugInfo/DWARF and you should find this example easily enough, probably)

================
Comment at: lib/DebugInfo/DWARF/DWARFDebugLoc.cpp:80
+  const LocationList *List =
+      std::lower_bound(Locations.begin(), Locations.end(), Searched,
+                       [&](DWARFDebugLoc::LocationList A,
----------------
Might be worth adding a range-based lower_bound to STLExtras.h? I think there are a few uses of std::lower_bound that could be switched to that.

================
Comment at: lib/DebugInfo/DWARF/DWARFDebugLoc.cpp:82
+                       [&](DWARFDebugLoc::LocationList A,
+                           DWARFDebugLoc::LocationList B) -> bool {
+                         return A.Offset < B.Offset;
----------------
No need for the "-> bool" here, if you want to skip it.

================
Comment at: tools/llvm-dwarfdump/Statistics.cpp:64
+  auto &Stats = Statistics[Prefix];
+  Stats.VarsInFunction.insert({Name, 0});
+  if (BytesInScope) {
----------------
Name wouldn't be unambiguous - due to shadowing?

================
Comment at: tools/llvm-dwarfdump/Statistics.cpp:67
+    Stats.TotalVarWithLoc += (unsigned)HasLoc;
+    // Turns out we have a lot of ranges that extend past the lexical scope.
+    Stats.BytesCovered += std::min<uint64_t>(BytesInScope, BytesCovered);
----------------
Should we gather statistics on that? (honestly there might be cases where it's more efficient to describe a variable as being in one location with one address range - even though the scope has a hole in it, for example (eg: scope says [a, b)+[c,d)+[e,f) but the variable location says '[a,b) it's in register foo, [c,f) it's in register bar' - I /think/ that's probably OK to do, the debugger/consumer shouldn't consult the variable if it's looking at an instruction in [d,e), right?)

================
Comment at: tools/llvm-dwarfdump/Statistics.cpp:178
+
+  // Print summary: Machine-readable to stdout, human-readable to stderr.
+  llvm::outs().SetBufferSize(1024);
----------------
Worth having two formats?

It seems especially strange/uncommon to have both outputs generated in a single run and need to filter them out with different pipes - is there any precedent for that in other LLVM tools?

I'd say if, as the comment above suggests, the stats are only really interesting in comparison with stats from another run, then maybe only the machine-readable form is relevant?

================
Comment at: tools/llvm-dwarfdump/Statistics.cpp:193-194
+  llvm::outs() << "}\n";
+  llvm::errs() << "Total Availability: " << (VarWithLoc * 100) / VarTotal
+               << "%\n";
+  llvm::errs() << "PC Ranges covered: "
----------------
that'll get rounded down, if I recall correctly - is that desirable (eg: 49.999% will render as 49% not 50%?)

================
Comment at: tools/llvm-dwarfdump/Statistics.cpp:196
+  llvm::errs() << "PC Ranges covered: "
+               << (ScopeBytesCovered * 100) / ScopeBytesTotal << "%\n";
+  return;
----------------
This might not be the best statistic.

Consider code like this

  void f() {
    g();
    h j = k();
    return j;
  }

optimized, 'j' should never have a location range that covers the entire scope of 'f'. There's no value for 'j' during 'g()'.

So maybe it'd be necessary to track which bytes of a scope are covered by a location list, after the earliest byte in the scope that is covered.

Separately you could track this statistic (which bytes of a scope are covered by the variable) - but not expect this statistic to approach 100%.

https://reviews.llvm.org/D36627