[PATCH] D36627: dwarfdump: Add an option to collect debug info quality metrics

Wed Aug 16 12:04:02 PDT 2017

aprantl marked 4 inline comments as done.
aprantl added inline comments.

================
Comment at: lib/DebugInfo/DWARF/DWARFDebugLoc.cpp:77-78
+DWARFDebugLoc::getLocationList(unsigned Offset) const {
+  DWARFDebugLoc::LocationList Searched;
+  Searched.Offset = Offset;
+  const LocationList *List =
----------------
dblaikie wrote:
> I believe C++11 lower_bound supports heterogenous lookup, so you don't need to create a LocationList to query - you can write an asymmetric comparison (unsigned < LocationList) & use that.
> 
> I believe we use that to find a CU at an offset - the function's in DWARFContext, I think (just search for lower_bound in lib/DebugInfo/DWARF and you should find this example easily enough, probably)
I added an operator< which is not quite what you suggested, but what all the other DWARF lookups were doing.

================
Comment at: tools/llvm-dwarfdump/Statistics.cpp:64
+  auto &Stats = Statistics[Prefix];
+  Stats.VarsInFunction.insert({Name, 0});
+  if (BytesInScope) {
----------------
dblaikie wrote:
> Name wouldn't be unambiguous - due to shadowing?
Yes, this doesn't handle 
```
void f() {
  { int i = 1; }
  { int i = 2; }
}
```
correctly right now, and I'm also not quite sure how to fix this right now. We could give all lexical scopes unique names based on their nesting levels, but if there are two inlined instances of the same function:

```
void f(bool b) {
  if (b) {
    int i = 1;
  } else {
    int i = 2;
  }
}
```

one inlined as `f(true)`, and one as `f(false)` with only one lexical block remaining in each inlined instance, I don't know how to handle that.

================
Comment at: tools/llvm-dwarfdump/Statistics.cpp:67
+    Stats.TotalVarWithLoc += (unsigned)HasLoc;
+    // Turns out we have a lot of ranges that extend past the lexical scope.
+    Stats.BytesCovered += std::min<uint64_t>(BytesInScope, BytesCovered);
----------------
dblaikie wrote:
> Should we gather statistics on that? (honestly there might be cases where it's more efficient to describe a variable as being in one location with one address range - even though the scope has a hole in it, for example (eg: scope says [a, b)+[c,d)+[e,f) but the variable location says '[a,b) it's in register foo, [c,f) it's in register bar' - I /think/ that's probably OK to do, the debugger/consumer shouldn't consult the variable if it's looking at an instruction in [d,e), right?)
Yeah, it's not something I would collect always, but as a one-off it would be interesting. I recently fixed a bug related to this in r305599 and there's probably more opportunities.

================
Comment at: tools/llvm-dwarfdump/Statistics.cpp:196
+  llvm::errs() << "PC Ranges covered: "
+               << (ScopeBytesCovered * 100) / ScopeBytesTotal << "%\n";
+  return;
----------------
dblaikie wrote:
> This might not be the best statistic.
> 
> Consider code like this
> 
>   void f() {
>     g();
>     h j = k();
>     return j;
>   }
> 
> optimized, 'j' should never have a location range that covers the entire scope of 'f'. There's no value for 'j' during 'g()'.
> 
> So maybe it'd be necessary to track which bytes of a scope are covered by a location list, after the earliest byte in the scope that is covered.
> 
> Separately you could track this statistic (which bytes of a scope are covered by the variable) - but not expect this statistic to approach 100%.
This case even appears in my test case. The point of this number was to look at the delta between two compilers, and not for the ratio to ever approach 100%.
I really like the idea of only starting at the first byte of coverage though, so changed the implementation to collect number instead.

https://reviews.llvm.org/D36627