[PATCH] D69027: [llvm-dwarfdump][Statistics] Fix calculation of OffsetToFirstDefinition

Tue Oct 29 16:30:46 PDT 2019

avl added a comment.

> I'm not sure this answers the question which was about your statement: "The end of the variable's address ranges should also be encountered."

> & I'm not sure those are two ideal concepts.

> In C++ & other scope-based languages, the scope of a variable is from the point of declaration to the end of the enclosing scope. In the DWARF the first point there is lost - so we make a heuristic guess at it.

> The region from definition till last use would also be a guess - the DWARF doesn't have information about the last use. Only about the last live location (which may be past the last use).

right.

> We know that the variable should be live until the end of the enclosing scope (as per the language) but we don't know where it should start - so there are (if I'm understanding this discussion correctly) two statistics - one for each end of the heuristic range: one measuring from the start of the enclosing scope (the earliest the variable could be declared) and the other measuring from the first reported location (that's the statistic that's being discussed/fixed here).

At present, the first one (measuring from the earliest point where the variable could be declared) is not calculated; the original code calculated it only partially. My main concern is that we need it.

> I don't think pruning the end of the latter list improves the quality of the statistic - since scope based languages, as @krisb said, have the variable accessible (at the language level) to the end of the enclosing scope, so we know that coverage bytes missing in that tail are missed opportunities for greater variable coverage.

There are three views of coverage discussed here:

1. The range from the start of the scope (the earliest position where the variable could be declared) to the end of the scope.

  This shows how well debug info covers the source-level scope in which the variable is declared. I think we need this view, since it lets us track every change in coverage and is based on known scope boundaries.

2. The range from the variable's declaration point (approximated by the first reported location) to the end of the scope.

  This shows how well debug info covers the part of the enclosing scope in which the variable is visible at the source level. It could miss some debug info changes, since we use a heuristic to decide where the variable declaration is located.

3. The range of the variable's lifetime: from the first reported location to the last reported location.

  This is meant to show how well debug info covers the resulting code, i.e., whether all the generated instructions that involve the variable are covered by debug info. This view could be useful, especially for optimized code. Consider the following example:

  int foo(int p1) {
      int ll = p1 + 5;

      printf("\n ll %d", ll);

      // ... other code
  }

The variable "ll" is declared at the very start of the scope and is visible until the end of the scope. But when the code is compiled with optimization, the variable is allocated to a register, so it has a location over a smaller range only. Debug info covers all of that range:

  0x0000002a:   DW_TAG_subprogram
                  DW_AT_low_pc    (0x0000000000000000)
                  DW_AT_high_pc   (0x0000000000000027)

  0x00000056:     DW_TAG_variable
                    DW_AT_location        (0x00000036
                       [0x0000000000000006,  0x0000000000000012): DW_OP_reg4 RSI)

  0x00000074:     NULL

If we calculated coverage using #2, the coverage would be 36%: the location
covers 12 bytes, [0x6, 0x12), out of the 33 bytes from 0x6 to the end of the
scope at 0x27.
But all the instructions that relate to the variable "ll" are covered by debug info.
So, in this case, debug info covers 100% of the existing code.
Thus #3 would show the real coverage in this case. That's the idea.
The problem is that the start and end of the variable range reported in DWARF
might not match the real instructions. So #3 is also a heuristic.

In short, I think we need #1, and probably #2 and #3 as well.
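
To make the difference between the three views concrete, here is a rough sketch that computes all of them for the "ll" example. The types and the function name (Range, Coverage, computeCoverage) are made up for illustration and are not part of llvm-dwarfdump; the sketch also assumes the location ranges are sorted, non-overlapping, and inside the scope.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical half-open address range [Low, High); not an LLVM type.
struct Range {
  uint64_t Low;
  uint64_t High;
};

// Coverage percentages for the three views; names are illustrative only.
struct Coverage {
  double WholeScope;        // view #1: start of scope to end of scope
  double FromFirstLocation; // view #2: first reported location to end of scope
  double Lifetime;          // view #3: first to last reported location
};

// Assumption: Locs is non-empty, sorted, non-overlapping, and inside Scope.
Coverage computeCoverage(Range Scope, const std::vector<Range> &Locs) {
  uint64_t Covered = 0;
  for (const Range &R : Locs)
    Covered += R.High - R.Low;

  uint64_t First = Locs.front().Low;
  uint64_t Last = Locs.back().High;

  auto Pct = [Covered](uint64_t Total) { return 100.0 * Covered / Total; };
  return {Pct(Scope.High - Scope.Low), Pct(Scope.High - First),
          Pct(Last - First)};
}
```

With the scope [0x0, 0x27) and the single location [0x6, 0x12) from the dump above, this gives roughly 31% for #1 (12 of 39 bytes), 36% for #2 (12 of 33 bytes), and 100% for #3 (12 of 12 bytes).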

> That's why there's the other statistic, which is bytes of enclosing scope coverage rather than the version that trims the start. ("vars scope bytes covered" / "vars scope bytes total")

My understanding is that there is no statistic which does not trim the start:

  BytesInScope -= OffsetToFirstDefinition;
  // Turns out we have a lot of ranges that extend past the lexical scope.
  GlobalStats.ScopeBytesCovered += std::min(BytesInScope, BytesCovered);
  GlobalStats.ScopeBytesFromFirstDefinition += BytesInScope;
  GlobalStats.VarScopeBytesCovered += std::min(BytesInScope, BytesCovered);
  GlobalStats.VarScopeBytesFromFirstDefinition += BytesInScope;  

collectLocStats() in the original version receives the untrimmed scope, but that looks like an error.
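
What I have in mind for #1 is roughly the following: record the scope size before trimming, and keep the trimmed total next to it. This is only a sketch of the idea. ScopeStats, its fields, and addVariable are hypothetical names, not the actual GlobalStats members or the code in the patch.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical accumulator; these are not the GlobalStats fields.
struct ScopeStats {
  uint64_t ScopeBytesTotal = 0;        // untrimmed: whole enclosing scope (#1)
  uint64_t ScopeBytesFromFirstDef = 0; // trimmed at the first location (#2)
  uint64_t BytesCovered = 0;           // covered bytes, clamped to the scope
};

// Adds one variable to the totals. All sizes are in bytes.
void addVariable(ScopeStats &S, uint64_t BytesInScope,
                 uint64_t OffsetToFirstDefinition, uint64_t BytesCovered) {
  // Record the full scope size before any trimming happens.
  S.ScopeBytesTotal += BytesInScope;

  // Trim the start, guarding against an offset larger than the scope.
  uint64_t Trimmed =
      BytesInScope - std::min(BytesInScope, OffsetToFirstDefinition);
  S.ScopeBytesFromFirstDef += Trimmed;

  // Location ranges may extend past the lexical scope, so clamp.
  S.BytesCovered += std::min(Trimmed, BytesCovered);
}
```

For the "ll" example (39 bytes in scope, offset 6, 12 bytes covered) this would record 39 untrimmed bytes, 33 trimmed bytes, and 12 covered bytes, so both #1 and #2 can be derived from the same totals.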

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69027/new/

https://reviews.llvm.org/D69027