[PATCH] D63104: Add GSYM utility files along with unit tests.

Mon Jun 17 10:56:57 PDT 2019

clayborg marked 7 inline comments as done.
clayborg added a comment.

Marking inline comments as done and mentioning that I did add tests for FunctionInfo operator < as well for cases where things have inline info and line entries

================
Comment at: include/llvm/DebugInfo/GSYM/FunctionInfo.h:87
+/// when creating the final GSYM file. 
+inline bool operator<(const FunctionInfo &LHS, const FunctionInfo &RHS) {
+  // First sort by address range
----------------
jakehehrlich wrote:
> nit: Most of the branches here are not tested by my eyes.
I'll add some tests.

================
Comment at: include/llvm/DebugInfo/GSYM/FunctionInfo.h:97-98
+  // If the number of lines is the same, then compare line table entries
+  if (LHS.Lines.size() == RHS.Lines.size())
+    return LHS.Lines < RHS.Lines;
+  // Then sort by number of line table entries (more is better)
----------------
The main goal is to be able to sort FuncrtionInfo objects after we fill them out from a variety of places (symtab, DWARF, breakpad) and quickly pick out the one with the best info. We saw many issues with debug info when writing the parser where address ranges would overlap or differ for the same range. So we tried to pick the best info. We can modify this sorting as needed.

================
Comment at: include/llvm/DebugInfo/GSYM/InlineInfo.h:41
+  InlineInfo() : Name(0), CallFile(0), CallLine(0) {}
+  void clear() {
+    Name = 0;
----------------
aprantl wrote:
> It would be so much easier to reason about correctness if these structs were immutable. Doe we really need a clear() method?
These structs are not encoded in memory like this, they are very efficiently encoded and need to be decoded from a stream of bytes. These items are parsed from a stream, and we either parse everything (if you have a symbolication process that will stay around), or just what we need for a given address (for use in a tool like atos). Not sure if that says anything about wether these structs should be immutable though. Usually during lookups you might call clear on a FunctionInfo's Inline argument in case you lookup multiple addresses into the same FunctionInfo. That being said, nothing is stopping us from changing the APIs that we use for parsing to always return a new instance, or a llvm::Optional.

================
Comment at: include/llvm/DebugInfo/GSYM/Range.h:54
+}
+inline bool operator<(const AddressRange &LHS, const AddressRange &RHS) {
+  if (LHS.Start == RHS.Start)
----------------
jakehehrlich wrote:
> If looking up the greatest lower bound on an address in an array sorted by this predicate, the largest range possible would be chosen. It seems to me that we would actually prefer the *smallest* range be chosen because its information would be more specific. That would suggest that instead of  `LHS.End < RHS.End` we should actually use `LHS.End > RHS.End`
I would expect an address with the same start address to still compare the end address:
```
[0x100-0x101) < [0x100-0x102)
```
with lower bound you always have to know to back up by one in case the the addresses are equal.

================
Comment at: include/llvm/DebugInfo/GSYM/Range.h:68
+
+typedef std::vector<AddressRange> AddressRanges;
+bool contains(const AddressRanges &Ranges, uint64_t Addr);
----------------
jakehehrlich wrote:
> prefer using
llvm/include/llvm/DebugInfo/GSYM/Range.h:71:12: error: using declaration cannot refer to a template specialization
using std::vector<AddressRange> AddressRanges;
           ^     ~~~~~~~~~~~~~~

================
Comment at: include/llvm/DebugInfo/GSYM/StringTable.h:49
+  }
+  llvm::Optional<uint32_t> find(StringRef Str) const {
+    // Return the first byte of the GSYM string table which contains the 
----------------
jakehehrlich wrote:
> What is this needed for? I'm not sure I'm happy with this sort of linear search being needed.
I can take this out. This class is used for the mmap'ed string table, not for creating the string table, so I will remove this.

================
Comment at: lib/DebugInfo/GSYM/Range.cpp:22
+void llvm::gsym::insert(AddressRanges &Ranges, const AddressRange &Range) {
+  Ranges.insert(std::upper_bound(Ranges.begin(), Ranges.end(), Range), Range);
+}
----------------
jakehehrlich wrote:
> Do we expect the number of ranges to be small?
Yes, usually not usually a ton of these.

================
Comment at: lib/DebugInfo/GSYM/Range.cpp:31
+  if (Addr >= Ranges.back().End)
+    return false;
+  auto begin = Ranges.begin();
----------------
Will remove. Need empty check though.

================
Comment at: lib/DebugInfo/GSYM/Range.cpp:34
+  auto EndPos = Ranges.end();
+  auto Pos = std::upper_bound(begin, EndPos, Addr);
+  if (Pos == EndPos)
----------------
jakehehrlich wrote:
> In general I think you want a greatest lower bound which is what you're doing but if you keep your current '<' operator and ignore my comment above about swaping the order of 'End' you can use `std::lower_bound` here and avoid the subtraction case.
I tried that with:

```
bool llvm::gsym::contains(const AddressRanges &Ranges, uint64_t Addr) {
  if (Ranges.empty())
    return false;
  auto Begin = Ranges.begin();
  auto EndPos = Ranges.end();
  auto Pos = std::lower_bound(Begin, EndPos, Addr);
  if (Pos != EndPos)
    return Pos->contains(Addr);
  return false;
}
```

But this failed in the following cases:
```
  EXPECT_TRUE(contains(Ranges, 0x2000-1));
  EXPECT_TRUE(contains(Ranges, 0x3000-1));
  EXPECT_TRUE(contains(Ranges, 0x5000-1));
```

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63104/new/

https://reviews.llvm.org/D63104