[PATCH] D51081: [DWARF v5] Refactoring range list dumping to fold DWARF v4 functionality into v5 handling (almost NFC).

Thu Aug 23 17:46:29 PDT 2018

wolfgangp added inline comments.

================
Comment at: include/llvm/DebugInfo/DWARF/DWARFDebugRnglists.h:67
+  getAbsoluteRanges(llvm::Optional<BaseAddress> BaseAddr,
+                    uint16_t Version) const;
 };
----------------
vleschuk wrote:
> I suggest that DWARFListType::extract() saves Version in class data, and thus we don't need to specify it here explicitly. Same suggestion regarding RangeListEntry and other classes.
I was trying to avoid having to store the version in each list and entry, since it's always available from the context (i.e. the table during dumping and extracting). Based on David's suggestion I eliminated it from this function. It would remove a parameter from the extract and dump routines, however. I'm not sure. I guess I can see it stored in the list, but perhaps not in each individual entry.

================
Comment at: include/llvm/DebugInfo/DWARF/DWARFListTable.h:52
+  bool empty() const {
+    return Entries.empty() || Entries.begin()->isEndOfList();
+  }
----------------
probinson wrote:
> dblaikie wrote:
> > Why this new special case - when would the range list contain an empty range?
> It's explicitly legal to have a list that consists solely of the end-of-list marker.  Not saying any producer would actually emit it this way, but it's one way to describe an empty range.  The parser should handle this case smoothly.
Also , the DWARF5 rnglist implementation keeps the end-of-list entry in the list (unlike the DWARF4 implementation), which makes display a bit more straightforward.

================
Comment at: include/llvm/DebugInfo/DWARF/DWARFListTable.h:52
+  bool empty() const {
+    return Entries.empty() || Entries.begin()->isEndOfList();
+  }
----------------
dblaikie wrote:
> wolfgangp wrote:
> > probinson wrote:
> > > dblaikie wrote:
> > > > Why this new special case - when would the range list contain an empty range?
> > > It's explicitly legal to have a list that consists solely of the end-of-list marker.  Not saying any producer would actually emit it this way, but it's one way to describe an empty range.  The parser should handle this case smoothly.
> > Also , the DWARF5 rnglist implementation keeps the end-of-list entry in the list (unlike the DWARF4 implementation), which makes display a bit more straightforward.
> Good point - I guess I could narrow/clarify my comment somewhat: Why is the emptiness of a range list important? I'd figure the printing algorithm would work the same either way.
>Good point - I guess I could narrow/clarify my comment somewhat: Why is the emptiness of a > range list important? I'd figure the printing algorithm would work the same either way.

This was motivated by the code in DwarfLinker.cpp (dsymutil) that needs to check for the emptiness of a list during range patching. With the end-of-list entry as part of the Entries vector it seemed prudent to provide this method.

================
Comment at: include/llvm/DebugInfo/DWARF/DWARFListTable.h:68
+  void dump(raw_ostream &OS, DWARFContext *C, uint8_t AddressSize,
+            uint64_t BaseAddress, unsigned Indent, uint16_t Version,
+            size_t MaxEncodingStringLength = 0,
----------------
vleschuk wrote:
> Why does dump() method need Version argument? We set the version when we are extracting the data. Maybe save it somewhere during extraction?
See my comment above. I'm not sure if I want the version to be stored in each individual entry, given that they are usually processed altogether and not passed around individually anyway. I could be persuaded, though. It would make the interface simpler.

================
Comment at: include/llvm/DebugInfo/DWARF/DWARFListTable.h:71
+            DIDumpOptions DumpOpts = {}) const;
+  void addEntry(const ListEntryType &Entry) { Entries.push_back(Entry); }
+  /// Print a header for the list. Returns the indentation that is to be used
----------------
dblaikie wrote:
> This appears to be unused in this patch? (I see no other mention of 'addEntry') - delete it as dead code?
I will be using this with location lists in an upcoming patch. Ah, I see, it's bad form to add it before it's actually used. OK I'll delete it.

================
Comment at: include/llvm/DebugInfo/DWARF/DWARFListTable.h:72-77
+  /// Print a header for the list. Returns the indentation that is to be used
+  /// when subsequently dumping each list entry.
+  unsigned printListHeader(raw_ostream &OS, uint8_t AddrSize,
+                           DIDumpOptions DIDumpOpts) const {
+    return 0;
+  }
----------------
dblaikie wrote:
> This seems to be dead in this patch - maybe remove it from here and add it in whichever patch adds its use?
Actually, it's used further down in DWARFListTableBase<DWARFListType>::dump

================
Comment at: include/llvm/DebugInfo/DWARF/DWARFListTable.h:87-90
+    /// The DWARF version number. We assume version 5 until we either extract
+    /// the version from the section or explicitly set it (e.g. to fake a
+    /// table for DWARF v4 and earlier).
+    uint16_t Version = 5;
----------------
dblaikie wrote:
> That sounds like in all cases we know the version from the input, right? So when/why would this assumption be needed?
During extraction we're using getTableLength() (I've renamed it from just 'length()' in the first version of this review) for validation and also to determine if we can recover from an error and keep parsing the section. With the creation of an artificial table for DWARF 4 range lists the calculation of the table length is now dependent on the version, since for DWARF 5 we need to add the size of the length field to the length we extracted. In DWARF 4 the length is simply the size of the section as is. 
If we find an error in the table header before we get to extract the version, the version would obviously not be set to anything meaningful and would confuse our recovery in DWARFContext.cpp. Also, presetting the version allows us to use getTableLength() before we actually extract the version for validation.
But it's perhaps more appropriate to preset the version during extraction, so I'm doing it there now.

================
Comment at: include/llvm/DebugInfo/DWARF/DWARFListTable.h:240
+  uint16_t getVersion() const { return Header.getVersion(); }
+  uint32_t length() const { return Header.length(); }
 };
----------------
vleschuk wrote:
> To make naming consistent maybe we should rename it to getLength()?
I've renamed the DWARFListTableBase::length() to  getLength() and
DWARFListTableHeader::length() to getTableLength().
DWARFListTableHeader::getLength() simply extracts the length field from the header data.

================
Comment at: lib/DebugInfo/DWARF/DWARFDebugRnglists.cpp:30-33
+  // In DWARF v4 and earlier, a range list entry corresponds to DWARF v5
+  // entry kind DW_RLE_start_end, except that it is possibly subject to
+  // base adjustment, which is not relevant during extraction.
+  uint8_t Encoding = dwarf::DW_RLE_start_end;
----------------
dblaikie wrote:
> Would it be simpler to model DWARFv4 range list entries as RLE_offset_pair, rather than as start_end with an offset? It looks like then it'd fall out without a special case in getAbsoluteRanges (& that function may not need to be touched at all - it wouldn't need to change behavior based on version & maybe the assertions aren't worth the extra API change?)
I tried this and it seems it would leave getAbsoluteRanges unchanged. The only negative IMO is that offset_pair does not expect relocated addresses but only the extractor has to worry about that. 

================
Comment at: lib/DebugInfo/DWARF/DWARFDebugRnglists.cpp:84-92
+      if (Version < 5)
+        return createStringError(
+            errc::invalid_argument,
+            "invalid range list entry at offset 0x%" PRIx32, *OffsetPtr);
       return createStringError(errc::invalid_argument,
                          "insufficient space remaining in table for "
                          "DW_RLE_start_end encoding "
----------------
dblaikie wrote:
> Suspect it might be fine to rephrase the DWARFv5 message ("start/end encoding" (or "offset pair at offset ..." ) - so it applies to the 4 and 5 case equally) to be more general & avoid the 4/5 special casing here. (or in offset_pair if the code moves there based on the previous comment)
The error handling in offset_pair is different b/c DWARF 4 range lists take relocated addresses, whereas DWARF 5 offset_pair takes ULEBs. We have to validate after extracting in the DWARF 5 case. This makes the code for the 2 versions fairly different anyway. Is is still worth reconciling the error messages?

================
Comment at: lib/DebugInfo/DWARF/DWARFUnit.cpp:318-319
+  // the table header which is expected to start at offset 0.
+  if (isDWO)
+    Base = Table.getHeaderSize();
+  return Table;
----------------
dblaikie wrote:
> Maybe rather than propagating DWO information here, the Base could be set to Table.getHeaderSize() if there is no Base specified? (using Base == 0 or Base == -1 or something as a flagging value to communicate "no Base has been specified, fill one in")?
> 
> (though side rant: I don't understand why these offsets ever point past the header - that seems like it breaks the purpose of the header to provide section versioning, etc - it means the header format can never change and the parser would have to walk backwards from the base offset to parse the version to know how to parse the rest of the section anyway... @probinson @aprantl - though perhaps I've ranted sufficiently about this before)
We're going to need DWO information later anyway during extraction of the (range or location) lists to decide which entries are valid, e.g. start_end, start_length, base_address are only valid in non-split CUs.

================
Comment at: test/DebugInfo/dwarfdump-ranges.test:18-23
 CHECK-NEXT: 00000000 000000000000062c 0000000000000637
-CHECK-NEXT: 00000000 0000000000000637 000000000000063d
-CHECK-NEXT: 00000000 <End of list>
+CHECK-NEXT: 00000010 0000000000000637 000000000000063d
+CHECK-NEXT: 00000020 <End of list>
 CHECK-NEXT: 00000030 0000000000000640 000000000000064b
-CHECK-NEXT: 00000030 0000000000000637 000000000000063d
-CHECK-NEXT: 00000030 <End of list>
+CHECK-NEXT: 00000040 0000000000000637 000000000000063d
+CHECK-NEXT: 00000050 <End of list>
----------------
dblaikie wrote:
> If you're updating all these test cases anyway - is it worth revisiting what the best format for dumping debug_ranges is? I'm not sure printing those offsets is especially meaningful (either in the old behavior or the new behavior) rather than just printing the starting offset, then printing the pairs.
> 
> & printing it in a way that's more similar between ranges and rnglistsn wouldn't hurt either - not sure if there's something in common that'd suit everyone though.
If we're not too worried about anybody depending on the existing format, we could make the same distinction in the old mode between verbose and terse mode: 

Verbose mode: section offsets + raw values of entries + final values of ranges (i.e. adjusted by base) displayed as intervals (between '[' and ')').

Terse mode: just the final ranges displayed as intervals.

The only difference between old and new would be no encoding strings in the old verbose mode.

================
Comment at: test/DebugInfo/dwarfdump-ranges.test:18-23
 CHECK-NEXT: 00000000 000000000000062c 0000000000000637
-CHECK-NEXT: 00000000 0000000000000637 000000000000063d
-CHECK-NEXT: 00000000 <End of list>
+CHECK-NEXT: 00000010 0000000000000637 000000000000063d
+CHECK-NEXT: 00000020 <End of list>
 CHECK-NEXT: 00000030 0000000000000640 000000000000064b
-CHECK-NEXT: 00000030 0000000000000637 000000000000063d
-CHECK-NEXT: 00000030 <End of list>
+CHECK-NEXT: 00000040 0000000000000637 000000000000063d
+CHECK-NEXT: 00000050 <End of list>
----------------
wolfgangp wrote:
> dblaikie wrote:
> > If you're updating all these test cases anyway - is it worth revisiting what the best format for dumping debug_ranges is? I'm not sure printing those offsets is especially meaningful (either in the old behavior or the new behavior) rather than just printing the starting offset, then printing the pairs.
> > 
> > & printing it in a way that's more similar between ranges and rnglistsn wouldn't hurt either - not sure if there's something in common that'd suit everyone though.
> If we're not too worried about anybody depending on the existing format, we could make the same distinction in the old mode between verbose and terse mode: 
> 
> Verbose mode: section offsets + raw values of entries + final values of ranges (i.e. adjusted by base) displayed as intervals (between '[' and ')').
> 
> Terse mode: just the final ranges displayed as intervals.
> 
> The only difference between old and new would be no encoding strings in the old verbose mode.
I changed the formatting to roughly what we do in the new version, so the table looks like this in verbose mode:

  .debug_ranges contents:
  0x00000000: 0x0000000000000000, 0x0000000000000001 => [0x0000000000000000, 0x0000000000000001)
  0x00000010: 0x0000000000000003, 0x0000000000000006 => [0x0000000000000003,  0x0000000000000006)
  0x00000020: 0xffffffffffffffff, 0x0000000000000000
  0x00000030: 0x0000000000000001, 0x0000000000000002 => [0x0000000000000001, 0x0000000000000002)
  0x00000040: 0x0000000000000000, 0x0000000000000000

and like this in terse mode:

  .debug_ranges contents:
  [0x0000000000000000, 0x0000000000000001)
  [0x0000000000000003, 0x0000000000000006)
  [0x0000000000000001, 0x0000000000000002)
  <End of list>

Does this make sense?

https://reviews.llvm.org/D51081