[PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections.

Fri Jul 3 08:34:28 PDT 2020

ikudrin added inline comments.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:27-28
   Sets.clear();
-  DataExtractor::Cursor C(0);
-  while (C && Data.isValidOffset(C.tell())) {
+  uint64_t Offset = 0;
+  while (Data.isValidOffset(Offset)) {
+    uint64_t SetOffset = Offset;
----------------
jhenderson wrote:
> What's behind the reasoning for no longer using the `Cursor` throughout?
The method now reports all encountered errors through `RecoverableErrorHandler` and does not return `Error`. The `Cursor` requires its error state to be checked in any case. While the former code could simply return the error state, now this checking is a bit inconvenient, and, moreover, useless.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:31
     Sets.push_back({});
-    Set &SetData = Sets.back();
+    auto &Set = Sets.back();

----------------
jhenderson wrote:
> FWIW, I'd prefer this to remain not `auto`, as it isn't clear to me what the type of `Set` is from the immediate context.
OK. I'll rename the variable to `NewSet` then.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:46
+    Offset += Set.Length;
+    DWARFDataExtractor SetData(Data, Offset);
+    const unsigned OffsetSize = dwarf::getDwarfOffsetByteSize(Set.Format);
----------------
jhenderson wrote:
> You probably want to include the expected length of the table in this data extractor too, to stop reading into the next table under any circumstance (e.g. the length would partially truncate the final terminator).
The second parameter, `Offset`, is the limiter. Note that it is just updated to point to the start of the next table which is the same as the end of the current one.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:57
+          createStringError(errc::invalid_argument,
+                            "name lookup table at offset 0x%" PRIx64
+                            " does not have a complete header: %s",
----------------
jhenderson wrote:
> ikudrin wrote:
> > dblaikie wrote:
> > > Is it worth saying "name lookup table" here (& in related errors) - seems a bit redundant when the caller will add the specific section name?
> > I would be really grateful for the better wording.
> It seems to me that the caller doesn't add the section name to the message itself, at least in some cases? See the test case. Personally, I think this is fine, although I'd be tempted to be specific and say something about the "pub..." section, probably using a better word, to distinguish it from a .debug_names section.
While the term might be a bit confusing, note that the wording "Name lookup tables" is used in DWARFv4 to refer to these sections. The collocation is not used in DWARFv5, where the sections are deprecated. Anyway, I open to suggestions.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:74-75
+
+    // Note: do not drop the last set even if there are parsing errors.
+    // This way we can, at least, dump the header and possibly some name tuples.
+    if (!C) {
----------------
jhenderson wrote:
> ikudrin wrote:
> > dblaikie wrote:
> > > Hmm, why only do this on the last one?
> > > If the goal is to be able to parse/dump things that might be a bit broken (which seems generally good) I think we should parse from the length to however long the length says (unless it extends beyond the section) - terminate early if the list terminates early (& warn about the fact that it terminated before its length) & then parse the next thing at current start + length. Don't think that needs a special case for the last one.
> > Well, it looks the code does exactly that you say. Maybe I was not clear enough in the comment. I meant the set that was just read. The error handlers before the `while` loop drop the last added set with `Sets.pop_back()` because even header for it can not be parsed. The error handlers after the loop preserve it and the comment was aimed to explain that difference.
> In the .debug_line code, we dump as much of the prologue as possible, despite the values not necessarily all having been read. For example, if the standard opcode lengths array was truncated, we'd still dump the values for the header fields. I think it would make sense to drop the `pop_back` calls entirely, with the possible exception of the one to do with the initial length field, although even then I'm not 100% sure.
OK. I'll preserve the set for the case when the complete header is not read. You are right, some fields can be dumped even in that case, at least, the length.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83050/new/

https://reviews.llvm.org/D83050