[PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections.

James Henderson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 3 01:03:18 PDT 2020


jhenderson added inline comments.


================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:27-28
   Sets.clear();
-  DataExtractor::Cursor C(0);
-  while (C && Data.isValidOffset(C.tell())) {
+  uint64_t Offset = 0;
+  while (Data.isValidOffset(Offset)) {
+    uint64_t SetOffset = Offset;
----------------
What's behind the reasoning for no longer using the `Cursor` throughout?


================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:31
     Sets.push_back({});
-    Set &SetData = Sets.back();
+    auto &Set = Sets.back();
 
----------------
FWIW, I'd prefer this to remain not `auto`, as it isn't clear to me what the type of `Set` is from the immediate context.


================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:46
+    Offset += Set.Length;
+    DWARFDataExtractor SetData(Data, Offset);
+    const unsigned OffsetSize = dwarf::getDwarfOffsetByteSize(Set.Format);
----------------
You probably want to include the expected length of the table in this data extractor too, to stop reading into the next table under any circumstance (e.g. the length would partially truncate the final terminator).


================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:57
+          createStringError(errc::invalid_argument,
+                            "name lookup table at offset 0x%" PRIx64
+                            " does not have a complete header: %s",
----------------
ikudrin wrote:
> dblaikie wrote:
> > Is it worth saying "name lookup table" here (& in related errors) - seems a bit redundant when the caller will add the specific section name?
> I would be really grateful for the better wording.
It seems to me that the caller doesn't add the section name to the message itself, at least in some cases? See the test case. Personally, I think this is fine, although I'd be tempted to be specific and say something about the "pub..." section, probably using a better word, to distinguish it from a .debug_names section.


================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:74-75
+
+    // Note: do not drop the last set even if there are parsing errors.
+    // This way we can, at least, dump the header and possibly some name tuples.
+    if (!C) {
----------------
ikudrin wrote:
> dblaikie wrote:
> > Hmm, why only do this on the last one?
> > If the goal is to be able to parse/dump things that might be a bit broken (which seems generally good) I think we should parse from the length to however long the length says (unless it extends beyond the section) - terminate early if the list terminates early (& warn about the fact that it terminated before its length) & then parse the next thing at current start + length. Don't think that needs a special case for the last one.
> Well, it looks the code does exactly that you say. Maybe I was not clear enough in the comment. I meant the set that was just read. The error handlers before the `while` loop drop the last added set with `Sets.pop_back()` because even header for it can not be parsed. The error handlers after the loop preserve it and the comment was aimed to explain that difference.
In the .debug_line code, we dump as much of the prologue as possible, despite the values not necessarily all having been read. For example, if the standard opcode lengths array was truncated, we'd still dump the values for the header fields. I think it would make sense to drop the `pop_back` calls entirely, with the possible exception of the one to do with the initial length field, although even then I'm not 100% sure.


================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:76-89
+    if (!C) {
+      RecoverableErrorHandler(createStringError(
+          errc::invalid_argument,
+          "name lookup table at offset 0x%" PRIx64
+          " terminated prematurely: %s",
+          SetOffset, toString(std::move(C.takeError())).c_str()));
+      continue;
----------------
ikudrin wrote:
> dblaikie wrote:
> > I think phrasing of these two might use some improvement. "terminated prematurely" actually would make me think of the second case - where the list had a terminator before the prefix-encoded length was reached, rather than that the prefix-encoded length was reached before the list ended.
> > 
> > Perhaps "terminated before the expected length was reached" and "reached the expected length without encountering a terminator"? They're both a bit of a mouthful though... open to ideas.
> These wordings are already better than mine. Thanks!
How about the first one be just generic, allowing the cursor's error to provide the context (something like "name lookup table at offset 0x12345678 parsing failed: ..."). I'm actually okay with @ikudrin's current wording for the second one, since @dblaikie's suggestion is as much of a mouthful when you add in the other context.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83050/new/

https://reviews.llvm.org/D83050





More information about the llvm-commits mailing list