[PATCH] D72158: [DebugInfo] Make most debug line prologue errors non-fatal to parsing

Fri Jan 17 17:14:44 PST 2020

dblaikie accepted this revision.
dblaikie added inline comments.
This revision is now accepted and ready to land.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:463-477
     while (!Parser.done()) {
       if (DumpOffset && Parser.getOffset() != *DumpOffset) {
-        Parser.skip(dumpWarning);
+        Parser.skip(dumpWarning, dumpWarning);
         continue;
       }
       OS << "debug_line[" << format("0x%8.8" PRIx64, Parser.getOffset())
          << "]\n";
----------------
Note for future work: This seems fairly convoluted in terms of how to iterate over the section, parsing the contributions & handling the errors.

Having the explicit "skip" operation might be nice to avoid - and there's always recurring discussions about how to make this sort of parsing lazy so that users can get as much or as little data as they need without parsing things they don't need (so if the API were made more lazy generally - then "skip" would just be "parseLazily" then request the contribution length - if that doesn't error out, jump that many bytes ahead and parse the next one) - this would also move to fewer callback based errors, and more errors propagated on specific API interactions (eg: if you ask for the length, you return an ErrorOr<Length>, etc). Not sure if it's better or worse, but it's an option to consider.

Where does this code capture hard failures? Ah, in "Parser.done()"? So any hard failure (eg: failure to parse the length at all) results in the Parser moving to state "done" (& producing an error through one of the callbacks?)? 

& should one or both of the "dumpWarning"s be errors? Recoverable failures (ultimately in the worst case "we could parse the length, but nothing else makes any sense" is a recoverable error) aren't necessarily warnings. Take Clang's behavior - lots of hard errors are recoverable (you missed a semicolon, we're going to keep going assuming you meant to have a semicolon there - but we aren't going to compile this code, it's still an error)

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/Inputs/debug_line_malformed.s:334-335
 # Data to show that the rest of the prologue is skipped.
-.byte   6
+.byte   1               # Would be interpreted as DW_LNS_copy if parsed
+                        # as part of program.
 .Linvalid_md5_header_end0:
----------------
For this and other comments - probably not important/might be distracting to say what these would be parsed as rather than just to say that they're garbage data within the encoded length of the extended opcode, but beyond the parsed representation of that opcode? (or somtehing to that effect)

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/Inputs/debug_line_malformed.s:374-376
+.byte   1               # 1 file/DW_LNS_copy
+.asciz  "xyz"           # File name/3 special opcodes + DW_LNE_set_address start
+.byte   9               # MD5 hash value/DW_LNE_set_address length
----------------
Not sure I understand what's happening here - again, perhaps you're describing two different ways this could be parsed? I'd focus on the way it is being parsed & not the ambiguity of other ways it could be parsed/error-recovered. (you can mention what assumptions about previous invalid data are made to form that particular parse)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72158/new/

https://reviews.llvm.org/D72158