[PATCH] D72158: [DebugInfo] Make most debug line prologue errors non-fatal to parsing

Mon Jan 27 02:53:12 PST 2020

jhenderson marked 3 inline comments as done.
jhenderson added inline comments.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:463-477
     while (!Parser.done()) {
       if (DumpOffset && Parser.getOffset() != *DumpOffset) {
-        Parser.skip(dumpWarning);
+        Parser.skip(dumpWarning, dumpWarning);
         continue;
       }
       OS << "debug_line[" << format("0x%8.8" PRIx64, Parser.getOffset())
          << "]\n";
----------------
dblaikie wrote:
> Note for future work: This seems fairly convoluted in terms of how to iterate over the section, parsing the contributions & handling the errors.
> 
> Having the explicit "skip" operation might be nice to avoid - and there's always recurring discussions about how to make this sort of parsing lazy so that users can get as much or as little data as they need without parsing things they don't need (so if the API were made more lazy generally - then "skip" would just be "parseLazily" then request the contribution length - if that doesn't error out, jump that many bytes ahead and parse the next one) - this would also move to fewer callback based errors, and more errors propagated on specific API interactions (eg: if you ask for the length, you return an ErrorOr<Length>, etc). Not sure if it's better or worse, but it's an option to consider.
> 
> Where does this code capture hard failures? Ah, in "Parser.done()"? So any hard failure (eg: failure to parse the length at all) results in the Parser moving to state "done" (& producing an error through one of the callbacks?)? 
> 
> & should one or both of the "dumpWarning"s be errors? Recoverable failures (ultimately in the worst case "we could parse the length, but nothing else makes any sense" is a recoverable error) aren't necessarily warnings. Take Clang's behavior - lots of hard errors are recoverable (you missed a semicolon, we're going to keep going assuming you meant to have a semicolon there - but we aren't going to compile this code, it's still an error)
Thanks for the comments!

> Where does this code capture hard failures? Ah, in "Parser.done()"? So any hard failure (eg: failure to parse the length at all) results in the Parser moving to state "done" (& producing an error through one of the callbacks?)?

Right. The hard failures are primarily there to handle the cases where we really don't know where to continue from because you gave it data that doesn't really look like a line table. Since the parser would then be in a bad state, trying to get the next unit doesn't really make sense, so we just say we're done.

> & should one or both of the "dumpWarning"s be errors? Recoverable failures (ultimately in the worst case "we could parse the length, but nothing else makes any sense" is a recoverable error) aren't necessarily warnings.

It's difficult to say, because it depends entirely on the client. Ideally, the tool using this library would configure the situation for its own needs. Always warning may well not be appropriate for some clients, but should it really be an error if e.g. lldb can't parse some of the debug data? I think the main client here is actually llvm-dwarfdump, where it could be argued that lots of things are really errors, but we shouldn't stop. But should the dumper exit with code 1 if the format is malformed? I don't know!

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/Inputs/debug_line_malformed.s:334-335
 # Data to show that the rest of the prologue is skipped.
-.byte   6
+.byte   1               # Would be interpreted as DW_LNS_copy if parsed
+                        # as part of program.
 .Linvalid_md5_header_end0:
----------------
dblaikie wrote:
> For this and other comments - probably not important/might be distracting to say what these would be parsed as rather than just to say that they're garbage data within the encoded length of the extended opcode, but beyond the parsed representation of that opcode? (or somtehing to that effect)
Fair enough. In some cases, the extra context is there to explain why we check what we do, but I'm okay changing it.

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/Inputs/debug_line_malformed.s:374-376
+.byte   1               # 1 file/DW_LNS_copy
+.asciz  "xyz"           # File name/3 special opcodes + DW_LNE_set_address start
+.byte   9               # MD5 hash value/DW_LNE_set_address length
----------------
dblaikie wrote:
> Not sure I understand what's happening here - again, perhaps you're describing two different ways this could be parsed? I'd focus on the way it is being parsed & not the ambiguity of other ways it could be parsed/error-recovered. (you can mention what assumptions about previous invalid data are made to form that particular parse)
Actually, in this context, the header is being read twice, once actually as the header, and once again as part of the body, so indeed I am describing two different ways this is parsed. The change in behaviour is to continue reading from the end of the header, as claimed in the header length field, which means these parts are both part of the file table, and also the start of a line table sequence.

The comments are intended to help people (including myself) trying to match up what the data in the body is doing versus the checks in the file. I could format them differently as a table perhaps, to make it clearer what they represent on each parse?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72158/new/

https://reviews.llvm.org/D72158