[PATCH] D77557: [DWARFDebugLine] Use truncating data extractors for prologue parsing

Mon Jun 1 08:00:50 PDT 2020

labath added inline comments.

================
Comment at: llvm/include/llvm/DebugInfo/DWARF/DWARFDebugLine.h:132
     void dump(raw_ostream &OS, DIDumpOptions DumpOptions) const;
-    Error parse(const DWARFDataExtractor &DebugLineData, uint64_t *OffsetPtr,
+    Error parse(DWARFDataExtractor Data, uint64_t *OffsetPtr,
                 function_ref<void(Error)> RecoverableErrorHandler,
----------------
probinson wrote:
> The truncating constructor takes a `const DWARFDataExtractor&` so I don't understand why this parameter had to change.
It wasn't strictly necessary, but with at `const &` argument, I'd have to create new object to hold the truncated extractor -- this way I can reassign it. It seemed like a good idea to avoid having two data extractors floating around...

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/Inputs/debug_line_malformed.s:195
 .Linvalid_description_header_end0:
-# The bytes from here onwards will also be read as part of the main body.
-                        # --- Prologue interpretation --- | --- Main body interpretation ---
-.byte   0, 1            # More standard opcodes           | First part of DW_LNE_end_sequence
-# Directory table format
-.byte   1               # One element per directory entry | End of DW_LNE_end_sequence
-.byte   1               # DW_LNCT_path                    | DW_LNS_copy
-.byte   0x08            # DW_FORM_string                  | DW_LNS_const_add_pc
-# Directory table entries
-.byte   1               # 1 directory                     | DW_LNS_copy
-.asciz  "/tmp"          # Directory name                  | four special opcodes + start of DW_LNE_end_sequence
-# File table format
-.byte   1               # 1 element per file entry        | DW_LNE_end_sequence length
-.byte   1               # DW_LNCT_path                    | DW_LNE_end_sequence opcode
-.byte   0x08            # DW_FORM_string                  | DW_LNS_const_add_pc
-# File table entries
-.byte   1               # 1 file                          | DW_LNS_copy
-.asciz  "xyz"           # File name                       | three special opcodes + start of DW_LNE_set_address
-# Header end
+.byte   0, 1, 1         # DW_LNE_end_sequence
+.byte   1               # DW_LNS_copy
----------------
jhenderson wrote:
> I wonder if it's worth a comment at the start here saying something to explain why the formatting of the following bytes is weird. Perhaps something like "The following bytes would be read as part of the prologue header, were data extraction not stopped once the data end is reached, hence they are formatted like a header".
> 
> Something similar applies for the other changes in this file.
Actually, I was thinking of just deleting this stuff as it is no longer interesting to test. I wanted to do it as a separate commit, as that will change the offsets of everything, making it hard to see what changed. WDYT?

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_line_invalid.test:143

-## V5 invalid MD5 hash form when data beyond the prologue length has
-## been read before the MD5 problem is identified.
+## V5 prologue ends while reading an MD5 hash
 # NONFATAL:      debug_line[0x000002ae]
----------------
jhenderson wrote:
> Missing trailing full stop.
> 
> This may not have been clear from the test comments before, but this and the previous case were supposed to work in tandem, because the invalid MD5 form message was reported in both cases, but not the "not at end" message IIRC.
> 
> I feel like the input for this case probably should be updated to not have a malformed MD5 form. Is there any interaction any more to do with continuing to parse that these 2 cases now don't need to cover?
I've changed this test to use a valid form. I don't think more test are needed -- this is pretty generic code, and form value extraction is tested in DWARFFormValueTest.cpp.

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_line_invalid.test:206
+# ALL-NEXT: warning: parsing line table prologue at 0x000001ee found an invalid directory or file table description at 0x00000219
+# ALL-NEXT: warning: failed to parse entry content descriptors: malformed uleb128, extends past end
+# ALL-NEXT: warning: parsing line table prologue at 0x0000022f found an invalid directory or file table description at 0x00000251
----------------
jhenderson wrote:
> How straightforward would it be to provide the offset as extra context here, like we do with "unexpected end of data"?
That was done with D80799.

================
Comment at: llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp:470
     Errs.emplace_back(
-        (Twine("parsing line table prologue at 0x00000000 should have ended at "
-               "0x000000") +
-         Twine::utohexstr(ExpectedEnd) + " but it ended at 0x000000" +
-         Twine::utohexstr(ExpectedEnd + 2))
-            .str());
+        "failed to parse file entry because extracting the form value failed.");
   }
----------------
jhenderson wrote:
> I'm not sure where this error is coming from (I don't recall seeing it before), but it's got a spurious trailing full stop, which is inconsistent with the other error messages. It might want fixing separately I guess.
there are a couple of messages like this in `parseV5DirFileTables` so I'll fix those separately. The reason you haven't seen it before was that before D77308 `DWARFFormValue::extractValue` was always "successful".

================
Comment at: llvm/unittests/DebugInfo/DWARF/DWARFDebugLineTest.cpp:478-482
-        (Twine("parsing line table prologue at 0x00000000 should have ended at "
-               "0x000000") +
-         Twine::utohexstr(ExpectedEnd) + " but it ended at 0x000000" +
-         Twine::utohexstr(ExpectedEnd + 2))
-            .str());
----------------
jhenderson wrote:
> What happened to this error message?
It's just gone because the error has changed. Now it is not possible for parsing to run off the end of the prologue, so the failure mode is different.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77557/new/

https://reviews.llvm.org/D77557