[PATCH] D77557: [DWARFDebugLine] Use truncating data extractors for prologue parsing

Wed Jun 3 04:54:02 PDT 2020

labath added inline comments.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp:384
   const uint64_t EndPrologueOffset = PrologueLength + *OffsetPtr;
+  DebugLineData = DWARFDataExtractor(DebugLineData, EndPrologueOffset);
   MinInstLength = DebugLineData.getU8(OffsetPtr);
----------------
jhenderson wrote:
> We probably also want to truncate it based on the unit length first, before reading the version etc. I've actually got plans for a local change related to that, but as you're working in this area, I'm happy for you to do it as part of this or a subsequent change, if you want, rather than me doing it.
I can do that now.
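The ordering suggested above can be sketched with a small standalone reader. Reads are first bounded by the unit length, before the version and other header fields are read, and are then narrowed further to the prologue end. `TruncatingReader`, `truncated`, `getU8` and `getU32` are hypothetical names used only for illustration. They are not LLVM's `DWARFDataExtractor` API. The sketch assumes 32-bit DWARF and little-endian data.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a truncating extractor: any read that would
// cross End fails instead of consuming bytes that belong to what follows.
class TruncatingReader {
public:
  TruncatingReader(const std::vector<uint8_t> &Data, uint64_t End)
      : Data(Data), End(End) {
    assert(End <= Data.size() && "truncation point must lie inside the buffer");
  }

  // Returns a reader over the same bytes that stops at NewEnd (never
  // extending past the current end).
  TruncatingReader truncated(uint64_t NewEnd) const {
    return TruncatingReader(Data, NewEnd < End ? NewEnd : End);
  }

  // Reads one byte; on overrun sets Failed, returns 0, leaves *Offset as is.
  uint8_t getU8(uint64_t *Offset) {
    if (*Offset >= End) {
      Failed = true;
      return 0;
    }
    return Data[(*Offset)++];
  }

  // Reads a little-endian 32-bit value; all four bytes must fit before End.
  uint32_t getU32(uint64_t *Offset) {
    if (*Offset > End || End - *Offset < 4) {
      Failed = true;
      return 0;
    }
    uint32_t Value = 0;
    for (uint64_t I = 0; I < 4; ++I)
      Value |= uint32_t(Data[*Offset + I]) << (8 * I);
    *Offset += 4;
    return Value;
  }

  bool Failed = false; // set once any read runs past End

private:
  const std::vector<uint8_t> &Data;
  uint64_t End;
};
```

The reader is narrowed to the unit first. After that, a corrupt or short unit can only cause a failed read. It cannot cause prologue parsing to run into the next unit's bytes.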

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp:432-434
+                          "unknown data in line table prologue at 0x%8.8" PRIx64
+                          ": parsing ended (at 0x%8.8" PRIx64
+                          ") before reaching the prologue at 0x%8.8" PRIx64,
----------------
jhenderson wrote:
> A number of our other error messages explicitly use "at offset 0x..." for the offset part of the message. Perhaps we should update this message whilst you're here?
> 
> Alternatively, perhaps we should remove "offset" in all other locations where it is already present (as a separate change). I think either can be argued for - the former for complete clarity, the latter for conciseness.
I've added the offsets. I don't have an opinion on whether we should include the "offset" part or not.
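For comparison, the following sketch shows how the message from the diff reads when "at offset" is applied to all three offsets. `formatPrologueError` is a hypothetical helper, and this is one possible wording rather than necessarily the committed one.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: the prologue-overrun message from the diff, with
// "offset" spelled out for each of the three offsets.
std::string formatPrologueError(uint64_t PrologueOffset, uint64_t ParseEnd,
                                uint64_t PrologueEnd) {
  char Buf[256];
  int N = std::snprintf(
      Buf, sizeof(Buf),
      "unknown data in line table prologue at offset 0x%8.8" PRIx64
      ": parsing ended (at offset 0x%8.8" PRIx64
      ") before reaching the prologue at offset 0x%8.8" PRIx64,
      PrologueOffset, ParseEnd, PrologueEnd);
  assert(N > 0 && N < int(sizeof(Buf)) && "message must fit in the buffer");
  (void)N; // avoid an unused-variable warning when NDEBUG is defined
  return std::string(Buf);
}
```

`%8.8` zero-pads each offset to at least eight hex digits, so `0x10` prints as `0x00000010`.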

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp:433
-        PrologueOffset, EndPrologueOffset, *OffsetPtr));
-    *OffsetPtr = EndPrologueOffset;
   }
----------------
jhenderson wrote:
> We probably still want this line - if you don't update it, and the prologue finishes early, you'll get the recoverable error, but then parsing will continue from the point it got to rather than the start of the table body, as claimed by the length. In general, I think we should prefer following the line table's stated lengths where possible.
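It isn't needed because there's a `*OffsetPtr = DebugLineOffset + Prologue.getLength();` further down, on line 738. That line resets the offset to the end of the prologue as stated by its length, regardless of where parsing stopped.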
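The following sketch shows the resynchronization being discussed. Wherever prologue parsing stopped, the program body starts at the offset implied by the table's stated lengths. `PrologueInfo` and `programStartOffset` are hypothetical names. `TotalLength` is assumed to play the role of `Prologue.getLength()` in the quoted line, meaning the prologue size measured from `DebugLineOffset`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of one parsed prologue.
struct PrologueInfo {
  uint64_t ParseEndOffset; // where field-by-field parsing actually stopped
  uint64_t TotalLength;    // prologue size implied by the stated lengths
};

// Returns the offset where the line-number program should start. The
// stated lengths win over wherever parsing happened to stop, so an early
// stop (reported via *EndedEarly) cannot shift the start of the body.
uint64_t programStartOffset(uint64_t DebugLineOffset, const PrologueInfo &P,
                            bool *EndedEarly) {
  assert(EndedEarly && "caller must supply an out-parameter");
  uint64_t Stated = DebugLineOffset + P.TotalLength;
  *EndedEarly = P.ParseEndOffset < Stated;
  return Stated;
}
```

With a truncating extractor, parsing cannot run past the stated end. The only mismatch left to report is therefore an early stop.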

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/Inputs/debug_line_malformed.s:195
 .Linvalid_description_header_end0:
-# The bytes from here onwards will also be read as part of the main body.
-                        # --- Prologue interpretation --- | --- Main body interpretation ---
-.byte   0, 1            # More standard opcodes           | First part of DW_LNE_end_sequence
-# Directory table format
-.byte   1               # One element per directory entry | End of DW_LNE_end_sequence
-.byte   1               # DW_LNCT_path                    | DW_LNS_copy
-.byte   0x08            # DW_FORM_string                  | DW_LNS_const_add_pc
-# Directory table entries
-.byte   1               # 1 directory                     | DW_LNS_copy
-.asciz  "/tmp"          # Directory name                  | four special opcodes + start of DW_LNE_end_sequence
-# File table format
-.byte   1               # 1 element per file entry        | DW_LNE_end_sequence length
-.byte   1               # DW_LNCT_path                    | DW_LNE_end_sequence opcode
-.byte   0x08            # DW_FORM_string                  | DW_LNS_const_add_pc
-# File table entries
-.byte   1               # 1 file                          | DW_LNS_copy
-.asciz  "xyz"           # File name                       | three special opcodes + start of DW_LNE_set_address
-# Header end
+.byte   0, 1, 1         # DW_LNE_end_sequence
+.byte   1               # DW_LNS_copy
----------------
jhenderson wrote:
> labath wrote:
> > jhenderson wrote:
> > > I wonder if it's worth a comment at the start here saying something to explain why the formatting of the following bytes is weird. Perhaps something like "The following bytes would be read as part of the prologue header, were data extraction not stopped once the data end is reached, hence they are formatted like a header".
> > > 
> > > Something similar applies for the other changes in this file.
> > Actually, I was thinking of just deleting this stuff as it is no longer interesting to test. I wanted to do it as a separate commit, as that will change the offsets of everything, making it hard to see what changed. WDYT?
> I'm happy to remove any unnecessary additional test cases, where they provide no new value, but we need to be slightly careful we don't lose coverage of parsing the file/directory name tables. If a test case doesn't cover a unique code path, then I'm good for it to be deleted, and as a separate patch is fine.
I'm not talking about deleting entire test cases -- just the bits of line tables that are prologue extensions. They were needed/interesting to test when the prologue parsing would not respect prologue boundaries, but now I don't think they serve any purpose and I think we can just put a simple DW_LNE_end_sequence as the line table contents.
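Under the DWARF specification, the replacement body `.byte 0, 1, 1` followed by `.byte 1` decodes as one extended opcode and one standard opcode:

- `0` introduces an extended opcode.
- The first `1` is its ULEB128 length.
- The second `1` is the `DW_LNE_end_sequence` sub-opcode.
- The trailing `1` is `DW_LNS_copy`.

The decoder below is a hypothetical illustration. It handles only these opcodes and single-byte lengths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Opcode values from the DWARF specification.
constexpr uint8_t DW_LNS_copy = 0x01;         // standard opcode
constexpr uint8_t DW_LNE_end_sequence = 0x01; // extended sub-opcode

// Hypothetical minimal decoder: names each opcode in a tiny program body.
// Supports only extended opcodes with a single-byte ULEB128 length and
// DW_LNS_copy; returns an empty vector for anything else or on overrun.
std::vector<std::string> decodeOpcodes(const std::vector<uint8_t> &Body) {
  std::vector<std::string> Names;
  size_t I = 0;
  while (I < Body.size()) {
    uint8_t Op = Body[I++];
    if (Op == 0) { // extended: 0, ULEB128 length, sub-opcode, operands
      if (I >= Body.size() || Body[I] >= 0x80)
        return {};
      uint8_t Len = Body[I++];
      if (Len == 0 || Body.size() - I < Len)
        return {};
      Names.push_back(Body[I] == DW_LNE_end_sequence ? "DW_LNE_end_sequence"
                                                     : "DW_LNE_unknown");
      I += Len; // skip sub-opcode and any operands
    } else if (Op == DW_LNS_copy) {
      Names.push_back("DW_LNS_copy");
    } else {
      return {}; // other standard/special opcodes are out of scope here
    }
  }
  return Names;
}
```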

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D77557