[PATCH] D83050: [DebugInfo] Add more checks to parsing .debug_pub* sections.

Mon Jul 6 00:38:10 PDT 2020

jhenderson marked an inline comment as done.
jhenderson added inline comments.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:37
+    if (Err) {
+      // Drop the lastly added set because it does not contain anything useful
+      // to dump.
----------------
Perhaps "newly" instead of "lastly".

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:57
+    if (!C) {
+      // Preserve the lastly added set because at least some field of the header
+      // are read and can be dumped.
----------------
Same "lastly" -> "newly" maybe. I feel like it reads a little better.

Also "field" -> "fields"

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:27-28
   Sets.clear();
-  DataExtractor::Cursor C(0);
-  while (C && Data.isValidOffset(C.tell())) {
+  uint64_t Offset = 0;
+  while (Data.isValidOffset(Offset)) {
+    uint64_t SetOffset = Offset;
----------------
ikudrin wrote:
> jhenderson wrote:
> > What's behind the reasoning for no longer using the `Cursor` throughout?
> The method now reports all encountered errors through `RecoverableErrorHandler` and does not return `Error`. The `Cursor` requires its error state to be checked in any case. While the former code could simply return the error state, now this checking is a bit inconvenient, and, moreover, useless.
I'm not sure I follow.

As far as my understanding of `Cursor` goes, you can have:

```
  DataExtractor::Cursor C(0);
  while (C && Data.isValidOffset(C.tell())) {
    // Parse the length
    if (!C) { /* report invalid length, using C.takeError() */ return; }
    // Parse the header
    while (C) { /* parse entries */ }
    if (C && C.tell() != Offset) { /* report bad terminator */ }
  }
  if (!C) { /* report parsing error using C.takeError() */
```

The `Cursor` is checked by either the final error check outside the loop in most cases, or by the invalid length report, so we're good (note that `C.takeError()` does not need calling if the `Cursor` is in a success state, much like `Expected`). The only case where it might be different is if `Cursor` is in an error state due to some error other than a running-off-the-end error, in which case it would abort early. If you want to continue instead, you could do almost the same as you've got:

```
  while (Offset) {
    DataExtractor::Cursor C(Offset);
    ... = Data.getInitialLength(C);
    if (!C) { /* report invalid length, using C.takeError() */ return; }
    // Parse the header
    while (C) { /* parse entries */ }
    if (C && C.tell() != Offset) { /* report bad terminator */ }
    if (!C) { /* report parsing error using C.takeError() */
  }
```

I'm not sure I see how the latter is any more complex or inconvenient than instantiating a different Error variable and passing pointers around?

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugPubTable.cpp:46
+    Offset += Set.Length;
+    DWARFDataExtractor SetData(Data, Offset);
+    const unsigned OffsetSize = dwarf::getDwarfOffsetByteSize(Set.Format);
----------------
ikudrin wrote:
> jhenderson wrote:
> > You probably want to include the expected length of the table in this data extractor too, to stop reading into the next table under any circumstance (e.g. the length would partially truncate the final terminator).
> The second parameter, `Offset`, is the limiter. Note that it is just updated to point to the start of the next table which is the same as the end of the current one.
Thanks I misread.

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pub_tables_invalid.s:1
 # RUN: llvm-mc -triple x86_64 %s -filetype=obj -o %t
 # RUN: not llvm-dwarfdump -debug-pubnames %t 2>&1 | FileCheck %s
----------------
I'd probably fold in this test case now into the other file. I don't think there's any benefit having them separate. Alternatively, this lives separately, and move the other test case into the library testing. The idea is that we test the code in detail with the library tests, and at a high level in the tool tests (i.e. showing we handle the reported output). I don't mind either approach.

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pubnames_error_cases.s:8
+
+## This set does not contain all required fields in the header.
+# CHECK-NEXT: length = 0x00000002, format = DWARF32, version = 0x0002, unit_offset = 0x00000000, unit_size = 0x00000000
----------------
Strictly speaking, we should have an error check for each individual field, not just the header in general. This is because we could be using the non-checking version of the `get*` functions. This check currently only checks parsing of the offset field, but there's also the version and size fields.

Similar comment applies for the individual entries.

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pubnames_error_cases.s:30
+
+## This set contains a string which is not preperly terminated.
+# CHECK-NEXT: length = 0x00000011, format = DWARF32, version = 0x0002, unit_offset = 0x00000064, unit_size = 0x000000f0
----------------
preparly -> properly

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug_pubnames_error_cases.s:61
+
+## The remaining space in the section is too short to even contein a unit length
+## field.
----------------
contein -> contain

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D83050/new/

https://reviews.llvm.org/D83050