[PATCH] D104271: llvm-dwarfdump: Print warnings on invalid DWARF

Fri Jun 18 01:00:37 PDT 2021

jhenderson added a comment.

Not had time to review the test cases yet, but will do that next week.

================
Comment at: llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h:372-375
+  static const char *getAddressSizeSupported() {
+    static const char s[] = "2, 4, 8";
+    return s;
+  }
----------------
I'd rename this function to `getSupportedAddressSizes()`, which reads slightly better.

Also, it can be simplified, as shown in line. String literals don't have lifetime issues, so there's no need for the static local variable.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp:74
+  bool Dash;
+  auto flush = [&]() {
+    if (!LastCode)
----------------
Lambdas follow variable naming style, so `flush` -> `Flush`.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp:103
+}
+
 DWARFDebugAbbrev::DWARFDebugAbbrev() { clear(); }
----------------
jankratochvil wrote:
> This function look to me as too much code for too little benefit but when @dblaikie has requested it then why not.
> 
I'm not entirely convinced we need to print all valid abbrev values, even in ranged form, since typically, the abbrevs will go from 1-max in a contiguous set. That being said, I'm not opposed to it.

I think the code could be simplified anyway. Something like the following is more readable to me. It also handles adjacent codes being non-contiguous within the set of abbrevs:
```
std::string DWARFAbbreviationDeclarationSet::getCodeRange() const {
  // Create a sorted list of all abbrev codes.
  std::vector<uint32_t> Codes;
  Codes.reserve(Decls.size());
  std::transform(Decls.begin(), Decls.end(), std::back_inserter(Codes),
    [](const DWARFAbbreviationDeclaration &Decl){
      return Decl.getCode();
  });
  std::sort(Codes.begin(), Codes.end());

  std::string Buffer = "";
  raw_string_ostream Stream(Buffer);
  // Each iteration through this look represents a single contiguous range in the set of codes.
  for(auto Current = Codes.begin(), End = Codes.end(); Current != End;) {
    uint32_t RangeStart = *Current;
    // Add the current range start.
    Stream << *Current;
    uint32_t RangeEnd = RangeStart;
    // Find the end of the current range.
    while(++Current != End && *Current == RangeEnd + 1)
      ++RangeEnd;
    // If there is more than one value in the range, add the range end too.
    if (RangeStart != RangeEnd)
      Stream << "-" << RangeEnd;
    // If there is at least one more range, add a separator.
    if (Current != End)
      Stream << ", ";
  }
  return Buffer;
}
```

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp:37
+        createStringError(errc::invalid_argument,
+                          "DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "
+                          "tries to read DIEs at offset 0x%8.8" PRIx64,
----------------
It's not going to be clear to the end user that these two values represent offsets. I'd be more explicit: "DWARF unit from offset x to offset y ..."

Same applies below.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp:55
+                          "contains invalid abbreviation set "
+                          "offset 0x%" PRIx64 " at offset 0x%8.8" PRIx64,
+                          U.getOffset(), U.getNextUnitOffset(),
----------------
It's not clear to me (when reading the message without the code context) what these last two offsets represent. I suspect that the last offset is actually unnecessary, since the OffsetPtr location for this case is going to be fixed within the unit header, if I'm not mistaken.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:400
   DWARFDataExtractor DebugInfoData = getDebugInfoExtractor();
+  // It has been already checked by DWARFUnitHeader::extract.
+  assert(DebugInfoData.isValidOffset(NextCUOffset - 1));
----------------
I think this is a little clearer.

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug-entry-invalid.s:1
+# Test llvm-dwarfdump detects and reports invalid DWARF format of the file.
+
----------------
Nit: I'm trying to encourage new tests to use '##' for comments, to help distinguish them from lit and FileCheck directives.

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s:3
+# RUN:   | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=CHECK-CUEND %s
+# CHECK-CUEND: warning: DWARF compile unit extends beyond its bounds cu 0x00000000 at 0x0000001f
+
----------------
jankratochvil wrote:
> jhenderson wrote:
> > It would probably be a good idea if you used a non-zero value for the cu index. That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).
> > 
> > Same goes for the remaining messages.
> > 
> > Here and below, you can drop the "CHECK-" bit of the check prefixes, to make them more concise.
> > It would probably be a good idea if you used a non-zero value for the cu index.
> 
> Done.
> 
> > That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).
> 
> That would not show anything more as on 64-bit platforms variadic function extends all parameters to (at least) 64 bits. Therefore even 32-bit format will still read 64-bit variadic parameter.
> 
Right, but not all supported platforms are 64-bit. I've actually seen bugs precisely because of this sort of issue in similar code.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104271/new/

https://reviews.llvm.org/D104271