[PATCH] D104271: llvm-dwarfdump: Print warnings on invalid DWARF

Sun Jun 20 14:09:27 PDT 2021

jankratochvil marked 9 inline comments as done.
jankratochvil added inline comments.

================
Comment at: llvm/include/llvm/DebugInfo/DWARF/DWARFContext.h:372-375
+  static const char *getAddressSizeSupported() {
+    static const char s[] = "2, 4, 8";
+    return s;
+  }
----------------
jhenderson wrote:
> I'd rename this function to `getSupportedAddressSizes()`, which reads slightly better.
> 
> Also, it can be simplified, as shown in line. String literals don't have lifetime issues, so there's no need for the static local variable.
> String literals don't have lifetime issues, so there's no need for the static local variable.

True, I am stupid.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp:103
+}
+
 DWARFDebugAbbrev::DWARFDebugAbbrev() { clear(); }
----------------
dblaikie wrote:
> jhenderson wrote:
> > jankratochvil wrote:
> > > This function look to me as too much code for too little benefit but when @dblaikie has requested it then why not.
> > > 
> > I'm not entirely convinced we need to print all valid abbrev values, even in ranged form, since typically, the abbrevs will go from 1-max in a contiguous set. That being said, I'm not opposed to it.
> > 
> > I think the code could be simplified anyway. Something like the following is more readable to me. It also handles adjacent codes being non-contiguous within the set of abbrevs:
> > ```
> > std::string DWARFAbbreviationDeclarationSet::getCodeRange() const {
> >   // Create a sorted list of all abbrev codes.
> >   std::vector<uint32_t> Codes;
> >   Codes.reserve(Decls.size());
> >   std::transform(Decls.begin(), Decls.end(), std::back_inserter(Codes),
> >     [](const DWARFAbbreviationDeclaration &Decl){
> >       return Decl.getCode();
> >   });
> >   std::sort(Codes.begin(), Codes.end());
> > 
> >   std::string Buffer = "";
> >   raw_string_ostream Stream(Buffer);
> >   // Each iteration through this look represents a single contiguous range in the set of codes.
> >   for(auto Current = Codes.begin(), End = Codes.end(); Current != End;) {
> >     uint32_t RangeStart = *Current;
> >     // Add the current range start.
> >     Stream << *Current;
> >     uint32_t RangeEnd = RangeStart;
> >     // Find the end of the current range.
> >     while(++Current != End && *Current == RangeEnd + 1)
> >       ++RangeEnd;
> >     // If there is more than one value in the range, add the range end too.
> >     if (RangeStart != RangeEnd)
> >       Stream << "-" << RangeEnd;
> >     // If there is at least one more range, add a separator.
> >     if (Current != End)
> >       Stream << ", ";
> >   }
> >   return Buffer;
> > }
> > ```
> Yep, all this complexity would fall under the clause I mentioned in the original feedback: " (I forget if the abbrev table is necessarily contiguous - if it isn't, then maybe that's too complicated)" - so the table isn't contiguous, and maybe this is too complicated to be worth it? I really don't mind either way, at this point.
> 
> @jhenderson's version seems nice, if we're going to do this.
TBH I did not sort the Abbrevs intentionally. This debugging output is for DWARF developers, not for end users. By sorting it the message gets too disconnected from what is really written in the DWARF. Moreover when usually the Abbrevs are sorted. But I have accepted your version.
The look ahead instead of a lambda flusher is an interesting idea.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp:37
+        createStringError(errc::invalid_argument,
+                          "DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "
+                          "tries to read DIEs at offset 0x%8.8" PRIx64,
----------------
jhenderson wrote:
> It's not going to be clear to the end user that these two values represent offsets. I'd be more explicit: "DWARF unit from offset x to offset y ..."
> 
> Same applies below.
I haven't changed this yet. I disagree with "from offset x to offset y" as that would need to be rather "from offset x incl. to offset y excl." which already looks to me too talkative. Primarily as this message is for DWARF developers, not for end users.
And then we should change an already existing error message: [[ https://github.com/llvm/llvm-project/blob/ffa252e8ce2465893f2c798b0586695f03a68d18/llvm/lib/Support/DataExtractor.cpp#L26 | "while reading [0x%x, 0x%x)" ]]

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp:55
+                          "contains invalid abbreviation set "
+                          "offset 0x%" PRIx64 " at offset 0x%8.8" PRIx64,
+                          U.getOffset(), U.getNextUnitOffset(),
----------------
jhenderson wrote:
> It's not clear to me (when reading the message without the code context) what these last two offsets represent. I suspect that the last offset is actually unnecessary, since the OffsetPtr location for this case is going to be fixed within the unit header, if I'm not mistaken.
> I suspect that the last offset is actually unnecessary, since the OffsetPtr location for this case is going to be fixed within the unit header, if I'm not mistaken.

I agree, thanks.

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s:3
+# RUN:   | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=CHECK-CUEND %s
+# CHECK-CUEND: warning: DWARF compile unit extends beyond its bounds cu 0x00000000 at 0x0000001f
+
----------------
jhenderson wrote:
> jankratochvil wrote:
> > jhenderson wrote:
> > > It would probably be a good idea if you used a non-zero value for the cu index. That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).
> > > 
> > > Same goes for the remaining messages.
> > > 
> > > Here and below, you can drop the "CHECK-" bit of the check prefixes, to make them more concise.
> > > It would probably be a good idea if you used a non-zero value for the cu index.
> > 
> > Done.
> > 
> > > That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).
> > 
> > That would not show anything more as on 64-bit platforms variadic function extends all parameters to (at least) 64 bits. Therefore even 32-bit format will still read 64-bit variadic parameter.
> > 
> Right, but not all supported platforms are 64-bit. I've actually seen bugs precisely because of this sort of issue in similar code.
It is true 32-bit buildbots would catch it.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104271/new/

https://reviews.llvm.org/D104271