[PATCH] D104271: llvm-dwarfdump: Print warnings on invalid DWARF

Mon Jun 21 00:46:01 PDT 2021

jhenderson added inline comments.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp:81
+  raw_string_ostream Stream(Buffer);
+  // Each iteration through this look represents a single contiguous range in
+  // the set of codes.
----------------

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugAbbrev.cpp:103
+}
+
 DWARFDebugAbbrev::DWARFDebugAbbrev() { clear(); }
----------------
jankratochvil wrote:
> dblaikie wrote:
> > jhenderson wrote:
> > > jankratochvil wrote:
> > > > This function look to me as too much code for too little benefit but when @dblaikie has requested it then why not.
> > > > 
> > > I'm not entirely convinced we need to print all valid abbrev values, even in ranged form, since typically, the abbrevs will go from 1-max in a contiguous set. That being said, I'm not opposed to it.
> > > 
> > > I think the code could be simplified anyway. Something like the following is more readable to me. It also handles adjacent codes being non-contiguous within the set of abbrevs:
> > > ```
> > > std::string DWARFAbbreviationDeclarationSet::getCodeRange() const {
> > >   // Create a sorted list of all abbrev codes.
> > >   std::vector<uint32_t> Codes;
> > >   Codes.reserve(Decls.size());
> > >   std::transform(Decls.begin(), Decls.end(), std::back_inserter(Codes),
> > >     [](const DWARFAbbreviationDeclaration &Decl){
> > >       return Decl.getCode();
> > >   });
> > >   std::sort(Codes.begin(), Codes.end());
> > > 
> > >   std::string Buffer = "";
> > >   raw_string_ostream Stream(Buffer);
> > >   // Each iteration through this look represents a single contiguous range in the set of codes.
> > >   for(auto Current = Codes.begin(), End = Codes.end(); Current != End;) {
> > >     uint32_t RangeStart = *Current;
> > >     // Add the current range start.
> > >     Stream << *Current;
> > >     uint32_t RangeEnd = RangeStart;
> > >     // Find the end of the current range.
> > >     while(++Current != End && *Current == RangeEnd + 1)
> > >       ++RangeEnd;
> > >     // If there is more than one value in the range, add the range end too.
> > >     if (RangeStart != RangeEnd)
> > >       Stream << "-" << RangeEnd;
> > >     // If there is at least one more range, add a separator.
> > >     if (Current != End)
> > >       Stream << ", ";
> > >   }
> > >   return Buffer;
> > > }
> > > ```
> > Yep, all this complexity would fall under the clause I mentioned in the original feedback: " (I forget if the abbrev table is necessarily contiguous - if it isn't, then maybe that's too complicated)" - so the table isn't contiguous, and maybe this is too complicated to be worth it? I really don't mind either way, at this point.
> > 
> > @jhenderson's version seems nice, if we're going to do this.
> TBH I did not sort the Abbrevs intentionally. This debugging output is for DWARF developers, not for end users. By sorting it the message gets too disconnected from what is really written in the DWARF. Moreover when usually the Abbrevs are sorted. But I have accepted your version.
> The look ahead instead of a lambda flusher is an interesting idea.
> 
I don't have a strong opinion about whether they should be sorted or not, and if you would prefer dropping the sorting, that's fine (I think the rest of the code will just work, but am not 100% certain without spending more time than I care to thinking about it).

I'm also happy if you'd prefer to drop the entire thing.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp:53
+        errc::invalid_argument,
+        "DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "
+        "contains invalid abbreviation set offset 0x%" PRIx64,
----------------
In this case, you don't need the end offset of the unit, as it has no impact here - only the start offset is actually important, so you can identify the unit that is being read.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp:65
+                          "DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "
+                          "contains invalid abbreviation %u at "
+                          "offset 0x%8.8" PRIx64 ", valid abbreviations are %s",
----------------
`AbbrCode` is a `uint64_t`. The test will fail on some platforms due to either bitness or endianness issues.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp:93
+          "DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "
+          "contains invalid FORM_* 0x%x at offset 0x%8.8" PRIx64,
+          U.getOffset(), U.getNextUnitOffset(), AttrSpec.Form, *OffsetPtr));
----------------
`dwarf::Form` is specified to be a `uint16_t` in its declaration.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugInfoEntry.cpp:37
+        createStringError(errc::invalid_argument,
+                          "DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "
+                          "tries to read DIEs at offset 0x%8.8" PRIx64,
----------------
jankratochvil wrote:
> jhenderson wrote:
> > It's not going to be clear to the end user that these two values represent offsets. I'd be more explicit: "DWARF unit from offset x to offset y ..."
> > 
> > Same applies below.
> I haven't changed this yet. I disagree with "from offset x to offset y" as that would need to be rather "from offset x incl. to offset y excl." which already looks to me too talkative. Primarily as this message is for DWARF developers, not for end users.
> And then we should change an already existing error message: [[ https://github.com/llvm/llvm-project/blob/ffa252e8ce2465893f2c798b0586695f03a68d18/llvm/lib/Support/DataExtractor.cpp#L26 | "while reading [0x%x, 0x%x)" ]]
> 
With at least offsets, I would never read "from offset X to offset Y" as inclusive at both ends, because of the nature of what an offset represents. But maybe it is a real issue. You could achieve the same meaning, without the ambiguity risk by saying "with length X at offset Y" instead, for example. The length is actually the thing that's encoded in the DWARF after all. You could make it slightly less talkative like this: "DWARF unit (offset 0x1234, length 0x4321) tries to ...".

I don't think the two error messages are quite equivalent. In the DataExtractor one, the message talks about reading the range, and therefore it's somewhat clearer that you're dealing with an offset, whereas here the numbers in the range aren't things that are mentioned as being read or similar, so you lose that context.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:285
+                          "[0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "
+                          "has its relative Type offset 0x%8.8" PRIx64 " "
+                          "outside of its relative boundary "
----------------
As this is for DWARF developers, maybe it would be best to use the actual field name as defined by the DWARF standard (specifically "type_offset"), for something like: DWARF type unit (offset 0x1234, length 0x4321) type_offset 0x1111 points inside the header or past the unit end".

(I also included my suggestions from above, and a couple of other wording suggestions - I'm not a massive fan of specifying "relative boundary" because it's not clear to me what the boundary is relative to).

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:287
+                          "outside of its relative boundary "
+                          "[0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ")",
+                          Offset, NextCUOffset, TypeOffset, Size,
----------------
`Size` is a `uint8_t`.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:292
+  }
+  if (!debug_info.isValidOffset(getNextUnitOffset() - 1)) {
+    Context.getWarningHandler()(
----------------
I wonder if this error needs putting earlier, in case the header is truncated?

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:296
+                          "DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "
+                          "extends past section size 0x%8.8" PRIx64,
+                          Offset, NextCUOffset, debug_info.size()));
----------------
`debug_info.size()` returns a `size_t`, so may not always be 64 bits.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:300
+  }
+  if (!DWARFContext::isSupportedVersion(getVersion())) {
+    Context.getWarningHandler()(
----------------
I'd put this before the type_offset check, as a different DWARF version might not have the type offset field at all etc.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:304
+                          "DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "
+                          "has unsupported version %u, supported are 2-%u",
+                          Offset, NextCUOffset, getVersion(),
----------------
`getVersion()` returns a `uint16_t`.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:312
+        createStringError(errc::invalid_argument,
+                          "DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "
+                          "has unsupported address size %u, supported are %s",
----------------
As above, I don't think you need the end offset here, as it isn't relevant for this message.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp:313
+                          "DWARF unit [0x%8.8" PRIx64 ", 0x%8.8" PRIx64 ") "
+                          "has unsupported address size %u, supported are %s",
+                          Offset, NextCUOffset, getAddressByteSize(),
----------------
`getAddressByteSize()` returns a `uint8_t`.

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug-entry-invalid.s:25
+# RUN:   | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=BADTYPEUNIT %s
+# BADTYPEUNIT: warning: DWARF Type unit [0x0000002c, 0x00000045) has its relative Type offset 0x00000007 outside of its relative boundary [0x00000018, 0x00000019)
+
----------------
I'd test both the cases where the offset points to within the header and past the end of the unit.

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/debug-entry-invalid.s:93
+.ifdef SHORTINITLEN
+        .byte    0x55                   # too short Length of Unit
+.endif
----------------
Nit: all the other comments use upper-case for their first letter.

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s:3
+# RUN:   | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=CHECK-CUEND %s
+# CHECK-CUEND: warning: DWARF compile unit extends beyond its bounds cu 0x00000000 at 0x0000001f
+
----------------
jankratochvil wrote:
> jhenderson wrote:
> > jankratochvil wrote:
> > > jhenderson wrote:
> > > > It would probably be a good idea if you used a non-zero value for the cu index. That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).
> > > > 
> > > > Same goes for the remaining messages.
> > > > 
> > > > Here and below, you can drop the "CHECK-" bit of the check prefixes, to make them more concise.
> > > > It would probably be a good idea if you used a non-zero value for the cu index.
> > > 
> > > Done.
> > > 
> > > > That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).
> > > 
> > > That would not show anything more as on 64-bit platforms variadic function extends all parameters to (at least) 64 bits. Therefore even 32-bit format will still read 64-bit variadic parameter.
> > > 
> > Right, but not all supported platforms are 64-bit. I've actually seen bugs precisely because of this sort of issue in similar code.
> It is true 32-bit buildbots would catch it.
> > It would probably be a good idea if you used a non-zero value for the cu index.
> 
> Done.
> 
> > That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).
> 
> That would not show anything more as on 64-bit platforms variadic function extends all parameters to (at least) 64 bits. Therefore even 32-bit format will still read 64-bit variadic parameter.
> 

================
Comment at: llvm/test/tools/llvm-dwarfdump/X86/format-warnings.s:3
+# RUN:   | llvm-dwarfdump - 2>&1 | FileCheck --check-prefix=CHECK-CUEND %s
+# CHECK-CUEND: warning: DWARF compile unit extends beyond its bounds cu 0x00000000 at 0x0000001f
+
----------------
jhenderson wrote:
> jankratochvil wrote:
> > jhenderson wrote:
> > > jankratochvil wrote:
> > > > jhenderson wrote:
> > > > > It would probably be a good idea if you used a non-zero value for the cu index. That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).
> > > > > 
> > > > > Same goes for the remaining messages.
> > > > > 
> > > > > Here and below, you can drop the "CHECK-" bit of the check prefixes, to make them more concise.
> > > > > It would probably be a good idea if you used a non-zero value for the cu index.
> > > > 
> > > > Done.
> > > > 
> > > > > That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).
> > > > 
> > > > That would not show anything more as on 64-bit platforms variadic function extends all parameters to (at least) 64 bits. Therefore even 32-bit format will still read 64-bit variadic parameter.
> > > > 
> > > Right, but not all supported platforms are 64-bit. I've actually seen bugs precisely because of this sort of issue in similar code.
> > It is true 32-bit buildbots would catch it.
> > > It would probably be a good idea if you used a non-zero value for the cu index.
> > 
> > Done.
> > 
> > > That will help flag up any conversion issues (e.g. because you used a 32-bit print format for a 64-bit number).
> > 
> > That would not show anything more as on 64-bit platforms variadic function extends all parameters to (at least) 64 bits. Therefore even 32-bit format will still read 64-bit variadic parameter.
> > 
> 
> 
I've gone through and highlighted where else I see a type mismatch in your print formats versus the type being used. I think it would be a good idea to match the types, as whilst the implementation eventually calls a function that uses variadic arguments, I don't think there's any strict requirement for it to do so. Plus, the integer promotion is subtle, and unnecessary code subtlety harms maintainability. Finally, some of those might actually result in bugs.

>From my understanding, integer promotion of variadic arguments is only as far as `int`/`unsigned int`. The size of `int` is implementation defined and not necessarily the same as `uint32_t` (though admittedly I don't know of any cases where it isn't currently), nor anything necessarily to do with the host system bitness. Thus, when using `uint*_t` types, you should use their corresponding macros for printing.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104271/new/

https://reviews.llvm.org/D104271