[PATCH] D136395: Add the ability to verify the .debug_aranges section.

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 24 15:54:07 PDT 2022


dblaikie added a comment.

In D136395#3880527 <https://reviews.llvm.org/D136395#3880527>, @ayermolo wrote:

> In D136395#3880459 <https://reviews.llvm.org/D136395#3880459>, @dblaikie wrote:
>
>> In D136395#3880008 <https://reviews.llvm.org/D136395#3880008>, @ayermolo wrote:
>>
>>> In D136395#3875846 <https://reviews.llvm.org/D136395#3875846>, @dblaikie wrote:
>>>
>>>> In D136395#3875783 <https://reviews.llvm.org/D136395#3875783>, @ayermolo wrote:
>>>>
>>>>> In D136395#3873072 <https://reviews.llvm.org/D136395#3873072>, @dblaikie wrote:
>>>>>
>>>>>> If possible, I'd rather not add this - I think .debug_aranges should be removed (it's already been off-by-default for a decade in Clang) in favor of using CU-level address ranges. They're cheap-enough to parse that it doesn't substantially change the performance of tools so far as I'm aware and they save space by not duplicating the address range information in two places.
>>>>>>
>>>>>> Adding a verifier feels like endorsing/encouraging/maintaining `.debug_aranges` which seems like the wrong direction we should be going.
>>>>>
>>>>> Can you elaborate on why they should be removed. Is it because of the aforementioned duplication of information, or are there other reasons also?
>>>>
>>>> Yeah, basically only that - they're redundant, and maintaining different paths is a burden (verifying them, fixing bugs in them, having tools that either consume one or the other or both, etc) - having a single way to represent things would be better for the DWARF ecosystem of consumers and producers.
>>>>
>>>> (also aranges haven't been updated to benefit from the more compact/fewer-relocation-using encoding of .debug_rnglists introduced in DWARFv5 - so it's bigger/less efficient now as well)
>>>
>>> I see. That makes sense, but unfortunately that option still exists and is being used. We hit an issue where there is inconsistency that I am looking into right now on llvm side.
>>
>> Could you provide more detail about this inconsistency?
>>
>> (one issue I can say is that if you're comparing GCC and Clang `.debug_aranges` you'll see differences - Clang includes global variable addresses in the table and GCC does not)
>>
>>> Thus this patch on verify side.
>>
>> Has it discovered anything interesting so far during development?
>
> Basically what's in this patch.
> error: .debug_aranges[0x0074c110][0] range [0x00000000036d0b0c - 0x00000000036d0b0d) not in compile unit @ 0x1447119f ranges.
> error: .debug_aranges[0x00025d20] compile unit @ 0x002e2f30 line table sequence [5411-5417) has row[5416] with address 0x0000000000d58265 that is not in .debug_aranges nor in compile unit ranges.

Oh, sorry, I meant an example I can build with clang/llvm to reproduce to see what might be going on?

I'd like to see at least one/some examples before we add this verifier check, to get a sense of what sort of things you're seeing/what the most suitable tooling would be to deal with/investigate them.

> So we are seeing this issue with clang built binaries where we have some ranges missing in .debug_aranges, and also some ranges that are missing in the DW_AT_ranges of the DW_TAG_compile_unit.

(similarly, I'd like to see examples so we have something more concrete to consider what this tooling is for/about)



================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp:971-978
+static bool DWARFAddressIsInvalid(uint64_t Addr, unsigned AddrSize) {
+  // Catch any addresses that have been zeroed out, or if they have been set to
+  // -1 or -2 (max PC)
+  if (AddrSize == 4)
+    return Addr == 0 || Addr == UINT32_MAX || Addr == UINT32_MAX-1;
+  else
+    return Addr == 0 || Addr == UINT64_MAX || Addr == UINT64_MAX-1;
----------------
I think we already have a function for this in libDebugInfoDWARF - it's used by the symbolizer to drop tombstoned addresses earlier on/at a lower level probably? (& probably doesn't handle the zero case, since that can be ambiguous)

-2 should only be used in `.debug_loc` (& theoretically in `.debug_range` but I don't think binutils ld uses it there, strangely) - so we probably shouldn't include it here?


================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp:991
+      continue;
+    if (DWARFRange.LowPC < DWARFRange.HighPC)
+      Ranges.insert(AddressRange(DWARFRange.LowPC, DWARFRange.HighPC));
----------------
This is to account for overflow?


================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp:1043-1046
+      // Make sure all .debug_aranges descriptor ranges are found in the
+      // compile unit's ranges.
+      uint32_t DescIdx = 0;
+      for (const auto &Descriptor: ArangeSet.descriptors()) {
----------------
Pretty sure this won't be true for clang/llvm's debug info - LLVM includes global variable addresses in aranges, but not in CU ranges. We've had various discussions about whether this is correct/useful, and so far people seem to think it is correct and maybe useful? I don't really know. GCC doesn't put global variable addresses in either aranges or cu ranges.

(I'm pretty sure it's technically correct, but maybe not all that useful - at least if GCC doesn't do it (pretty sure it's generally correct that CU ranges shouldn't include global variable addresses, at least - so just a question of what should/shouldn't be in aranges))


================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp:1078-1088
+      // Now make sure that all CU ranges are in the .debug_aranges set.
+      for (const auto &Range: SortedCURanges) {
+        if (!SortedAranges.contains(Range)) {
+          errorHeader(error());
+          OS << " compile unit range ["
+              << format_hex(Range.start(), AddrPrintSize)
+              << " - " << format_hex(Range.end(), AddrPrintSize)
----------------
Rather than scanning in both directions (does everything in aranges appear in CU ranges, then does everything in CU ranges appear in aranges) - maybe sort both and do a single scan through & just flag the first place they're inconsistent? (or, if accounting for the "aranges has extra stuff for global variables", skip over those entries if you can until you pick up in the CU ranges - if you get to the end without having visited all the CU ranges then there's something wrong at least)


================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFVerifier.cpp:1093-1130
+      const DWARFDebugLine::LineTable *LT = DCtx.getLineTableForUnit(CU);
+      if (LT) {
+        for (const auto &Sequence: LT->Sequences) {
+          // If the first address in a sequence is invalid, don't verify it as
+          // it should have been dead stripped.
+          const DWARFDebugLine::Row &Row = LT->Rows[Sequence.FirstRowIndex];
+          if (DWARFAddressIsInvalid(Row.Address.Address, SetHdr.AddrSize))
----------------
Do we already have something like this for CU ranges? Could we reuse it?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136395/new/

https://reviews.llvm.org/D136395



More information about the llvm-commits mailing list