[Lldb-commits] [PATCH] D68655: Trust the arange accelerator tables in dSYMs

Fri Jan 13 14:15:26 PST 2023

clayborg added a comment.

In D68655#4045873 <https://reviews.llvm.org/D68655#4045873>, @jasonmolenda wrote:

> I know this is all moot because the dSYM-specific patch landed, but I am curious about this part,
>
> In D68655#4045561 <https://reviews.llvm.org/D68655#4045561>, @clayborg wrote:
>
>> Different things are included in DW_AT_ranges, like address ranges for global and static variables. .debug_aranges only has functions, no globals or statics, so if you are trying to find a global variable by address, you can't rely on .debug_aranges. Nothing in the DWARF spec states things clearly enough for different compilers to know what to include in .debug_aranges and in the compile unit DW_AT_ranges.
>
> The standard says,
>
> "Each descriptor is a triple consisting of a segment selector, the beginning address within that segment of a range of text or data covered by some entry owned by the corresponding compilation unit, followed by the non-zero length of that range"
>
> It is pretty clear on the point that any part of the address space that can be attributed to a compile_unit can be included in the debug_aranges range list - if only code is included, that's a choice of the aranges writer.  lldb itself, if debug_aranges is missing or doesn't include a CU, steps through the line table concatenating addresses for all the line entries in the CU (v. DWARFCompileUnit::BuildAddressRangeTable ) - it doesn't include data.  (it will also use DW_AT_ranges from the compile_unit but I worry more about this being copied verbatim from the .o file into the linked DWARF than I worry about debug_aranges, personally)

Sorry, looks like the spec is pretty clear. From what I remember David Blaikie saying, clang won't put globals or statics (no data, just code) into .debug_aranges. GCC might do this differently. I thought that LLDB would try to use the following in order:

- .debug_aranges for a CU if available
- fall back to DW_AT_ranges
- fall back to line tables only if DW_AT_ranges is not there
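A minimal sketch of that fallback order, in Python purely for illustration. `CompileUnit`, its fields, and `address_ranges` are hypothetical stand-ins, not LLDB classes or functions, and the order shown is the one I described above, which may not match what LLDB actually does:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # (start address, size in bytes)


@dataclass
class CompileUnit:
    """Hypothetical stand-in for a parsed CU; not an LLDB type."""
    aranges: Optional[List[Range]] = None      # ranges from .debug_aranges
    at_ranges: Optional[List[Range]] = None    # ranges from DW_AT_ranges on the CU DIE
    line_entries: List[Range] = field(default_factory=list)  # one range per line table row


def coalesce(ranges: List[Range]) -> List[Range]:
    """Merge adjacent or overlapping (start, size) ranges into larger ones."""
    merged: List[Range] = []
    for start, size in sorted(ranges):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            prev_start, prev_size = merged[-1]
            end = max(prev_start + prev_size, start + size)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, size))
    return merged


def address_ranges(cu: CompileUnit) -> List[Range]:
    """Pick the first available source, in the order listed above."""
    if cu.aranges:
        return coalesce(cu.aranges)
    if cu.at_ranges:
        return coalesce(cu.at_ranges)
    # Last resort: concatenate the line table entries (code only, no data).
    return coalesce(cu.line_entries)
```

Note that the line table fallback can only ever produce text ranges, which is one reason the three sources can disagree about globals and statics.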

> In a DW_TAG_compile_unit, the DW_AT_ranges that the compiler puts in the .o file isn't relevant to the final linked debug information, unless it included the discrete range of every item which might be linked into the final binary, and the DWARF linker only copies the range entries that were in the final binary in the linked dwarf DW_AT_ranges range list.   Or if the dwarf linker knows how to scan the DWARF compile_unit like lldb does, concatenating line table entries or DW_TAG_subprogram's.

The Darwin workflow has a smart DWARF linker, so yes, it will regenerate only what is needed for the final output in dsymutil. The compilers have been trained not to emit .debug_aranges for Darwin triples, because we know we don't need them: dsymutil will regenerate only what is needed from the final output.

All other platforms use standard linker technology: the linker concatenates all DWARF sections to create the final DWARF sections and then applies relocations. So if you have a DW_AT_ranges in a CU that has 100 address ranges in a .o file, and the final output only uses 10 functions and dead strips the other 90, the final .debug_aranges for that CU will still contain ranges for all 100 functions. Ten of them will have had relocations applied and will have valid address ranges; the other 90 will start at the sentinel address of zero or -1 to indicate they should have been dead stripped. That can be tons of wasted data in each CU's .debug_aranges data.
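To make the sentinel case concrete, here is a small sketch of how a consumer could discard the dead-stripped entries. The names are hypothetical, and it assumes a 64-bit target where the linker writes 0 or the all-ones address (-1) as the sentinel. Treating 0 as a sentinel is only safe for linked images where no real code lives at address zero:

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start address, size in bytes)

# Assumed sentinel start addresses for a 64-bit target: 0 and -1 (all ones).
SENTINELS_64 = {0, 0xFFFFFFFFFFFFFFFF}


def drop_dead_stripped(ranges: List[Range]) -> List[Range]:
    """Keep only the ranges whose start address is not a sentinel."""
    return [(start, size) for start, size in ranges if start not in SENTINELS_64]
```

In the 100-function example above, this would keep the 10 relocated ranges and drop the other 90, but the 90 dead entries still take up space in the file.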

> If any dwarf linker is doing all of this work to construct an accurate & comprehensive compile_unit DW_AT_ranges, why couldn't it be trusted to also have an identical/accurate dwarf_aranges for this cu?  I don't see the difference.

Only dsymutil does this, as it is the only smart DWARF linker at the moment! See the comment above as to why. We do have llvm-dwarfutil now for ELF files, which can optimize DWARF after it has been linked to remove all of this extra junk and unique types. llvm-dwarfutil uses the same DWARF linking engine as dsymutil. But all other systems concatenate all sections, then apply relocations, and anything that doesn't have a relocation gets a sentinel address applied as the relocation.

In D68655#4045895 <https://reviews.llvm.org/D68655#4045895>, @jasonmolenda wrote:

> I guess to say it shorter.  If I have a dwarf_aranges, that means the dwarf linker created it.  And if it created it, surely it's at least based off of the subprogram address ranges or the line table -- that is, the text address ranges.

True for Darwin and dsymutil, yes.

> If I have a DW_TAG_compile_unit DW_AT_ranges, either the compiler (to the .o file) created it, in which case I really am suspicious of those ranges because the compiler can't know which symbols will end up in the final executable, and the addresses in the ranges were simply translated to the final executable address equivalents.  Or it was rewritten by a dwarf linker that parsed the DWARF and knew how to correctly calculate the addresses that correspond to that compile unit.

This still works in all cases as relocations will be applied to each range. If we are doing DWARF in .o files, then we know to check each range to see if it ended up in the debug map or not and we throw out the ones that weren't.
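Roughly, the DWARF-in-.o check looks like the sketch below. This is a simplification with hypothetical names: the debug map is modeled as a plain dict from a function's address in the .o file to its address in the linked executable, which is not how LLDB represents it.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (start address, size in bytes)


def remap_ranges(oso_ranges: List[Range], debug_map: Dict[int, int]) -> List[Range]:
    """Translate .o file ranges to executable addresses, dropping unmapped ones."""
    linked: List[Range] = []
    for start, size in oso_ranges:
        exe_addr = debug_map.get(start)
        if exe_addr is None:
            # Not in the debug map: the function did not make it into the
            # final executable, so throw the range out.
            continue
        linked.append((exe_addr, size))
    return linked
```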

> if anything, I would trust a dwarf_aranges entry for a CU before I would trust the CU's DW_AT_ranges list.  Both have to be written by the dwarf linker to be correct, but only the former is written ONLY by the dwarf linker.

It would be great if the compilers actually followed the DWARF spec, but alas they don't do this well for .debug_aranges. I know clang and GCC do things differently for these sections.

Basically the contents of the .debug_aranges and the DW_AT_ranges can and should be the same, so storing both is kind of a waste of space. It would be nice if the .debug_aranges could be modified to say "use this offset in the .debug_ranges section" so it would re-use the data and allow for quicker access when doing address lookups. I don't like having to parse ANY DWARF DIEs, like the CU DIE, just to get to the DW_AT_ranges, so it would be nice to be able to access this from .debug_aranges but be able to share the data between the two references. I can't remember how a DW_TAG_compile_unit defines what should be included in the DW_AT_ranges... I am not sure if the DWARF spec says that data should be included in these ranges or not.
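For context, the kind of address lookup that .debug_aranges is meant to speed up can be sketched as a sorted table searched by binary search, without touching any DIEs. `AddressToCU` and `find_cu` are hypothetical names, and the sketch assumes the ranges do not overlap:

```python
import bisect
from typing import List, Optional, Tuple

# (start address, size in bytes, offset of the owning CU in .debug_info)
Entry = Tuple[int, int, int]


class AddressToCU:
    """Hypothetical address-to-CU table built from non-overlapping ranges."""

    def __init__(self, entries: List[Entry]) -> None:
        self._entries = sorted(entries)
        self._starts = [entry[0] for entry in self._entries]

    def find_cu(self, addr: int) -> Optional[int]:
        """Return the CU offset whose range contains addr, or None."""
        # Find the last range starting at or before addr.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        start, size, cu_offset = self._entries[i]
        return cu_offset if addr < start + size else None
```

Whether the table contains data addresses as well as code depends entirely on what the producer put in the ranges, which is the open question above.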

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68655/new/

https://reviews.llvm.org/D68655