[Lldb-commits] [PATCH] D78489: [lldb/DWARF] Trust CU DW_AT_low/high_pc information when building address tables

Mon Apr 20 22:46:27 PDT 2020

clayborg added a comment.

So it looks like there are bugs in the "llvm-dwarfdump --verify". The blurb I posted above clearly has the function's address range contained in it, sorry for the false alarm. I have been quite a few problems with DWARF with LTO and other post production linkers. The llvm-dwarfdump might be assuming that the address ranges in DW_AT_ranges are sorted. I will work on a fix for this if my llvm-dwarfdump was from top of tree.

I am worried about performance with this patch. Prior to this we would:
1 - check for DW_AT_ranges, and use that if present
2 - go through all DWARF and look for DW_TAG_subprogram DIEs with valid DW_AT_low_pc/DW_AT_high_pc attributes
3 - _only_ parse the line tables as a last resort

This seems like the right way to go to get max performance right?

With this patch we:
1 - check for DW_AT_ranges, and use that if present
2 - _always_ parse the line tables and try to glean address range information from this

This seems like it will be slower, though I have not benchmarked this. Should be easy to test with a large binary and just comment out the code that checks for DW_AT_ranges.

One thing to think about that my flawed example above did show: dead stripped functions cause problems for debug info. Both in DW_AT_ranges on the compile unit and in all of the DWARF below this. We might want to check any and all address ranges to ensure that they are in sections that read + execute permissions and if they don't throw them out. It is easy to come up with a minimal address range from the main object file prior to parsing any address information and use that to weed out the bad entries. If functions have had their address set to zero, then we will usually correctly weed these out as zero is often part of an image base load address, but not in a section (not program header or top level section in LLDB's ObjectFileELF, but in a section header section) with r+x permissions. It would be nice to be able to weed these addresses out so they don't cause problems possibly as a follow up patch.

So I will be ok with this patch if we can verify that line table parsing is faster than checking DIEs for DW_AT_high/low pc attributes. If there is a regression, we should keep the code as is.

Sorry for the false alarms. That'll teach me to check patches at the end of a long day...

================
Comment at: lldb/source/Plugins/SymbolFile/DWARF/DWARFCompileUnit.cpp:58-67
-  // We don't have a DW_AT_ranges attribute, so we need to parse the DWARF
-
-  // If the DIEs weren't parsed, then we don't want all dies for all compile
-  // units to stay loaded when they weren't needed. So we can end up parsing
-  // the DWARF and then throwing them all away to keep memory usage down.
-  ScopedExtractDIEs clear_dies(ExtractDIEsScoped());
-
----------------
So this is clearly much more efficient than parsing all the line tables for all functions right? Why would we stop doing this? With the old code, if we didn't have DW_AT_ranges, and we did have DW_TAG_subprogram DIEs with valid address ranges, then we would not go through all of line tables. 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78489/new/

https://reviews.llvm.org/D78489