[lldb-dev] New DWARF parser proposal
Richard Mitton
richard at codersnotes.com
Fri Aug 30 13:27:18 PDT 2013
DWARF parsing is currently very slow. On my machine, loading the 'clang'
binary into lldb takes 14 seconds (vs. gdb's 29 seconds). The actual I/O
cost of reading that much data is only around 2 seconds.
The DWARF parser has to read the entire .debug_info section into memory
and parse it. It would be great if we didn't have to do this. OS X has
the .apple_names section et al., which give lldb an index automatically
without having to parse anything.
However, this is an Apple extension and does not exist on other
platforms. There are a bunch of accelerator tables that the DWARF spec
allows for, but they're all unusable. pubnames is either absent or
incomplete, aranges might not be present, and if they were generated by
the compiler rather than the linker, then they may not cover all the
object files in the binary (causing lldb to miss symbols). So we end up
in the situation we have now, where we cannot use or trust the
accelerators, and have to parse everything anyway to build our own index.
I believe lldb does this the wrong way. For even a simple "b main"
test, it touches the entire debug_info section *4 times* (at least),
which in the clang binary example is 600MB of data each time:
- pass 1 (extract DIEs)
- pass 2 (index DIEs)
- pass 3 (extract DIEs)
- pass 4 (build arange tables)
I believe the key problems of the current design are:
1) lldb tries to build its own copy of the DIE array, rather than just
referring to the existing data in-place. This adds significant
management overhead to every function accessing this data.
2) lldb goes to great lengths to avoid reading the entire debug
information up front (even though it will ultimately need to anyway) and
to avoid keeping it in memory. This in fact causes it to *reload* the
data several times, as each further operation performs lazy
initialization and triggers a re-parse.
If we just accepted that we are forced to load all the data once, it
would actually be faster. My suggestion therefore is to write an
optimized single-pass DWARF indexer to replace the current DWARF loader,
with the following properties:
- Always read the debug info exactly once upon module load (unless we
can guarantee the Apple extensions are present).
- Use the entire debug_info section in-place, without trying to build a
copy. Not having separate stages for extraction and indexing will allow
efficient data traversal.
- Make use of the abbreviation tables to pre-build decoder structures
for each DWARF tag type. In most cases we know the size of each DIE the
moment we read its abbreviation code, and can skip it in one operation
if needed without having to parse its attributes. Because we run in one
pass, we never even have to look at DIEs we don't need. (A rough sketch
of this follows the list.)
- Track the parent scope as we go, on a stack, so we don't have to keep
doing lookups that walk the DIE tree. The current parser walks up the
tree to find what scope it's in, even though it already parsed the
parent scope container. (Also sketched below.)
- Build arange tables automatically as we go, ignoring any that might
already be present. Since we have already touched and extracted the
range data anyway, it would be trivial to build an accelerator table
essentially for free.
- For strings, we should pre-pool the DWARF string table once up front,
to avoid repeatedly pooling strings for each DIE. (Sketched below too.)
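
To make the abbreviation point concrete, here's a rough sketch of the
kind of decoder I have in mind. Names like AbbrevDecoder and
SkipFormValue are made up for illustration, not existing lldb code:

    // Built once per abbreviation declaration when the CU's abbrev table
    // is read. If every attribute form has a fixed encoded size for this
    // CU (DW_FORM_data4, DW_FORM_addr, DW_FORM_ref4, ...), we store the
    // total up front; otherwise fixed_size stays -1.
    struct AbbrevDecoder {
      uint16_t tag;            // DW_TAG_* of DIEs using this abbreviation
      bool     has_children;
      int32_t  fixed_size;     // total attribute bytes, or -1 if variable
      std::vector<std::pair<uint16_t, uint16_t> > attrs; // (DW_AT, DW_FORM)
    };

    // Skip one DIE in-place. 'offset' points just past its ULEB128 abbrev
    // code. In the common case this is a single add, with no attribute
    // parsing at all.
    uint64_t SkipDIE(const AbbrevDecoder &d, const uint8_t *debug_info,
                     uint64_t offset) {
      if (d.fixed_size >= 0)
        return offset + d.fixed_size;
      for (size_t i = 0; i < d.attrs.size(); ++i)
        offset = SkipFormValue(debug_info, offset, d.attrs[i].second);
      return offset;
    }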
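
The single-pass walk itself would then look roughly like this (again
purely illustrative; ReadULEB128, IndexDIE and 'decoders' stand in for
whatever we actually end up using). Because children immediately follow
their parent in debug_info, a plain stack always tells us the enclosing
scope, and the same loop can record address ranges for the arange table:

    // One forward pass over a compile unit's DIE data, [offset, end).
    void IndexCU(const uint8_t *debug_info, uint64_t offset, uint64_t end) {
      std::vector<uint64_t> scope_stack;   // offsets of enclosing DIEs
      while (offset < end) {
        const uint64_t die_offset = offset;
        const uint64_t abbrev_code = ReadULEB128(debug_info, &offset);
        if (abbrev_code == 0) {            // end-of-children marker
          if (!scope_stack.empty())
            scope_stack.pop_back();
          continue;
        }
        const AbbrevDecoder &d = decoders[abbrev_code];
        const uint64_t parent = scope_stack.empty() ? 0 : scope_stack.back();
        // Index by name, record low_pc/high_pc for the arange table, etc.,
        // pulling only the attributes this tag needs.
        IndexDIE(die_offset, d, parent);
        offset = SkipDIE(d, debug_info, offset);
        if (d.has_children)
          scope_stack.push_back(die_offset);
      }
    }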
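
And the string pre-pooling is just a single walk over .debug_str.
ConstString is lldb's existing interned-string class; the offset map
here is only a sketch of the idea:

    // Intern every .debug_str entry once, keyed by its section offset, so
    // that a DW_FORM_strp attribute later becomes one map lookup instead
    // of a strlen + pool insertion per DIE.
    std::map<uint64_t, ConstString> PoolDebugStr(const char *str_data,
                                                 uint64_t str_size) {
      std::map<uint64_t, ConstString> pool;
      uint64_t offset = 0;
      while (offset < str_size) {
        const char *s = str_data + offset;
        pool.insert(std::make_pair(offset, ConstString(s)));
        offset += strlen(s) + 1;           // step past the terminating NUL
      }
      return pool;
    }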
With this approach we use the DIE data as-is in memory, without having
to make our own copy.
Parent chains should ideally only be used during parsing. If
parents/siblings are really needed after the initial parse, one easy
solution would be to just store that in a separate hash table.
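
Something as simple as this would do, filled in from the same scope
stack during the single pass (again just a sketch):

    // DIE offset -> parent DIE offset, filled in only for the DIEs we end
    // up indexing (or for all of them, if the memory cost turns out to be
    // acceptable). Afterwards, finding a parent is one hash lookup rather
    // than a walk back through the DIE data.
    llvm::DenseMap<uint64_t, uint64_t> m_die_to_parent;

    // Inside the indexing loop above:
    //   m_die_to_parent[die_offset] = parent;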
I welcome discussion on this. I think it's important for lldb not to
have any delays when loading programs, and since we cannot control what
the compilers will supply to us, we have to address this on our end.
--
Richard Mitton
richard at codersnotes.com