[Lldb-commits] [lldb] [lldb] Fix a crash when using .dwp files and make type lookup reliable with the index cache (PR #79544)

Tue Jan 30 18:06:46 PST 2024

clayborg wrote:

> > Even if we want to have the skeleton compile unit be parsed first, we would need to know this from any accelerator table entry, wether it be from .debug_names or from our internal manual index.
> 
> True enough, but I think letting this become a lazy property post-construction is a bit more problematic as it might lead to more complicated invariants/situations further downstream. By making it a precondition that any split unit, on construction, has the association with the skeleton unit - there would be fewer "interesting" situations later where the lookup occurs when someone isn't expecting it. The split unit isn't useful without the skeleton for addresses, etc, so it's not like delaying the skeleton lookup provides better performance, etc.

It actually does provide better performance as we will often do the type lookup solely in the .dwp file and determine we don't need to actually parse and return the type because the context doesn't match. There is no need for the skeleton unit and the address tables and lines tables in the type lookup case. It allows us to traverse the DIE tree in the .dwp file without ever needing to possibly do an expensive lookup by DWO ID for DWARF4, when it actually isn't needed. We have an accessor to allow us to get the skeleton unit if and only if we need it, and yes, most times it will be filled in already. But when it isn't we can easily access it.

> > Our internal manual index creates a DIERef object with a magic .dwp file index + a DIE offset within the .debug_info.dwo in the dwp file.
> 
> (side note: it'd be great if the internal manual index was closer to .debug_names (especially the serialized version - it'd be good if that /was/ .debug_names) - to ensure consistency (& at least last I checked the cached manual index - the debugger startup time was impacted by loading that index off-disk, when the .debug_names/.apple_names indexes don't need to be loaded from disk) & fewer support paths/less code to maintain)

It would be really nice to use .debug_names and I welcome a patch that implements this. I don't have time for this at the moment, but it would be great if anyone else has some time to make this happen.

> Presumably the entry would still need to mention which CU in the .dwp file it comes from - and presumably the non-split entries in this index would also mention which CU they come from (by offset in the .debug_info section)? (because reading a DIE without knowing wihch CU it's in isn't helpful - you need CU properties, etc)
> 
> (but I'm not an lldb maintainer, so can take all this with a few grains of salt)

It is a great idea, and it was my original thought when I was saving the index to disk. The APIs for generating the .debug_names were hard to adapt over into the LLDB side of things, and I didn't want to have the LLVM DWARF parser have to parse the DWARF in order to get that to work, so I opted to not implement it that way to make sure we didn't double parse the DWARF. At some point if we look at the performance of the LLVM DWARF parser (.debug_info parser) and it performs as well or better than LLDB's, we can switch over to it. I am also worried about asserts and the coding style differences where invariants cause crashes in the LLVM parser vs LLDB's parser being more accepting of bad input. 

We do use the LLVM .debug_line parser already and ran into such a crasher last week when a BOLT generated binary had an invalid DW_AT_stmt_list offset and it is crashing LLDB because of this kind of an assert followed by a crash (when  asserts aren't enabled it crashes).

https://github.com/llvm/llvm-project/pull/79544