[PATCH] D152947: [DebugInfo] Change DWARFDebugAbbrev initialization

Fri Jun 16 11:56:57 PDT 2023

bulbazord added a comment.

In D152947#4429019 <https://reviews.llvm.org/D152947#4429019>, @dblaikie wrote:

>>> I was trying to avoid parsing more than one /element/ of a DeclSet (assuming a DeclSet is all the abbreviations for one CU ~roughly) - to symbolize you basically 'need' to parse all the CU DIEs to collect all the ranges so you can do a quick search over ranges - which means if you have to parse whole DeclSets then the symbolizer won't be able to improve here (it's not critical, by any means - I think the symbolizer's still fast enough, but I was hoping to bring it up in case it resonated with you/anyone, if only to keep it in mind as a future possibility, even if not solving it right now).
>>
>> It definitely resonates with me, my colleagues and I do care a lot about the performance of the DWARF parser in LLVM and I think we'd like to be able to only do the necessary work in parsing. I don't think anything about the design I'm working towards will prevent us from being able to add "lazy" functionality, though we might have to think about what methods we expose to support that functionality.
>
> Yeah, I don't know that it'd prevent/make it especially harder, but maybe there's ways to steer more towards that lazy direction - it seems like this change is cementing a little more the idea of up-front parsing, where maybe it's worth considering going further away from that.

This change in particular enforces the idea that if you create a `DWARFDebugAbbrev` object, you should supply a `DataExtractor` instead of creating one and supplying one later via the `extract` method. You don't have to actually parse anything upon instantiation. The next step I was going to do here is to remove the call to `parse()` in the `begin()` method and add an assertion that enforces users call `parse()` before iterating over a `DWARFDebugAbbrev` object. If you want to iterate over one, presumably you wanted to actually parse the whole thing (at least that's what the current behavior is. We could expose a lazy version of the iterator that doesn't enforce this though). There is still `DWARFDebugAbbrev::getAbbreviationDeclarationSet` which will get you just one set without parsing the entire section (and will not have the same assertion so that we can be lazy). Further up you said you wanted to avoid parsing more than just one `Decl` in a `DeclSet`, and I would assume that is the function that you would want to make "lazy" -- Presumably you know what the `CUAbbrOffset` would be from the `CU` header (or the index in the DWP case). It would be a matter of adding a lazy variant of this method which only parses the AbbreviationDeclarations that you ask it to (but still does error checking as needed). If it makes sense to be completely lazy, we can remove the support for being eager entirely. I just want us to be able to go fast and to handle errors instead of dropping malformed DWARF on the floor.

This is roughly how I imagined it working (which is why I feel our goals are not opposed). What do you think?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152947/new/

https://reviews.llvm.org/D152947