[PATCH] D78950: Adds LRU caching of compile units in DWARFContext.

Wed Apr 29 11:49:16 PDT 2020

netforce added a comment.

In D78950#2009668 <https://reviews.llvm.org/D78950#2009668>, @labath wrote:

> (adding more debug info folks)
>
> In D78950#2008973 <https://reviews.llvm.org/D78950#2008973>, @netforce wrote:
>
> > Do you think this is better in terms of keeping the API flexible enough?
>
>
> Umm... probably?
>
> I don't have a comprehensive overview of the users of the DWARF parser either, but I'm sure there are users that want to access it through the lower level APIs. Lldb will most likely be one of those users -- even after it starts using llvm parsers completely I think it's fairly unlikely it would use APIs like `getLocalsForAddress`, but rather do something custom. However, all of that is vapourware, so its hard to reason about that.
>
> It may be more interesting to look at other (real) users. One such user which comes to mind is (llvm-)dsymutil, as it accesses dwarf in a fairly complex way. Jonas, what do you think?

I took a look at llvm-symbolizer code considering how to fit the caching there, and I also took a brief look at dsymutil to see how it uses the DWARF library.

llvm-symbolizer (more specifically code in lib/DebugInfo/Symbolize) mostly access the library through DIContext interface, and it can be either DWARFContext or PDBContext <https://github.com/llvm/llvm-project/blob/4632b7292a8a841897f0e887ab5f87b18b7822e4/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp#L548-L568>, while the caching is quite DWARFContext-specific.

dsymutil creates DWARFContext and give it to DWARFLinker library <https://github.com/llvm/llvm-project/blob/5a1d9c0f5ac85b486a539b1d9558205821f17b33/llvm/tools/dsymutil/DwarfLinkerForBinary.cpp#L273-L278>, then it seems it uses lower level API (DWARFUnit) to access the information. Here's an example <https://github.com/llvm/llvm-project/blob/5a1d9c0f5ac85b486a539b1d9558205821f17b33/llvm/lib/DWARFLinker/DWARFLinker.cpp#L2010-L2055>.

So, both high level API (DIContext) and low level API (DWARFUnit) are being used depending on the the users, and it makes it difficult to decide where the best place to put the caching in is. Here are a few options I could think of.

1. Place caching in DWARFContext (as the revision currently is) but turn it off by default.

In this way, llvm-symbolizer could benefit from the revision without making its client code too complicated. At the same time, it'll be noop for other users that access the library with lower level API. If they need caching later, they can choose between implementing it using the lower level API or using the one in DWARFContext.

2. Move caching to llvm-symbolizer

This minimizes the disruption in the API (both DWARFContext and DWARFUnit). However, I expect that it needs pretty messy changes in llvm-symbolizer because it uses DIContext interface: we need to make it distinguish the implementation of DIContext and apply caching only when it's DWARFContext.

3. Just expose the necessary member functions without implementing caching anywhere in LLVM source tree

This also needs minimum disruption in the API (as is in 2), and it's good enough for our own use case (our own symbolization tool that access the library via both DWARFContext and DWARFUnit). However, none of LLVM tool will benefit from caching until they implement it.

I'm good with all three options, but I'm personally leaning toward option 1. Which one should we follow for the sake of the API? Or is there any other way? Please let me know. Thank you!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78950/new/

https://reviews.llvm.org/D78950