[Lldb-commits] [PATCH] D73206: Pass `CompileUnit *` along `DWARFDIE` for DWZ

Wed Feb 12 04:23:26 PST 2020

labath added a comment.

In D73206#1866280 <https://reviews.llvm.org/D73206#1866280>, @jankratochvil wrote:

> In D73206#1859944 <https://reviews.llvm.org/D73206#1859944>, @labath wrote:
>
> > From a high-level, this approach looks like it could be viable. However, you have taken this a lot further than I would have expected. What I was expecting to see would be that a bunch of the DWARF internal functions would get an extra argument (be it a CompileUnit, DWARFUnit, or whatever), but I did not expect that this would also appear on the "generic" interfaces. My idea was that the external interface would keep getting just a single user_id_t, but then during the process of decoding that integer into a DWARFDie, we would also produce the additional CompileUnit object.
>
>
> In such case there need to be two kinds of `user_id_t`. One internal to DWARF/ not containing MainCU and one for "generic" interfaces which does contain MainCU. Because internally if `user_id_t` contains MainCU then already also `DWARFDIE` and `DIERef` must contain MainCU - see for example `ElaboratingDIEIterator` calling `DWARFDIE::GetID` where is no chance to know the MainCU (without heavy refactorization to pass it down the call chain) - and the code does not really need to know MainCU, it is just using `user_id_t` as a unique DIE identifier.

Well.. I think we should avoid using user_id_t in places where we don't have enough information to construct one. I have removed the usage in ElaboratingDIEIterator with 034c2c6771d318f962bbf001f00ee11e542e1180 <https://reviews.llvm.org/rG034c2c6771d318f962bbf001f00ee11e542e1180>.

> ...
>  The new plan:
> 
> - `SymbolFile` and `SB*` API should say unchanged with MainCUs encoded into `user_id_t` they use
> - types like `Function` do contain `CompileUnit *` but still it may be easier for the APIs to encode MainCU into their `user_id_t`.

SG, I agree that would be better, since it requires less refactoring throughout.

> - indexes contain now `DIERef`. They should contain rather a new MainCU-extended variant of `DIERef`.

The indexes need to *produce* an "extended" DIERef. Internally, they can contain whatever they want. This is particularly important for the manual index, where simply introducing a new field would be a pretty big memory hit, I think. However, it should be fairly easy to design a more compact encoding scheme specifically for the use in the manual index.

In fact, if we make the index functions produce results piecemeal rather than in one big chunk, then we could make them return DWARFDIEs (and compile units or whatever) directly. The reason these functions return DIERefs now is that DIERefs are cheap to construct, and we don't want to construct all DWARFDIEs if the user only cares about one result. If the user can abort iteration, then we can get rid of DIERefs in these functions, as the first thing that the user does to them anyway is to convert them to a DWARFDIE.

> - `DWARFASTParser` (+`DWARFASTParserClang`) API - there must be an additional `CompileUnit *` parameter as it uses `DWARFDIE`. (`DWARFDIE` cannot fit MainCU into its 16 bytes.)

Yep. It may be interesting to try this out on regular dwo first. Right now, dwo dodges this problem by storing a CompileUnit pointer on both skeleton and split units. If we can make this work without the split unit back pointer, then we can come move closer to how things work in llvm, and also pave the way for the dwz stuff.

> - `DWARFMappedHash` also must have new MainCU field I think.

This is only used in conjunction with apple accelerators. I don't think dwz will ever be a thing there, so you can just ignore dwz here (type units and dwo are also ignored here).

> - `NameToDIE` should be using the new MainCU-enhanced `DIERef`.

This is only used in the manual index, where we can do something custom if needed.

> Given that `user_id_t` and `DIERef` will exist both with and without MainCU (depending on which code is using them) it would be nice to make a new MainCU-extended inherited type for them so that their mistaken use is compile-time checked. But `user_id_t` is just an integer typedef so it would need to be class-wrapped.  For `user_id_t` one needs to also make a new MainCU-extended type of `UserID`.

I think it would be good to have only one kind of "user id". What are the cases where you need a main-cu-less user id?

>> PS. It would be easier to evaluate this if this was based on master instead of some patch which is taking the source in the direction we don't want to go it.
> 
> I need to verify the new refactorization satisfies all the requirements of my existing functional non-regressing DWZ implementation I have. I would then have to create a second proof of concept patch being to post here based on `master`. Which I hope is not required for your review before its real implementation. I understand the final refactorization patch will be based on `master` and DWZ will be rebased on top of it.

This is enough for to get a very high level idea of where you're going, but it gets tricky when I want to look at some specific details. For instance, some of the changes in this patch seem trivial and non-controversial, but that's only because they are based on changes in the other patch, and those are the changes I had problems with..

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73206/new/

https://reviews.llvm.org/D73206