[Lldb-commits] [PATCH] D73206: Pass `CompileUnit *` along `DWARFDIE` for DWZ

Sun Feb 9 14:54:09 PST 2020

jankratochvil added a comment.

In D73206#1859944 <https://reviews.llvm.org/D73206#1859944>, @labath wrote:

> From a high-level, this approach looks like it could be viable. However, you have taken this a lot further than I would have expected. What I was expecting to see would be that a bunch of the DWARF internal functions would get an extra argument (be it a CompileUnit, DWARFUnit, or whatever), but I did not expect that this would also appear on the "generic" interfaces. My idea was that the external interface would keep getting just a single user_id_t, but then during the process of decoding that integer into a DWARFDie, we would also produce the additional CompileUnit object.

In such case there need to be two kinds of `user_id_t`. One internal to DWARF/ not containing MainCU and one for "generic" interfaces which does contain MainCU. Because internally if `user_id_t` contains MainCU then already also `DWARFDIE` and `DIERef` must contain MainCU - see for example `ElaboratingDIEIterator` calling `DWARFDIE::GetID` where is no chance to know the MainCU (without heavy refactorization to pass it down the call chain) - and the code does not really need to know MainCU, it is just using `user_id_t` as a unique DIE identifier.

> Now, that doesn't mean that adding a CompileUnit argument to these functions too is a bad idea. Quite the opposite -- _if it actually works_, it would be great because it would give us a lot more freedom in encoding the information into the user_id_t value. However, I have some doubts about that. For example, right now we don't create any CompileUnit objects for types  in DWARF type units, and I'm not sure how much we want to change that because of the sheer number of type units.  That means we'd still have to have a way to encode a precise type location in a user_id_t without a CompileUnit, though maybe we would be ablle to assume that a CompileUnit is always available in the DWZ case (?).

Yes, if there is no `CompileUnit` available then DWZ is not used for that type and the `DWARFUnit` itself can be used. `CompileUnit *` can remain `nullptr` when not known.

The new plan:

- `SymbolFile` and `SB*` API should say unchanged with MainCUs encoded into `user_id_t` they use
- types like `Function` do contain `CompileUnit *` but still it may be easier for the APIs to encode MainCU into their `user_id_t`.
- indexes contain now `DIERef`. They should contain rather a new MainCU-extended variant of `DIERef`.
- `DWARFASTParser` (+`DWARFASTParserClang`) API - there must be an additional `CompileUnit *` parameter as it uses `DWARFDIE`. (`DWARFDIE` cannot fit MainCU into its 16 bytes.)
- `DWARFMappedHash` also must have new MainCU field I think.
- `NameToDIE` should be using the new MainCU-enhanced `DIERef`.

Given that `user_id_t` and `DIERef` will exist both with and without MainCU (depending on which code is using them) it would be nice to make a new MainCU-extended inherited type for them so that their mistaken use is compile-time checked. But `user_id_t` is just an integer typedef so it would need to be class-wrapped.  For `user_id_t` one needs to also make a new MainCU-extended type of `UserID`.

> This duality does make me think that we'd be better off in not adding the CompileUnit arguments and encoding everything into a user_id_t. However, I am not sure if we really have a choice here -- it may be that this information is so complex that it will not be possible to reasonably encode it into 64 bits.

That my DWZ branch (`git clone -b dwz git://git.jankratochvil.net/lldb`) already encodes MainCU to all of `DWARFDIE` (becoming 24B instead of 16B), `DIERef` (staying 8B) and `user_id_t` (staying 8B, reducing DWO unit number from 31 bits to 29 bits). One can encode MainCU into the DWO field as both can never exist together.

> PS. It would be easier to evaluate this if this was based on master instead of some patch which is taking the source in the direction we don't want to go it.

I need to verify the new refactorization satisfies all the requirements of my existing functional non-regressing DWZ implementation I have. I would then have to create a second proof of concept patch being to post here based on `master`. Which I hope is not required for your review before its real implementation. I understand the final refactorization patch will be based on `master` and DWZ will be rebased on top of it.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73206/new/

https://reviews.llvm.org/D73206