[Lldb-commits] [PATCH] D58330: 01/03: new SectionPart for Section subranges (for effective .debug_types concatenation)

Mon Feb 18 02:06:51 PST 2019

labath added a comment.

In D58330#1400825 <https://reviews.llvm.org/D58330#1400825>, @jankratochvil wrote:

> In D58330#1400812 <https://reviews.llvm.org/D58330#1400812>, @labath wrote:
>
> > It also appears that we will still end up mmapping the object file, and then copying all of the debug info out of it into a heap buffer.
>
>
> In all usual cases at least on Linux LLDB should use just the mmap part, without any copying - see the block on line 691 of `SymbolFileDWARF::get_debug_info_data()`: https://reviews.llvm.org/D51578#C1361566NL682
>
> The question is how hard to optimize the corner cases of `IsInMemory()`, `SHF_COMPRESSED` with `.debug_types` etc.  Originally I did not optimize it but I think this is what @clayborg did not like: https://reviews.llvm.org/D51578#1383458

Ok, I see what you mean. It's nice that we don't have this regression in the base case. Though I guess we will still do the copying as soon as debug_types sections appear. From one POV, that's fine, because it's a strict improvement over just not supporting debug_types at all. However, the fact that will have to do that in some cases anyway makes me think whether we have the right abstraction here.

> 
> 
>> What's the reason we're trying so hard to concatenate things? IIRC, it was because it makes things appear DWARF5-like, but this is now creating a lot of infrastructure that will be completely unused in the dwarf5 case, so I think we're missing that goal.
> 
> In part yes, I agree. I cannot much agree with the "//completely unused in the dwarf5 case//" as for DWZ I need the same so if we do not implement it for `.debug_types` I need to implement it for DWZ anyway. Which is why I also do that - to justify some way such refactorization for DWZ. Red Hat does not use `.debug_types` and DWARF-5 does not have this problem anyway as you say.
> 
> Currently DWZ solves it on line 164 - DWZRedirect(): https://reviews.llvm.org/D40474#C1183882NL164
>  Although there the two sections to be merged (main `.debug_info` and `.debug_info` from DWZ common file - from `/usr/lib/debug/.dwz/` directory) come from two different files.  I haven't coded that yet.
>  Currently DWZ also needs D40473 <https://reviews.llvm.org/D40473> full of `GetMainDWARF()` vs. original `GetDWARF()` and I hope I could drop this by also merging `.debug_abbrev` (and maybe some others).  But I haven't yet tried to rebase the DWZ patchset on top of this merging patchset if there isn't some catch.
> 
> From the higher point of view: `.debug_types` is not needed thanks to DWARF-5 and Red Hat could replace DWZ with `-fdebug-types-section` or at least just not to use DWZ `-m|--multifile` option (the DWZ common files so we would stay at just one file). But then we live in a real world and people want to debug existing code built with `.debug_types` and RHELs deployed out there do use DWZ with `-m|--multifile`. And both do work in GDB now.

I am not suggesting we shouldn't implement debug_types _at all_ because it is "solved" by DWARF5. However, i am questioning the "concatenation is the best vehicle to implement this support because it makes things similar to DWARF5" line of reasoning, because it seems to me that we will be greatly diverging from the DWARF5 code path anyway.

> 
> 
>> Can't we just admit that we are dealing with two sections here, and use one bit from the user_id_t (or whereever it's needed) to tell which one are we talking about?
> 
> Been there, done that - with `offset_t` (although by offseting it and not by one bit but that is similar) - the `FORWARDER()` dispatching of `DWARFDataExtractor`: https://reviews.llvm.org/D51578?id=170342#C1214469NL1
>  The problem is that it has some percents of performance impact which @clayborg did not like.

The forwarding implementation is the closest to what I have in mind, but it is not exactly that. I am imagining this to work at a slightly different level. Instead of changing the DataExtractor class at all, i'd have a new object, let's call it DataRegistry for the purposes of this explanation. Then when you want to look up some, which can potentially reside in another chunk of data (this should always be clear from the context, because DWARF is not meant to be consumed as concatenated) you do something like:

  switch(form) {
  case DW_FORM_which_refers_to_this_compile_unit:
    return {current_data_extractor, form_value};
  case DW_FORM_which_refers_to_stuff_that_may_live_elsewhere:
    return {data_registry.lookup_the_data_extractor_for_the_potentially_external_entity(form_value), figure_out_the_offset_in_that_entity(form_value)};
  }

The registry should contain the code necessary to produce the right section given the information known to the caller (I guess these would be things like the dwarf form, type unit signature, whatever info DWZs use to locate the external data, etc.)

The user_id_t comes into play when you need to "bookmark" a reference to a certain DIE, so that you come back to it later. In this case the value couldn't just be an offset into the debug_info section, but it would also have to contain some breadcrumbs so that you can track down the appropriate section too.

> I think `user_id_t` is too high for this functionality as all the inter-DWARF references already use only `offset_t` inside the DWARF code.

So how do these offset_t cross-references work? (My apologies, I have successfully blocked out of my memory the inner workings of SymbolFileDWARF.) I assume there has to be a place where you take something which refers to a DIE in a different section, and then add the magic offset to account for the concatenation, because the magic offset will not be present in dwarf. These are the places I would hook into so that instead of a bare offset, they additionally pass a something which identifies the place where that data is supposed to be read from. I think this be the solution most understandable to people looking at the code in the future, as it work the way that dwarf was intended to be parsed, and would be at least rougly similar to the llvm implementation, which doesn't do any kind of concatenation. That alone would be worth a 5% performance hit to me, particularly as I think the hit could be reclaimed by fine-tuning how these translations are made.

I am sorry that we're driving you around like this. I understand it must be extremely frustrating to be working on this feature for so long. I just don't want to add to the mountain of technical debt that we already have when it comes to SymbolFileDWARF, without exploring all possibilities.

Repository:
  rLLDB LLDB

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58330/new/

https://reviews.llvm.org/D58330