[Lldb-commits] [PATCH] D54670: 03/03: .debug_types: Update of D32167 (.debug_types) on top of D51578 (section concatenation)

Wed Feb 20 02:23:18 PST 2019

labath added a comment.

In D54670#1403670 <https://reviews.llvm.org/D54670#1403670>, @jankratochvil wrote:

> In D54670#1403642 <https://reviews.llvm.org/D54670#1403642>, @labath wrote:
>
> > I am still worried about the divergence from llvm's dwarf reader here.
>
>
> I am still going to investigate your `DW_FORM_*` dispatching suggestion <https://reviews.llvm.org/D58330#1400847>.  It looks too simple to work, maybe it is the right way forward.

Thank you for doing that. Please don't feel like I'm taking this all out to you, I know that you are following the direction that has already been set out. My comment is directed at everyone who has interest in the state of lldb's dwarf parser.

And I am fully aware that my proposal is a gross oversimplification, and that doing that might be hard, but I believe the end goal is worth it, and that we can reach it incrementally.

One of the problems we'll probably have to tackle is the encoding of the user_id_t, which is currently very wastefull. Despite it being 64 bits, we can only encode 4GB of debug info. That's still a fairly large number, but it's still not that hard to surpass it. If we slice one bit out of that,  we'll get 2GB, which will be just too small. So we'll either have to increase the size of that type, use a more rational encoding (we already have compile units and dies sitting in a vector, so maybe index into that vector?), or use some concatenation scheme just for the purposes of user_id_t, but then bail out and decompose that very early on.

>> Why not have `get_debug_types_data_extractor()` instead?
> 
> Because DataExtractor accesses always start at offset 0.

Yes, that's part of the idea. :) Starting at zero is the right thing to do here, as that's the base any intra-section references will be using. (and for inter-section refs, you have to manually fiddle with the offsets anyway)

> 
> 
>> BTW, I don't see a `get_debug_info_offset()`. AFAIK, there's no requirement that .debug_info has to come first in an object file. What would happen if the order of the sections in the object file is reversed?
> 
> It would fall back to the section-reading (instead of section-mmapping) less performing fallback in `get_debug_info_data()`.  If there are any such files out there sure the code could be extended to handle both orders of `.debug_info` and `.debug_types` but personally I do not think it is worth the code complication given there is the fallback anyway. I can put a warning there for such case so that it would be easily noticed in the future - but then maybe one could already rather extend the code, any suggestions what to choose?

Hm... ok, I'm glad you've thought about all of the possibilities. Though I would say that this just show that we have an incorrect abstraction here. :)

Of the two options I would say only adding the warning makes sense, since I believe making everything aware of the fact that .debug_info can start at a non-zero offset is equivalent to teaching it that it can start in a different section.

Repository:
  rLLDB LLDB

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D54670/new/

https://reviews.llvm.org/D54670