[lldb-dev] Resolving dynamic type based on RTTI fails in case of type names inequality in DWARF and mangled symbols

Thu Dec 21 08:22:04 PST 2017

> On Dec 21, 2017, at 2:45 AM, Pavel Labath <labath at google.com> wrote:
> 
> On 20 December 2017 at 18:40, Greg Clayton <clayborg at gmail.com> wrote:
>> 
>>> On Dec 20, 2017, at 3:33 AM, Pavel Labath <labath at google.com> wrote:
>>> 
>>> On 19 December 2017 at 17:39, Greg Clayton via lldb-dev
>>> <lldb-dev at lists.llvm.org> wrote:
>>>> The apple accelerator tables are only enabled for Darwin target, but there
>>>> is nothing to say we couldn't enable these for other targets in ELF files.
>>>> It would be a quick way to gauge the performance improvement that these
>>>> accelerator tables provide for linux.
>>> 
>>> I was actually experimenting with this last month. Unfortunately, I've
>>> learned that the situation is not as simple as flipping a switch in
>>> the compiler. In fact, there is no switch to flip as clang will
>>> already emit the apple tables if you pass -glldb. However, the
>>> resulting tables will be unusable due to the differences in how dwarf
>>> is linked on elf vs mach-o. In elf, we have the linker concatenate the
>>> debug info into the final executable/shared library, which it will
>>> also happily do for the .apple_*** sections.
>> 
>> That ruins the whole idea of the accelerator tables if they are concatenated...
> 
> I'm not sure I'm convinced by that. I mean, obviously it's better if
> you have just a single table to look up, but even if you have multiple
> tables, looking up into each one may be faster than indexing the full
> debug info yourself. Take liblldb for example. It has ~3000 compile
> units and nearly 2GB of debug info. I don't have any solid data on
> this (and it would certainly be interesting to make this experiment),
> but I expect that doing 3000 hash lookups (which are basically just
> array accesses) would be faster than indexing 2GB of dwarf (where you
> have to deal with variable-sized fields and uleb encodings...). And
> there is always the possibility to do the lookups in parallel or merge
> the individual tables inside the debugger.

The main idea is to touch as few pages as possible when doing searches. We effectively have this scenario right now with Apple DWARF in .o file debugging: so much time is spent paging in each accelerator table that we have very long delays starting up large apps. Concatenated tables would be more localized, but there would be a similar issue. Concatenation would be fine for now if we make it work, but for long-term archival, the real solution is to merge the tables.
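
To make the trade-off concrete, here is a minimal Python sketch (not LLDB code) that models each compile unit's accelerator table as a dict from name to DIE offsets. Every name in it (`lookup_concatenated`, `merge_tables`, `lookup_merged`, the sample tables and offsets) is hypothetical and made up for illustration. It only counts how many tables a lookup has to probe; it does not model real paging costs or the on-disk hash table layout.

```python
from collections import defaultdict

def lookup_concatenated(tables, name):
    """Probe every per-CU table; return (matches, tables_touched)."""
    matches = []
    touched = 0
    for table in tables:
        touched += 1  # each table sits in its own region of the file
        matches.extend(table.get(name, []))
    return matches, touched

def merge_tables(tables):
    """Build one module-wide table from the per-CU tables."""
    merged = defaultdict(list)
    for table in tables:
        for name, offsets in table.items():
            merged[name].extend(offsets)
    return dict(merged)

def lookup_merged(merged, name):
    """A single probe into the merged table."""
    return merged.get(name, []), 1

# Hypothetical per-CU tables: name -> list of DIE offsets.
per_cu_tables = [
    {"main": [0x10], "Foo": [0x40]},
    {"Foo": [0x1200], "Bar": [0x1300]},
    {"Baz": [0x2400]},
]
merged_table = merge_tables(per_cu_tables)
```

With concatenated tables the number of tables touched grows with the number of compile units, while the merged table is probed once per lookup regardless of how many compile units contributed to it.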
>> 
>>> The second, more subtle problem I see is that these tables are an
>>> all-or-nothing event. If we see an accelerator table, we assume it is
>>> an index of the entire module, but that's not likely to be the case,
>>> especially in the early days of this feature's uptake. You will have
>>> people feeding the linkers with output from different compilers, some
>>> of which will produce these tables, and some not. Then the users will
>>> be surprised that the debugger is ignoring some of their symbols.
>> 
>> I think it is best to auto generate the tables from the DWARF directly after it has all been linked. Skip teaching the linker about merging it, just teach it to generate it.
> 
> If the linker does the full generation, then how is that any better
> than doing the indexing in the debugger?

It would be better in that the index would be generated only once, so debugging the same binary a second time would be super quick.

> Somebody still has to parse
> the entire dwarf, so it might as well be the debugger. I think the
> main advantage of doing it in the compiler is that the compiler
> already has all the data about what should go into the index ready, so
> it can just build it as it goes about writing out the object file.

This is kind of why I would really like to see "llvm-dsymutil --update" work, in case the compiler has bugs where it doesn't generate things correctly. The question is how much time it costs the compiler to generate the tables versus generating them in the linker or as a post-link step.

> Then, the merging should be a relatively simple and fast operation
> (and the linker does not even have to know how to parse dwarf). Isn't
> this how the darwin workflow works already?

It sure is easier on the linker. But as I stated above, paging in many tables is really slow when there are thousands of object files, as with the macOS setup of DWARF in .o files with a link map in the main executable.

> 
>>> This is probably a bit more work than just "flipping a switch", but I
>>> hope it will not be too much work. The layout and contents of the
>>> tables are generally the same, so I am hoping most of the compiler
>>> code for the apple tables can be reused for the dwarf5 tables. If
>>> things turn out the way I want them to, I'll be able to work on
>>> getting this done next year.
>> 
>> Modifying llvm-dsymutil to handle ELF so we can use "llvm-dsymutil --update foo.elf" is the quickest way that doesn't involve modifying anything but llvm-dsymutil. It will generate the accelerator tables manually and add/modify the existing accelerator tables and write out the new elf file that is all fixed up. I would suggest going this route at first to see what performance improvements we will see with linux so that can drive how quickly we need to adopt this.
>> 
> 
> I'm not sure now whether you're suggesting to use the dsymutil
> approach just to gauge the potential speedup we can obtain and get
> people interested, or as a productized solution. If it's the first one
> then I fully agree with you. Although I think I can see an even
> simpler way to estimate the speedup: build lldb for mac with apple
> indexes disabled and compare its performance to a vanilla one. I'm
> going to see if I can get some numbers on this today.

You are correct in that I want to gauge the potential speedup with llvm-dsymutil so we know how much effort we should put into this on linux and other platforms. The nice thing about the llvm-dsymutil approach is that it allows anyone to try it out on their system. You are also correct that we can measure this on Darwin by disabling the accelerator tables, doing a full build of clang with debug info, and running a few startup tests. The results were huge for us when we did that: 2 minutes without accelerator tables, under 5 seconds with them.
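
As a rough worked figure, those two timings give a lower bound on the speedup. The sketch below assumes 5 seconds as the worst case for "under 5 seconds"; the variable names are only illustrative.

```python
without_tables_s = 2 * 60  # "2 minutes without accelerator tables"
with_tables_s = 5          # assumed upper bound for "under 5 seconds"

# Startup took less than 5 seconds, so the real ratio is larger than this.
min_speedup = without_tables_s / with_tables_s
print(f"speedup is at least {min_speedup:.0f}x")
```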

Greg
