[lldb-dev] Resolving dynamic type based on RTTI fails in case of type names inequality in DWARF and mangled symbols

Tue Dec 19 12:33:00 PST 2017

Tamas, Greg, thank you. I now understand how it should work without 
accelerator tables, but I still cannot figure out how to use or update the 
existing accelerator tables. So let me walk through it once again:
   1. The lookup has to be performed by mangled name (all we 
initially have is the mangled "vtable for ClassName" symbol).
   2. All the existing apple accelerator tables (e.g. apple_types) use 
demangled and unqualified names as keys.
   3. It is not always possible to recover the original type name from 
the mangled one (e.g. for templates parameterized with enums the 
demangled name is Impl<(TagType)0> whereas the original is 
Impl<TagType::Tag1>, and there are more complex cases).
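
To make point 3 concrete, here is a minimal sketch (not lldb code) that 
demangles a vtable symbol with the Itanium ABI helper abi::__cxa_demangle 
and strips the "vtable for " prefix. The helper names DemangleOrEmpty and 
ClassNameFromVTableSymbol are hypothetical, and _ZTV4ImplIL7TagType0EE is 
my assumption for the vtable symbol of Impl<TagType::Tag1> from the example 
in this thread (without the extra leading underscore that Darwin adds).

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <memory>
#include <string>

// Demangle an Itanium ABI symbol; returns an empty string on failure.
std::string DemangleOrEmpty(const std::string &mangled) {
  int status = 0;
  // __cxa_demangle returns a malloc()-ed buffer that the caller must free.
  std::unique_ptr<char, decltype(&std::free)> buf(
      abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
      &std::free);
  return (status == 0 && buf) ? std::string(buf.get()) : std::string();
}

// Return the class name from a vtable symbol, i.e. the demangled name
// with the "vtable for " prefix removed; empty if it is not a vtable symbol.
std::string ClassNameFromVTableSymbol(const std::string &mangled) {
  static const std::string kPrefix = "vtable for ";
  const std::string demangled = DemangleOrEmpty(mangled);
  if (demangled.compare(0, kPrefix.size(), kPrefix) != 0)
    return std::string();
  return demangled.substr(kPrefix.size());
}
```

For the symbol above, the resulting name is built from the numeric enum 
value, so it cannot be compared for equality with the DWARF name 
"Impl<TagType::Tag1>".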

Thus, I don't see how adding DW_AT_linkage_name to the vtable member of a 
class (or even to the class itself) could help, as it still won't be 
possible to resolve the DIE by the mangled type name. However, possible 
solutions are:
   1. Generate a separate accelerator table: mangled name of the vtable 
member of a class => DIE;
   2. Build an index on startup by iterating through apple_types and 
gathering a map: mangled name => DIE.
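
Either option boils down to the same lookup structure; only where it is 
built differs (by the compiler, or by the debugger on startup). A minimal 
sketch of that structure, assuming a DIE can be identified by a plain 
offset; VTableNameIndex and DieOffset are hypothetical names and not 
existing lldb types:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DIE reference: an offset into the debug info.
using DieOffset = std::uint64_t;

// Maps the mangled vtable symbol name of a class to the DIE of that class.
class VTableNameIndex {
public:
  // Built once from (mangled vtable symbol, DIE offset) pairs, which would
  // come either from a compiler-emitted table or from a startup scan.
  explicit VTableNameIndex(
      const std::vector<std::pair<std::string, DieOffset>> &entries) {
    for (const auto &entry : entries)
      by_mangled_.emplace(entry.first, entry.second);  // first entry wins
  }

  // Exact hash lookup; returns nullptr if the symbol is not indexed.
  const DieOffset *Find(const std::string &mangled) const {
    auto it = by_mangled_.find(mangled);
    return it == by_mangled_.end() ? nullptr : &it->second;
  }

  std::size_t Size() const { return by_mangled_.size(); }

private:
  std::unordered_map<std::string, DieOffset> by_mangled_;
};
```

Because the key is the mangled symbol itself, no demangling or string 
comparison against DWARF names is needed at lookup time.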

Greg, did you mean some of these or something else?

Thanks,
Anton.

19.12.2017 19:39, Greg Clayton wrote:
> I agree with Tamas. The right way to do this is to add the 
> DW_AT_linkage_name to the class. Apple accelerator tables have many 
> different forms, but one is a mapping of type name to exact DIE offset 
> (in the __DWARF segment in the __apple_types section). If the mangled 
> name was added to the class, then the apple accelerator tables would 
> have it. So when a lookup happens with these tables around, we do a 
> very quick hash lookup, and we find the exact DIE (or DIEs) we need. 
> Entries for classes in the Apple accelerator tables have both the 
> mangled and raw class name as entries pointing to the same DIE since 
> lookups don't usually happen via mangled names. LLDB also knows how to 
> pull names apart and search correctly, so if someone tries to look up a 
> type with "a::b::MyClass", we will chop that up into "MyClass" and do 
> a lookup on that. We might get many many different "MyClass" results 
> back (a::c::MyClass, ::MyClass, b::MyClass), but then we cull those 
> down by making sure any matches have a matching decl context of 
> "a::b::". For mangled names, it is easy and just a direct lookup.
>
> The apple accelerator tables are only enabled for Darwin targets, but 
> there is nothing to say we couldn't enable these for other targets in 
> ELF files. It would be a quick way to gauge the performance 
> improvement that these accelerator tables provide for linux. Currently 
> linux will completely index the DWARF, but it will load the DWARF, 
> index it, and unload the DWARF so we don't hog memory for things we 
> don't need loaded yet. We must manually index the DWARF because the 
> DWARF accelerator tables are really not accelerator tables, they are 
> random indexes of related data (names in no particular order, 
> addresses in no particular order). These tables are also not complete 
> so no debugger can rely on them. For example ".debug_pubtypes" is for 
> "public" types only. ".debug_pubnames" is a random name table with 
> only public functions (no static functions or functions in anonymous 
> namespaces). So the DWARF accelerator tables can't be used by debuggers.
>
> There is now a modified version of the Apple accelerator tables in the 
> DWARF standard that can provide the same data as the Apple versions, 
> but I don't believe anyone has added this support to any compilers 
> yet. So for simplicity, we can try things out with the Apple 
> accelerator tables and see how things go.
>
> Another solution involves using llvm-dsymutil, a DWARF linker that is 
> used on Apple platforms. It is a tool that is normally run on 
> executables where the DWARF is left in the .o files and linked later 
> into final DWARF files. This tool also has a "--update" option that 
> takes a linked dSYM file and updates the accelerator tables in case 
> they change over time, or in case an older version of llvm-dsymutil 
> didn't add everything that was needed to the tables due to a bug. So 
> another way we can try this out is to modify the llvm-dsymutil to work 
> with ELF files and have it generate and add the Apple accelerator 
> tables to the ELF files. This is nice because it allows us to use 
> DWARF that is generated by any compiler (no need for the compiler to 
> support making the accelerator tables). This would be a great way to try 
> out the accelerator tables without requiring compiler changes.
>
> The short term solution is to validate that the Apple accelerator 
> tables work and do speed debugging up by a large amount. The long term 
> solution is to have clang start emitting the new DWARF accelerator 
> tables and modify LLDB to support and use those tables.
>
> Let me know if there are any questions on any of this.
>
> Greg Clayton
>
>> On Dec 19, 2017, at 5:35 AM, Tamas Berghammer via lldb-dev 
>> <lldb-dev at lists.llvm.org <mailto:lldb-dev at lists.llvm.org>> wrote:
>>
>> Hi,
>>
>> I thought most compilers still emit DW_AT_MIPS_linkage_name instead 
>> of the standard DW_AT_linkage_name but I agree that if we can we 
>> should use the standard one.
>>
>> Regarding performance we have 2 different scenarios. On Apple 
>> platforms we have the apple accelerator tables to improve load time 
>> (might work on FreeBSD as well) while on other platforms we index the 
>> DWARF data (DWARFCompileUnit::Index) to effectively generate 
>> accelerator tables in memory, which is a faster process than fully 
>> parsing the DWARF (currently we only parse function DIEs and we don't 
>> build the clang types). I think an ideal solution would be to have 
>> the vtable name stored in DWARF so the DWARF data is standalone and 
>> then have some accelerator tables to be able to do fast lookup from 
>> mangled symbol name to DIE offset. I am not too familiar with the 
>> apple accelerator tables but if we have anything that maps from 
>> mangled name to DIE offset then we can add a few entries to it to map 
>> from mangled vtable name to type DIE or vtable DIE.
>>
>> Tamas
>>
>> On Mon, Dec 18, 2017 at 9:02 PM xgsa <xgsa at yandex.ru 
>> <mailto:xgsa at yandex.ru>> wrote:
>>
>>     Hi Tamas,
>>     First, why DW_AT_MIPS_linkage_name, but not just
>>     DW_AT_linkage_name? The latter is standardized and currently
>>     generated by clang at least on x64.
>>     Second, this doesn't help to solve the issue, because this will
>>     require parsing all the DWARF types during startup to build a map,
>>     which breaks the lazy DWARF loading performed by lldb. Or am I missing
>>     something?
>>     Thanks,
>>     Anton.
>>     18.12.2017, 22:59, "Tamas Berghammer" <tberghammer at google.com
>>     <mailto:tberghammer at google.com>>:
>>>
>>>     Hi Anton and Jim,
>>>
>>>     What do you think about storing the mangled type name or the
>>>     mangled vtable symbol name somewhere in DWARF in the
>>>     DW_AT_MIPS_linkage_name attribute? We are already doing it for
>>>     the mangled names of functions so extending it to types
>>>     shouldn't be too controversial.
>>>
>>>     Tamas
>>>
>>>     On Mon, 18 Dec 2017, 17:29 xgsa via lldb-dev,
>>>     <lldb-dev at lists.llvm.org <mailto:lldb-dev at lists.llvm.org>> wrote:
>>>
>>>         Thank you for clarification, Jim, you are right, I
>>>         misunderstood a little bit what lldb actually does.
>>>
>>>         It is not that the compiler can't be fixed; it is that
>>>         relying on the correspondence of mangled and demangled
>>>         forms is not reliable enough, so we are looking for more
>>>         robust alternatives. Moreover, I am not sure that such fuzzy
>>>         matching could be done based on the class name alone, so it
>>>         will require reading more DIEs. Taking into account that, for
>>>         instance, our project has quite a lot of such types, it
>>>         could noticeably slow down the debugger.
>>>
>>>         Thus, I'd like to mention one more alternative and get your
>>>         feedback, if possible. Actually, what is necessary is the
>>>         correspondence of the mangled and demangled vtable symbols.
>>>         Possibly, it is worth preparing a separate section during
>>>         compilation (like e.g. apple_types), which would store this
>>>         correspondence? It would work fast and be more reliable than
>>>         the current approach, but would certainly increase debug
>>>         info size (however, I cannot estimate what the exact increase
>>>         would be, e.g. in percent).
>>>
>>>         What do you think? Which solution is preferable?
>>>
>>>         Thanks,
>>>         Anton.
>>>
>>>         15.12.2017, 23:34, "Jim Ingham" <jingham at apple.com
>>>         <mailto:jingham at apple.com>>:
>>>         > First off, just a technical point. lldb doesn't use RTTI
>>>         to find dynamic types, and in fact works for projects like
>>>         lldb & clang that turn off RTTI. It just uses the fact that
>>>         the vtable symbol for an object demangles to:
>>>         >
>>>         > vtable for CLASSNAME
>>>         >
>>>         > That's not terribly important, but I just wanted to make
>>>         sure people didn't think lldb was doing something fancy with
>>>         RTTI... Note, gdb does (or at least used to do) dynamic
>>>         detection the same way.
>>>         >
>>>         > If the compiler can't be fixed, then it seems like your
>>>         solution [2] is what we'll have to try.
>>>         >
>>>         > As it works now, we get the CLASSNAME from the vtable
>>>         symbol and look it up in the list of types. That is
>>>         pretty quick because the type names are indexed, so we can
>>>         find it with a quick search in the index. Changing this over
>>>         to a method where we do some additional string matching
>>>         rather than just using the table's hashing is going to be a
>>>         fair bit slower because you have to run over EVERY type
>>>         name. But this might not be that bad. You would first look
>>>         it up by exact CLASSNAME and only fall back on your fuzzy
>>>         match if this fails, so most dynamic type lookups won't see
>>>         any slowdown. And if you know the cases where you get into
>>>         this problem you can probably further restrict when you need
>>>         to do this work so you don't suffer this penalty for every
>>>         lookup where we don't have debug info for the dynamic type.
>>>         And you could keep a side-table of mangled-name -> DWARF
>>>         name, and maybe a black-list for unfound names, so you only
>>>         have to do this once.
>>>         >
>>>         > This estimation is based on the assumption that you can do
>>>         your work just on the type names, without having to get more
>>>         type information out of the DWARF for each candidate match.
>>>         A solution that relies on realizing every class in lldb so
>>>         you can get more information out of the type information to
>>>         help with the match will defeat all our attempts at lazy
>>>         DWARF reading. This can cause quite long delays in big
>>>         programs. So I would be much more worried about a solution
>>>         that requires this kind of work. Again, if you can reject
>>>         most potential candidates by looking at the name, and only
>>>         have to realize a few likely types, the approach might not
>>>         be that slow.
>>>         >
>>>         > Jim
>>>         >
>>>         >>  On Dec 15, 2017, at 7:11 AM, xgsa via lldb-dev
>>>         <lldb-dev at lists.llvm.org <mailto:lldb-dev at lists.llvm.org>>
>>>         wrote:
>>>         >>
>>>         >>  Sorry, I probably shouldn't have used HTML for that
>>>         message. Converted to plain text.
>>>         >>
>>>         >>  -------- Original message --------
>>>         >>  15.12.2017, 18:01, "xgsa" <xgsa at yandex.ru
>>>         <mailto:xgsa at yandex.ru>>:
>>>         >>
>>>         >>  Hi,
>>>         >>
>>>         >>  I am working on an issue where, in a C++ program, showing
>>>         the dynamic type based on RTTI in lldb doesn't work properly
>>>         for some complex cases with templates. Consider the following
>>>         example:
>>>         >>  enum class TagType : bool
>>>         >>  {
>>>         >>     Tag1
>>>         >>  };
>>>         >>
>>>         >>  struct I
>>>         >>  {
>>>         >>     virtual ~I() = default;
>>>         >>  };
>>>         >>
>>>         >>  template <TagType Tag>
>>>         >>  struct Impl : public I
>>>         >>  {
>>>         >>  private:
>>>         >>     int v = 123;
>>>         >>  };
>>>         >>
>>>         >>  int main(int argc, const char * argv[]) {
>>>         >>     Impl<TagType::Tag1> impl;
>>>         >>     I& i = impl;
>>>         >>     return 0;
>>>         >>  }
>>>         >>
>>>         >>  For this example clang generates the type name
>>>         "Impl<TagType::Tag1>" in DWARF and "__ZTS4ImplIL7TagType0EE"
>>>         when mangling symbols (which lldb demangles to
>>>         Impl<(TagType)0>). Thus when in
>>>         ItaniumABILanguageRuntime::GetTypeInfoFromVTableAddress()
>>>         lldb tries to resolve the type, it is unable to find it.
>>>         More cases and a detailed description of why lldb fails here
>>>         can be found in this clang review, which tries to fix this
>>>         in clang [1].
>>>         >>
>>>         >>  However, during the discussion around this review [2],
>>>         it was pointed out that DWARF names are expected to be close
>>>         to sources, which clang does perfectly, whereas mangling
>>>         algorithm is strictly defined. Thus matching them on
>>>         equality could sometimes fail. The suggested idea in [2] was
>>>         to implement more semantically aware matching. There is
>>>         enough information in the DWARF to semantically match
>>>         "Impl<(TagType)0>" with "Impl<TagType::Tag1>", as enum
>>>         TagType is in the DWARF, and the enumerator Tag1 is present
>>>         with its value 0. I have some concerns about the performance
>>>         of such a solution, but I'd like to know your opinion about
>>>         this idea in general. In case it is approved, I'm going to
>>>         work on implementing it.
>>>         >>
>>>         >>  So what do you think about type names inequality and the
>>>         suggested solution?
>>>         >
>>>         >>  [1] - https://reviews.llvm.org/D39622
>>>         >>  [2] -
>>>         http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20171211/212859.html
>>>         >>
>>>         >>  Thank you,
>>>         >>  Anton.
>>>         >>  _______________________________________________
>>>         >>  lldb-dev mailing list
>>>         >> lldb-dev at lists.llvm.org <mailto:lldb-dev at lists.llvm.org>
>>>         >> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>
>