[lldb-dev] Resolving dynamic type based on RTTI fails in case of type names inequality in DWARF and mangled symbols

Tue Dec 19 13:20:42 PST 2017

19.12.2017 23:12, Greg Clayton wrote:

>
>> On Dec 19, 2017, at 12:33 PM, Anton Gorenkov <xgsa at yandex.ru> wrote:
>>
>> Tamas, Greg, thank you. I got the idea of how it should work without
>> accelerator tables, but I still cannot figure out how to use or update
>> the existing accelerator tables. So let me walk through it once again:
>>   1. It is necessary to perform the lookup by mangled name (all we
>> initially have is the mangled "vtable for ClassName" symbol).
>>   2. All the existing apple accelerator tables (e.g. apple_types)
>> use demangled, unqualified names as keys.
>>   3. It is not always possible to recover the original type name
>> from the mangled one (e.g. for templates parameterized with enums the
>> demangled name is Impl<(TagType)0> vs. the original Impl<TagType::Tag1>,
>> but there are more complex cases).
>>
>> Thus, I don't see how adding DW_AT_linkage_name to the vtable member of a
>> class (or even to the class itself) could help, as it still won't be
>> possible to resolve the DIE by the mangled type name. However, possible
>> solutions are:
>>   1. Generate a separate accelerator table: mangled name of a class's
>> vtable member => DIE;
>>   2. Build an index on startup by iterating through apple_types and
>> gathering a map mangled name => DIE.
>>
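Option 2 could look roughly like the sketch below. It assumes, hypothetically, that each type-index entry carries the class's mangled type name (for example via the DW_AT_linkage_name attribute discussed in this thread); `TypeIndexEntry` and `BuildVTableIndex` are illustrative names, not lldb APIs. In the Itanium C++ ABI the vtable symbol is `_ZTV` followed by the mangled type name (Darwin symbol tables add one more leading underscore).

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified view of one entry of a type index such as
// apple_types: DWARF type name, DIE offset, and (assumption) the mangled
// type name, present only if the producer recorded it on the class DIE.
struct TypeIndexEntry {
    std::string dwarf_name;                   // e.g. "Impl<TagType::Tag1>"
    std::uint64_t die_offset;                 // offset of the class DIE
    std::optional<std::string> mangled_type;  // e.g. "4ImplIL7TagType0EE"
};

// Walk every entry once and build "mangled vtable symbol -> DIE offset".
// Itanium C++ ABI: vtable symbol = "_ZTV" + <mangled type name>.
std::unordered_map<std::string, std::uint64_t>
BuildVTableIndex(const std::vector<TypeIndexEntry>& entries) {
    std::unordered_map<std::string, std::uint64_t> index;
    for (const auto& e : entries) {
        if (e.mangled_type)  // entries without a mangled name are skipped
            index.emplace("_ZTV" + *e.mangled_type, e.die_offset);
    }
    return index;
}
```

The walk touches every entry once, which is exactly the startup cost that conflicts with lazy DWARF loading; a table generated at build time (option 1) would move that work out of the debugger.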
>> Greg, did you mean some of these or something else?
>
> I didn't realize that the mangled name differs in certain cases and 
> that it wouldn't suffice for a lookup. Can you give an example of the 
> name we try looking up versus what is actually in the symbol table?
Case 1:

enum class TagType : bool {
    Tag1
};

struct I {
    virtual ~I() = default;
};

template <TagType Tag>
struct Impl : public I {
private:
    int v = 123;
};

int main(int argc, const char * argv[]) {
    Impl<TagType::Tag1> impl;
    I& i = impl;
    return 0;
}
lldb demangles the name to "Impl<(TagType)0>", whereas clang generates "Impl<TagType::Tag1>" in DWARF.

Case 2:
struct I
{
    virtual ~I(){}
};

template <int Tag>
struct Impl : public I
{
    int v = 123;
};

template <>
struct Impl<1+1+1> : public I  // Note the expression used for this specialization
{
    int v = 124;
};

template <class T>
struct TT {
    I* i = new T();
};

int main(int argc, const char * argv[]) {
    TT<Impl<3>> tt;
    return 0;  // [*]
}
lldb demangles the name to "Impl<3>", whereas clang generates "Impl<1+1+1>" in DWARF.
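To see the mismatch without a debugger, the sketch below turns a mangled vtable symbol into the CLASSNAME of "vtable for CLASSNAME". It uses the C++ runtime's `abi::__cxa_demangle` only as a stand-in for lldb's own demangler, assuming a GCC or Clang toolchain on an Itanium-ABI platform; `ClassNameFromVTableSymbol` is a hypothetical helper.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <optional>
#include <string>

// Turn a mangled vtable symbol ("_ZTV...") into the class name the
// demangler produces, i.e. CLASSNAME in "vtable for CLASSNAME".
// Returns nullopt if the symbol does not demangle to a vtable.
std::optional<std::string> ClassNameFromVTableSymbol(const std::string& symbol) {
    int status = 0;
    char* raw = abi::__cxa_demangle(symbol.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr)
        return std::nullopt;
    std::string demangled(raw);
    std::free(raw);  // __cxa_demangle allocates the result with malloc
    const std::string prefix = "vtable for ";
    if (demangled.compare(0, prefix.size(), prefix) != 0)
        return std::nullopt;
    return demangled.substr(prefix.size());
}
```

For Case 2, `ClassNameFromVTableSymbol("_ZTV4ImplILi3EE")` yields "Impl<3>", while DWARF spells the specialization "Impl<1+1+1>". For Case 1, passing `"_ZTV"` followed by `typeid(Impl<TagType::Tag1>).name()` yields "Impl<(TagType)0>", not the DWARF name "Impl<TagType::Tag1>".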

> IIUC right now we look up the address of the first pointer within a 
> class if it is virtual and find the symbol name that this corresponds 
> to, and in the failing cases you have we don't find anything in the 
> DWARF that matches. Is that right?
Exactly, for the cases above and some others.
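The lookup described above starts from the object's first pointer-sized word. The in-process sketch below reads that word, assuming the Itanium C++ ABI, where a polymorphic class with single, non-virtual inheritance keeps its vtable pointer at offset 0 and shares it with its primary base; lldb reads the same word from target memory instead. `ReadVTablePointer` is a hypothetical helper, and the layout assumption is implementation-specific.

```cpp
#include <cstring>

// Copy the first pointer-sized word of a polymorphic object. Under the
// Itanium C++ ABI (single, non-virtual inheritance) this is the vtable
// pointer. Illustration only: relies on implementation-specific layout.
template <class T>
const void* ReadVTablePointer(const T& object) {
    const void* vptr = nullptr;
    std::memcpy(&vptr, static_cast<const void*>(&object), sizeof(vptr));
    return vptr;
}
```

Objects of the same dynamic type share one vtable pointer, so for Case 1 `ReadVTablePointer(i)` on the `I&` returns the same address as `ReadVTablePointer(impl)`; the debugger then maps that address to the symbol containing it, the "vtable for Impl<...>" symbol.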
>>
>> Thanks,
>> Anton.
>>
>> 19.12.2017 19:39, Greg Clayton wrote:
>>> I agree with Tamas. The right way to do this is to add the 
>>> DW_AT_linkage_name to the class. Apple accelerator tables have many 
>>> different forms, but one is a mapping of type name to exact DIE 
>>> offset (in the __DWARF segment in the __apple_types section). If 
>>> the mangled name was added to the class, then the apple accelerator 
>>> tables would have it. So when a lookup happens with these tables 
>>> around, we do a very quick hash lookup, and we find the exact DIE 
>>> (or DIEs) we need. Entries for classes in the Apple accelerator 
>>> tables have both the mangled and raw class name as entries pointing 
>>> to the same DIE since lookups don't usually happen via mangled 
>>> names. LLDB also knows how to pull names apart and search correctly, 
>>> so if someone tries to look up a type with "a::b::MyClass", we will 
>>> chop that up into "MyClass" and do a lookup on that. We might get 
>>> many different "MyClass" results back (a::c::MyClass, 
>>> ::MyClass, b::MyClass), but then we cull those down by making sure 
>>> any matches have a matching decl context of "a::b::". For mangled 
>>> names, it is easy and just a direct lookup.
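The name splitting and culling described above might be sketched as follows. This is an illustration only, assuming candidates are represented by their qualified names as strings (a real implementation would compare DIE decl contexts); `SplitQualifiedName`, `TypeCandidate`, and `CullByContext` are hypothetical. The splitter skips "::" inside template argument lists so that a name like "Impl<TagType::Tag1>" stays whole.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Split "a::b::MyClass" into {"a::b", "MyClass"}, ignoring "::" that
// appears inside template argument lists such as "Impl<TagType::Tag1>".
std::pair<std::string, std::string> SplitQualifiedName(const std::string& name) {
    int depth = 0;
    std::size_t split = std::string::npos;
    for (std::size_t i = 0; i + 1 < name.size(); ++i) {
        const char c = name[i];
        if (c == '<') ++depth;
        else if (c == '>') --depth;
        else if (depth == 0 && c == ':' && name[i + 1] == ':') split = i;
    }
    if (split == std::string::npos) return {"", name};
    return {name.substr(0, split), name.substr(split + 2)};
}

// Hypothetical result of a lookup by base name ("MyClass").
struct TypeCandidate {
    std::string qualified_name;
    std::uint64_t die_offset;
};

// Keep only candidates whose decl context matches the requested one.
std::vector<TypeCandidate> CullByContext(const std::vector<TypeCandidate>& candidates,
                                         const std::string& wanted) {
    const std::string want_ctx = SplitQualifiedName(wanted).first;
    std::vector<TypeCandidate> kept;
    for (const auto& c : candidates)
        if (SplitQualifiedName(c.qualified_name).first == want_ctx)
            kept.push_back(c);
    return kept;
}
```

With candidates a::c::MyClass, ::MyClass, b::MyClass and a::b::MyClass, a request for "a::b::MyClass" keeps only the last one.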
>>>
>>> The apple accelerator tables are only enabled for Darwin targets, but 
>>> there is nothing to say we couldn't enable these for other targets 
>>> in ELF files. It would be a quick way to gauge the performance 
>>> improvement that these accelerator tables provide for linux. 
>>> Currently linux will completely index the DWARF, but it will load 
>>> the DWARF, index it, and unload the DWARF so we don't hog memory for 
>>> things we don't need loaded yet. We must manually index the DWARF 
>>> because the DWARF accelerator tables are really not accelerator 
>>> tables; they are random indexes of related data (names in no 
>>> particular order, addresses in no particular order). These tables 
>>> are also incomplete, so no debugger can rely on them. For example, 
>>> ".debug_pubtypes" is for "public" types only. ".debug_pubnames" is a 
>>> random name table with only public functions (no static functions or 
>>> functions in anonymous namespaces). So the DWARF accelerator tables 
>>> can't be used by debuggers.
>>>
>>> There is now a modified version of the Apple accelerator tables in 
>>> the DWARF standard that can provide the same data as the Apple 
>>> versions, but I don't believe anyone has added this support to any 
>>> compilers yet. So for simplicity, we can try things out with the 
>>> Apple accelerator tables and see how things go.
>>>
>>> Another solution involves using llvm-dsymutil, a DWARF linker that 
>>> is used on Apple platforms. It is a tool that is normally run on 
>>> executables where the DWARF is left in the .o files and linked later 
>>> into final DWARF files. This tool also has a "--update" option that 
>>> takes a linked dSYM file and updates the accelerator tables in case 
>>> they change over time, or in case an older version of llvm-dsymutil 
>>> didn't add everything that was needed to the tables due to a bug. So 
>>> another way we can try this out is to modify llvm-dsymutil to 
>>> work with ELF files and have it generate and add the Apple 
>>> accelerator tables to the ELF files. This is nice because it allows 
>>> us to use DWARF that is generated by any compiler (no need for the 
>>> compiler to support making the accelerator tables). This would be a 
>>> great way to try out the accelerator tables without requiring 
>>> compiler changes.
>>>
>>> The short term solution is to validate that the Apple accelerator 
>>> tables work and do speed debugging up by a large amount. The long 
>>> term solution is to have clang start emitting the new DWARF 
>>> accelerator tables and modify LLDB to support and use those tables.
>>>
>>> Let me know if there are any questions on any of this.
>>>
>>> Greg Clayton
>>>
>>>> On Dec 19, 2017, at 5:35 AM, Tamas Berghammer via lldb-dev
>>>> <lldb-dev at lists.llvm.org> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I thought most compilers still emit DW_AT_MIPS_linkage_name instead 
>>>> of the standard DW_AT_linkage_name, but I agree that if we can, we 
>>>> should use the standard one.
>>>>
>>>> Regarding performance, we have two different scenarios. On Apple 
>>>> platforms we have the apple accelerator tables to improve load time 
>>>> (this might work on FreeBSD as well), while on other platforms we index 
>>>> the DWARF data (DWARFCompileUnit::Index) to effectively generate 
>>>> accelerator tables in memory, which is a faster process than fully 
>>>> parsing the DWARF (currently we only parse function DIEs and we 
>>>> don't build the clang types). I think an ideal solution would be to 
>>>> have the vtable name stored in DWARF so the DWARF data is 
>>>> standalone, and then have some accelerator tables to be able to do 
>>>> fast lookup from mangled symbol name to DIE offset. I am not too 
>>>> familiar with the apple accelerator tables, but if we have anything 
>>>> that maps from mangled name to DIE offset, then we can add a few 
>>>> entries to it to map from mangled vtable name to type DIE or vtable DIE.
>>>>
>>>> Tamas
>>>>
>>>> On Mon, Dec 18, 2017 at 9:02 PM xgsa <xgsa at yandex.ru> wrote:
>>>>
>>>>    Hi Tamas,
>>>>    First, why DW_AT_MIPS_linkage_name and not just
>>>>    DW_AT_linkage_name? The latter is standardized and currently
>>>>    generated by clang, at least on x64.
>>>>    Second, this doesn't solve the issue, because it would
>>>>    require parsing all the DWARF types during startup to build a map,
>>>>    which breaks the lazy DWARF loading performed by lldb. Or am I
>>>>    missing something?
>>>>    Thanks,
>>>>    Anton.
>>>>    18.12.2017, 22:59, "Tamas Berghammer" <tberghammer at google.com>:
>>>>>
>>>>>    Hi Anton and Jim,
>>>>>
>>>>>    What do you think about storing the mangled type name or the
>>>>>    mangled vtable symbol name somewhere in DWARF in the
>>>>>    DW_AT_MIPS_linkage_name attribute? We are already doing it for
>>>>>    the mangled names of functions so extending it to types
>>>>>    shouldn't be too controversial.
>>>>>
>>>>>    Tamas
>>>>>
>>>>>    On Mon, 18 Dec 2017, 17:29 xgsa via lldb-dev,
>>>>>    <lldb-dev at lists.llvm.org> wrote:
>>>>>
>>>>>        Thank you for the clarification, Jim; you are right, I
>>>>>        slightly misunderstood what lldb actually does.
>>>>>
>>>>>        It is not that the compiler can't be fixed; rather,
>>>>>        relying on the correspondence between mangled and demangled
>>>>>        forms is not reliable enough, so we are looking for more
>>>>>        robust alternatives. Moreover, I am not sure that such fuzzy
>>>>>        matching could be done based on the class name alone, so it
>>>>>        would require reading more DIEs. Taking into account that,
>>>>>        for instance, our project has quite a lot of such types, it
>>>>>        could noticeably slow down the debugger.
>>>>>
>>>>>        Thus, I'd like to mention one more alternative and get your
>>>>>        feedback, if possible. What is actually needed is the
>>>>>        correspondence between the mangled and demangled vtable symbol.
>>>>>        Perhaps it is worth emitting a separate section during
>>>>>        compilation (like apple_types) that would store this
>>>>>        correspondence? It would be fast and more reliable than
>>>>>        the current approach, but it would certainly increase the
>>>>>        debug info size (although I cannot estimate by how much,
>>>>>        e.g. in percent).
>>>>>
>>>>>        What do you think? Which solution is preferable?
>>>>>
>>>>>        Thanks,
>>>>>        Anton.
>>>>>
>>>>>        15.12.2017, 23:34, "Jim Ingham" <jingham at apple.com>:
>>>>>        > First off, just a technical point. lldb doesn't use RTTI
>>>>>        to find dynamic types, and in fact works for projects like
>>>>>        lldb & clang that turn off RTTI. It just uses the fact that
>>>>>        the vtable symbol for an object demangles to:
>>>>>        >
>>>>>        > vtable for CLASSNAME
>>>>>        >
>>>>>        > That's not terribly important, but I just wanted to make
>>>>>        sure people didn't think lldb was doing something fancy with
>>>>>        RTTI... Note, gdb does (or at least used to do) dynamic
>>>>>        detection the same way.
>>>>>        >
>>>>>        > If the compiler can't be fixed, then it seems like your
>>>>>        solution [2] is what we'll have to try.
>>>>>        >
>>>>>        > As it works now, we get the CLASSNAME from the vtable
>>>>>        symbol and look it up in the list of types. That is
>>>>>        pretty quick because the type names are indexed, so we can
>>>>>        find it with a quick search in the index. Changing this over
>>>>>        to a method where we do some additional string matching
>>>>>        rather than just using the table's hashing is going to be a
>>>>>        fair bit slower because you have to run over EVERY type
>>>>>        name. But this might not be that bad. You would first look
>>>>>        it up by exact CLASSNAME and only fall back on your fuzzy
>>>>>        match if this fails, so most dynamic type lookups won't see
>>>>>        any slowdown. And if you know the cases where you get into
>>>>>        this problem you can probably further restrict when you need
>>>>>        to do this work so you don't suffer this penalty for every
>>>>>        lookup where we don't have debug info for the dynamic type.
>>>>>        And you could keep a side-table of mangled-name -> DWARF
>>>>>        name, and maybe a black-list for unfound names, so you only
>>>>>        have to do this once.
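A minimal sketch of the strategy above, assuming the set of DWARF type names is already available and that the slower matching is supplied as a callback; `DynamicTypeNameResolver` and its members are hypothetical names, not lldb code. Both successful and failed fallback matches are cached, so the slow path runs at most once per name.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical resolver: exact lookup first, slow fallback only on a miss,
// plus a side-table for found names and a black-list for unfound names.
class DynamicTypeNameResolver {
public:
    // Stand-in for whatever slower matching is chosen (assumption): returns
    // the DWARF spelling of the name, or nullopt if nothing matches.
    using FuzzyMatcher =
        std::function<std::optional<std::string>(const std::string&)>;

    DynamicTypeNameResolver(std::unordered_set<std::string> dwarf_names,
                            FuzzyMatcher fuzzy_match)
        : dwarf_names_(std::move(dwarf_names)),
          fuzzy_match_(std::move(fuzzy_match)) {}

    // Keyed by the class name taken from the vtable symbol; keying by the
    // mangled symbol itself would work the same way.
    std::optional<std::string> Resolve(const std::string& name) {
        if (dwarf_names_.count(name)) return name;               // fast path
        if (auto it = side_table_.find(name); it != side_table_.end())
            return it->second;                                    // cached hit
        if (not_found_.count(name)) return std::nullopt;          // cached miss
        ++fuzzy_calls;
        if (auto match = fuzzy_match_(name)) {
            side_table_.emplace(name, *match);
            return match;
        }
        not_found_.insert(name);
        return std::nullopt;
    }

    int fuzzy_calls = 0;  // how often the slow path ran (for illustration)

private:
    std::unordered_set<std::string> dwarf_names_;
    FuzzyMatcher fuzzy_match_;
    std::unordered_map<std::string, std::string> side_table_;  // name -> DWARF name
    std::unordered_set<std::string> not_found_;                // black-list
};
```

Names that match exactly never reach the callback, so ordinary dynamic type lookups keep the cost of a single hash probe.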
>>>>>        >
>>>>>        > This estimation is based on the assumption that you can do
>>>>>        your work just on the type names, without having to get more
>>>>>        type information out of the DWARF for each candidate match.
>>>>>        A solution that relies on realizing every class in lldb so
>>>>>        you can get more information out of the type information to
>>>>>        help with the match will defeat all our attempts at lazy
>>>>>        DWARF reading. This can cause quite long delays in big
>>>>>        programs. So I would be much more worried about a solution
>>>>>        that requires this kind of work. Again, if you can reject
>>>>>        most potential candidates by looking at the name, and only
>>>>>        have to realize a few likely types, the approach might not
>>>>>        be that slow.
>>>>>        >
>>>>>        > Jim
>>>>>        >
>>>>>        >>  On Dec 15, 2017, at 7:11 AM, xgsa via lldb-dev
>>>>>        <lldb-dev at lists.llvm.org> wrote:
>>>>>        >>
>>>>>        >>  Sorry, I probably shouldn't have used HTML for that
>>>>>        message. Converted to plain text.
>>>>>        >>
>>>>>        >>  -------- Original message --------
>>>>>        >>  15.12.2017, 18:01, "xgsa" <xgsa at yandex.ru>:
>>>>>        >>
>>>>>        >>  Hi,
>>>>>        >>
>>>>>        >>  I am working on an issue where, in a C++ program, showing
>>>>>        the dynamic type based on RTTI in lldb doesn't work properly
>>>>>        for some complex cases with templates. Consider the following
>>>>>        example:
>>>>>        >>  enum class TagType : bool
>>>>>        >>  {
>>>>>        >>     Tag1
>>>>>        >>  };
>>>>>        >>
>>>>>        >>  struct I
>>>>>        >>  {
>>>>>        >>     virtual ~I() = default;
>>>>>        >>  };
>>>>>        >>
>>>>>        >>  template <TagType Tag>
>>>>>        >>  struct Impl : public I
>>>>>        >>  {
>>>>>        >>  private:
>>>>>        >>     int v = 123;
>>>>>        >>  };
>>>>>        >>
>>>>>        >>  int main(int argc, const char * argv[]) {
>>>>>        >>     Impl<TagType::Tag1> impl;
>>>>>        >>     I& i = impl;
>>>>>        >>     return 0;
>>>>>        >>  }
>>>>>        >>
>>>>>        >>  For this example clang generates the type name
>>>>>        "Impl<TagType::Tag1>" in DWARF and "__ZTS4ImplIL7TagType0EE"
>>>>>        when mangling symbols (which lldb demangles to
>>>>>        Impl<(TagType)0>). Thus, when lldb tries to resolve the type in
>>>>>        ItaniumABILanguageRuntime::GetTypeInfoFromVTableAddress(),
>>>>>        it is unable to find it.
>>>>>        More cases and a detailed description of why lldb fails here
>>>>>        can be found in this clang review, which tries to fix this
>>>>>        in clang [1].
>>>>>        >>
>>>>>        >>  However, during the discussion around this review [2],
>>>>>        it was pointed out that DWARF names are expected to be close
>>>>>        to the source, which clang does perfectly, whereas the mangling
>>>>>        algorithm is strictly defined. Thus matching them for
>>>>>        equality can sometimes fail. The suggestion in [2] was
>>>>>        to implement more semantically aware matching. There is
>>>>>        enough information in the DWARF to semantically match
>>>>>        "Impl<(TagType)0>" with "Impl<TagType::Tag1>", as enum
>>>>>        TagType is in the DWARF, and the enumerator Tag1 is present
>>>>>        with its value 0. I have some concerns about the performance
>>>>>        of such a solution, but I'd like to know your opinion about
>>>>>        this idea in general. If it is approved, I'm going to
>>>>>        work on implementing it.
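For the enum case specifically, the semantic matching suggested in [2] could start with something like the sketch below, which rewrites the demangler's "(TagType)0" into "TagType::Tag1" using enumerator values read from DWARF. `EnumInfo` and `RewriteEnumLiterals` are hypothetical; template arguments written as expressions (such as a "1+1+1" specialization) are not handled, and values outside the range of `long long` would make `std::stoll` throw.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical enum description, as it could be read from a
// DW_TAG_enumeration_type DIE: its name and enumerator values.
struct EnumInfo {
    std::string name;                               // e.g. "TagType"
    std::map<long long, std::string> enumerators;   // value -> enumerator name
};

// Rewrite "(TagType)0" in a demangled name into "TagType::Tag1", so the
// demangler's spelling can be compared with the DWARF spelling.
// Literals with no matching enumerator are left unchanged.
std::string RewriteEnumLiterals(std::string name, const EnumInfo& e) {
    const std::string cast = "(" + e.name + ")";
    std::size_t pos = 0;
    while ((pos = name.find(cast, pos)) != std::string::npos) {
        const std::size_t start = pos + cast.size();
        std::size_t end = start;
        if (end < name.size() && name[end] == '-') ++end;  // negative values
        const std::size_t digits_begin = end;
        while (end < name.size() &&
               std::isdigit(static_cast<unsigned char>(name[end])))
            ++end;
        auto it = e.enumerators.end();
        if (end > digits_begin)
            it = e.enumerators.find(std::stoll(name.substr(start, end - start)));
        if (it == e.enumerators.end()) {  // unknown value: skip this literal
            pos = start;
            continue;
        }
        const std::string replacement = e.name + "::" + it->second;
        name.replace(pos, end - pos, replacement);
        pos += replacement.size();
    }
    return name;
}
```

Because only names that failed the exact lookup need this rewrite, and only enums whose name appears in a "(...)" cast are consulted, the number of DIEs that must be read stays small.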
>>>>>        >>
>>>>>        >>  So what do you think about type names inequality and the
>>>>>        suggested solution?
>>>>>        >
>>>>>        >>  [1] - https://reviews.llvm.org/D39622
>>>>>        >>  [2] - http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20171211/212859.html
>>>>>        >>
>>>>>        >>  Thank you,
>>>>>        >>  Anton.
>>>>>        >>  _______________________________________________
>>>>>        >>  lldb-dev mailing list
>>>>>        >>lldb-dev at lists.llvm.org 
>>>>> <mailto:lldb-dev at lists.llvm.org><mailto:lldb-dev at lists.llvm.org>
>>>>>        >>http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>>>>>
>
