[lldb-dev] DWARFASTParserClang and DW_TAG_typedef for anonymous structs

Greg Clayton via lldb-dev lldb-dev at lists.llvm.org
Thu Mar 10 14:03:58 PST 2016


Thanks for the example; this is indeed a new regression. It used to work (Xcode 7.2), but with top of tree it doesn't. Sean Callanan recently pulled out a number of workarounds we used to have in the expression parser/JIT, to avoid certain issues those workarounds caused, and removing them is causing problems now. I looked at the old expression parser: it was still making up the name _Z6myfuncP3$_0, but it would first try that mangled name, and if it didn't find it, it would fall back to looking up just the demangled basename ("myfunc"). We removed this workaround because we are now trying to be more correct, and that caused this regression. Sean Callanan will take a look at this and get a fix in soon.

What is probably happening is that we are removing the typedef sugar from the function arguments somewhere we shouldn't be (maybe in clang::ASTImporter or in our lldb_private::ClangASTImporter). We should be looking up the mangled name "_Z6myfuncP18my_untagged_struct", but somehow, by the time we look up the type, we have lost my_untagged_struct and are looking for an anonymous struct "$_0" instead.
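For reference, the two symbol names above and the old fallback lookup can be sketched as follows. This is a minimal illustration under stated assumptions: `MangleFuncTakingPointer`, `Symbol`, and `LookupWithFallback` are hypothetical names, not LLDB API, and the mangler only models the Itanium `<length><identifier>` encoding for a function taking one pointer parameter.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: builds an Itanium-style mangled name for a free
// function taking one pointer to a named type, e.g. void myfunc(T *).
// Only the <length><identifier> rule is modelled; real mangling also has
// qualifiers, substitutions, builtin type codes, namespaces, etc.
std::string MangleFuncTakingPointer(const std::string &func,
                                    const std::string &pointee) {
  return "_Z" + std::to_string(func.size()) + func + "P" +
         std::to_string(pointee.size()) + pointee;
}

// Hypothetical, simplified symbol table entry.
struct Symbol {
  std::string mangled;   // e.g. "_Z6myfuncP18my_untagged_struct"
  std::string basename;  // demangled basename, e.g. "myfunc"
  unsigned long address;
};

// Sketch of the old behaviour: try the exact mangled name first, and only
// if that fails fall back to matching the demangled basename.
std::optional<unsigned long> LookupWithFallback(
    const std::vector<Symbol> &symbols, const std::string &mangled,
    const std::string &basename) {
  for (const Symbol &sym : symbols)
    if (sym.mangled == mangled)
      return sym.address;
  for (const Symbol &sym : symbols)
    if (sym.basename == basename)
      return sym.address;
  return std::nullopt;
}
```

With the made-up `$_0` name the exact-match step misses and only the basename fallback finds `myfunc`; without that fallback the lookup fails, which matches the "Couldn't lookup symbols" error further down the thread.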

Greg Clayton

> On Mar 10, 2016, at 10:20 AM, Luke Drummond <luke.drummond at codeplay.com> wrote:
> 
> Hi Greg
> 
> First of all thanks for taking the time to help out with this.
> 
> On 10/03/16 00:18, Greg Clayton wrote:
>> So we ran into a problem with anonymous structs in modules. They have no name, so we had no way to say "module A, please give me a struct named... nothing in the namespace 'foo'". Obviously this doesn't work, so for a typedef we always first check whether it comes from a module by trying to get the typedef from the DWO file:
>> 
>> type_sp = ParseTypeFromDWO(die, log);
>> 
>> If this fails, it just means the typedef isn't in a DWO file and we already have it in hand. If I compile your example I end up with:
>> 
>> 0x0000000b: TAG_compile_unit [1] *
>>              AT_producer( "Apple LLVM version 8.0.0 (clang-800.0.5.3)" )
>>              AT_language( DW_LANG_C99 )
>>              AT_name( "main.c" )
>>              AT_stmt_list( 0x00000000 )
>>              AT_comp_dir( "/tmp" )
>>              AT_low_pc( 0x0000000100000f60 )
>>              AT_high_pc( 0x0000000100000fb0 )
>> 
>> 0x0000002e:     TAG_subprogram [2] *
>>                  AT_low_pc( 0x0000000100000f60 )
>>                  AT_high_pc( 0x0000000100000f85 )
>>                  AT_frame_base( rbp )
>>                  AT_name( "myfunc" )
>>                  AT_decl_file( "/private/tmp/main.c" )
>>                  AT_decl_line( 6 )
>>                  AT_prototyped( 0x01 )
>>                  AT_external( 0x01 )
>> 
>> 0x00000049:         TAG_formal_parameter [3]
>>                      AT_location( fbreg -8 )
>>                      AT_name( "s" )
>>                      AT_decl_file( "/private/tmp/main.c" )
>>                      AT_decl_line( 6 )
>>                      AT_type( {0x0000008c} ( my_untagged_struct* ) )
>> 
>> 0x00000057:         NULL
>> 
>> 0x00000058:     TAG_subprogram [4] *
>>                  AT_low_pc( 0x0000000100000f90 )
>>                  AT_high_pc( 0x0000000100000fb0 )
>>                  AT_frame_base( rbp )
>>                  AT_name( "main" )
>>                  AT_decl_file( "/private/tmp/main.c" )
>>                  AT_decl_line( 12 )
>>                  AT_type( {0x00000085} ( int ) )
>>                  AT_external( 0x01 )
>> 
>> 0x00000076:         TAG_variable [5]
>>                      AT_location( fbreg -16 )
>>                      AT_name( "s" )
>>                      AT_decl_file( "/private/tmp/main.c" )
>>                      AT_decl_line( 14 )
>>                      AT_type( {0x00000091} ( my_untagged_struct ) )
>> 
>> 0x00000084:         NULL
>> 
>> 0x00000085:     TAG_base_type [6]
>>                  AT_name( "int" )
>>                  AT_encoding( DW_ATE_signed )
>>                  AT_byte_size( 0x04 )
>> 
>> 0x0000008c:     TAG_pointer_type [7]
>>                  AT_type( {0x00000091} ( my_untagged_struct ) )
>> 
>> 0x00000091:     TAG_typedef [8]
>>                  AT_type( {0x0000009c} ( struct  ) )
>>                  AT_name( "my_untagged_struct" )
>>                  AT_decl_file( "/private/tmp/main.c" )
>>                  AT_decl_line( 4 )
>> 
>> 0x0000009c:     TAG_structure_type [9] *
>>                  AT_byte_size( 0x08 )
>>                  AT_decl_file( "/private/tmp/main.c" )
>>                  AT_decl_line( 1 )
>> 
>> 0x000000a0:         TAG_member [10]
>>                      AT_name( "i" )
>>                      AT_type( {0x00000085} ( int ) )
>>                      AT_decl_file( "/private/tmp/main.c" )
>>                      AT_decl_line( 2 )
>>                      AT_data_member_location( +0 )
>> 
>> 0x000000ae:         TAG_member [10]
>>                      AT_name( "f" )
>>                      AT_type( {0x000000bd} ( float ) )
>>                      AT_decl_file( "/private/tmp/main.c" )
>>                      AT_decl_line( 3 )
>>                      AT_data_member_location( +4 )
>> 
>> 0x000000bc:         NULL
>> 
>> 0x000000bd:     TAG_base_type [6]
>>                  AT_name( "float" )
>>                  AT_encoding( DW_ATE_float )
>>                  AT_byte_size( 0x04 )
>> 
>> 0x000000c4:     NULL
>> 
>> 
>> Note that the typedef is at 0x00000091, and it is a typedef to 0x0000009c.  Also note that the DWARF DIE at 0x0000009c is a complete definition as it has children describing its members and 0x0000009c doesn't have a DW_AT_declaration(1) attribute. Is this how your DWARF looks for your stuff? The DWARF you had looked like:
>> 
>> 0x0000005c:   DW_TAG_typedef [6]
>>                DW_AT_name( "my_untagged_struct" )
>>                DW_AT_decl_file("/home/luke/main.cpp")
>>                DW_AT_decl_line(4)
>>                DW_AT_type({0x0000002d})
>> 
>> 
>> What did the type at 0x0000002d look like? Similar to 0x0000009c in my DWARF I presume?
> 
> In the case of C89/C99, yes, but regrettably when you compile my example as C++ or use __attribute__((overloadable)) the DWARF does not include the DW_AT_name for the typedef in the formal parameter[0] of myfunc
> 
> COMPILE_UNIT<header overall offset = 0x00000000>:
> < 0><0x0000000b>  DW_TAG_compile_unit
>                    DW_AT_producer              "GNU C++ 4.8.4 -mtune=generic -march=x86-64 -g -fstack-protector"
>                    DW_AT_language              DW_LANG_C_plus_plus
>                    DW_AT_name                  "main.cpp"
>                    DW_AT_comp_dir              "/tmp"
>                    DW_AT_low_pc                0x004004ed
>                    DW_AT_high_pc               <offset-from-lowpc>60
>                    DW_AT_stmt_list             0x00000000
> 
> LOCAL_SYMBOLS:
> < 1><0x0000002d>    DW_TAG_structure_type
>                      DW_AT_byte_size             0x00000008
>                      DW_AT_decl_file             0x00000001 /tmp/main.cpp
>                      DW_AT_decl_line             0x00000001
>                      DW_AT_linkage_name          "18my_untagged_struct"
>                      DW_AT_sibling               <0x0000004e>
> < 2><0x00000039>      DW_TAG_member
>                        DW_AT_name                  "i"
>                        DW_AT_decl_file             0x00000001 /tmp/main.cpp
>                        DW_AT_decl_line             0x00000002
>                        DW_AT_type                  <0x0000004e>
>                        DW_AT_data_member_location  0
> < 2><0x00000043>      DW_TAG_member
>                        DW_AT_name                  "f"
>                        DW_AT_decl_file             0x00000001 /tmp/main.cpp
>                        DW_AT_decl_line             0x00000003
>                        DW_AT_type                  <0x00000055>
>                        DW_AT_data_member_location  4
> < 1><0x0000004e>    DW_TAG_base_type
>                      DW_AT_byte_size             0x00000004
>                      DW_AT_encoding              DW_ATE_signed
>                      DW_AT_name                  "int"
> < 1><0x00000055>    DW_TAG_base_type
>                      DW_AT_byte_size             0x00000004
>                      DW_AT_encoding              DW_ATE_float
>                      DW_AT_name                  "float"
> < 1><0x0000005c>    DW_TAG_typedef
>                      DW_AT_name                  "my_untagged_struct"
>                      DW_AT_decl_file             0x00000001 /tmp/main.cpp
>                      DW_AT_decl_line             0x00000004
>                      DW_AT_type                  <0x0000002d>
> < 1><0x00000067>    DW_TAG_subprogram
>                      DW_AT_external              yes(1)
>                      DW_AT_name                  "myfunc"
>                      DW_AT_decl_file             0x00000001 /tmp/main.cpp
>                      DW_AT_decl_line             0x00000006
>                      DW_AT_linkage_name "_Z6myfuncP18my_untagged_struct"
>                      DW_AT_low_pc                0x004004ed
>                      DW_AT_high_pc               <offset-from-lowpc>33
>                      DW_AT_frame_base            len 0x0001: 9c: DW_OP_call_frame_cfa
>                      DW_AT_GNU_all_call_sites    yes(1)
>                      DW_AT_sibling               <0x00000095>
> < 2><0x00000088>      DW_TAG_formal_parameter
>                        DW_AT_name                  "s"
>                        DW_AT_decl_file             0x00000001 /tmp/main.cpp
>                        DW_AT_decl_line             0x00000006
>                        DW_AT_type                  <0x00000095>
>                        DW_AT_location              len 0x0002: 9168: DW_OP_fbreg -24
> < 1><0x00000095>    DW_TAG_pointer_type
>                      DW_AT_byte_size             0x00000008
>                      DW_AT_type                  <0x0000005c>
> < 1><0x0000009b>    DW_TAG_subprogram
>                      DW_AT_external              yes(1)
>                      DW_AT_name                  "main"
>                      DW_AT_decl_file             0x00000001 /tmp/main.cpp
>                      DW_AT_decl_line             0x0000000c
>                      DW_AT_type                  <0x0000004e>
>                      DW_AT_low_pc                0x0040050e
>                      DW_AT_high_pc               <offset-from-lowpc>27
>                      DW_AT_frame_base            len 0x0001: 9c: DW_OP_call_frame_cfa
>                      DW_AT_GNU_all_tail_call_sites yes(1)
> < 2><0x000000b8>      DW_TAG_lexical_block
>                        DW_AT_low_pc                0x00400516
>                        DW_AT_high_pc               <offset-from-lowpc>17
> < 3><0x000000c9>        DW_TAG_variable
>                          DW_AT_name                  "s"
>                          DW_AT_decl_file             0x00000001 /tmp/main.cpp
>                          DW_AT_decl_line             0x0000000e
>                          DW_AT_type                  <0x0000005c>
>                          DW_AT_location              len 0x0002: 9160: DW_OP_fbreg -32
> 
> 
>> 
>> The DWARFASTParserClang class is responsible for making up a clang type in the clang::ASTContext for this typedef. What will happen in the code where the flow falls through is that we will make a lldb_private::Type that says "I am a typedef to the type whose user ID is 0x0000002d (in your example)". DWARFASTParserClang::ParseTypeFromDWARF() should not return a NULL pointer here. If it does, please step through and figure out why. I compiled your example and did the following:
>> 
>> 
>> % lldb a.out
>> (lldb) b main
>> (lldb) r
>> Process 89808 launched: '/private/tmp/a.out' (x86_64)
>> Process 89808 stopped
>> * thread #1: tid = 0xf7473, 0x0000000100000fa3 a.out`main + 19, stop reason = breakpoint 1.1, queue = com.apple.main-thread
>>     frame #0: 0x0000000100000fa3 a.out`main + 19 at main.c:15
>>    12  	int main()
>>    13  	{
>>    14  	   my_untagged_struct s;
>> -> 15  	   myfunc(&s);
>>    16  	   return 0;
>>    17  	}
>> (lldb) p myfunc(&s)
>> (lldb)
>> 
>> So I was able to call this function. Are you not able to call it?
> 
> I tried compiling with standard C99, and as you note, this works fine; however, C++ fails:
> 
> $ lldb a.out -o 'b 15' -o 'process launch'
> (lldb) target create "a.out"
> Current executable set to 'a.out' (x86_64).
> (lldb) b 15
> Breakpoint 1: where = a.out`main + 8 at main.cpp:15, address = 0x0000000000400516
> (lldb) process launch
> Process 18718 stopped
> * thread #1: tid = 18718, 0x0000000000400516 a.out`main + 8 at main.cpp:15, name = 'a.out', stop reason = breakpoint 1.1
>    frame #0: 0x0000000000400516 a.out`main + 8 at main.cpp:15
> 
> Process 18718 launched: '/tmp/a.out' (x86_64)
> (lldb) expr myfunc(&s)
> error: Couldn't lookup symbols:
>  myfunc($_0*)
> (lldb)
> 
> 
>> 
>> Likewise if I step into this function I can see the variable:
>> 
>> (lldb) s
>> (lldb) fr var s
>> (my_untagged_struct *) s = 0x00007fff5fbff8d0
>> (lldb) fr var *s
>> (my_untagged_struct) *s = (i = 0, f = 3.1400001)
> 
> This does indeed seem to work:
> 
> (lldb) s
> Process 18769 stopped
> * thread #1: tid = 18769, 0x00000000004004f5 a.out`myfunc(s=0x00007fffffffe2b0) + 8 at main.cpp:8, name = 'a.out', stop reason = step in
> frame #0: 0x00000000004004f5 a.out`myfunc(s=0x00007fffffffe2b0) + 8 at main.cpp:8
> (lldb) fr var s
> (my_untagged_struct *) s = 0x00007fffffffe2b0
> (lldb) fr var *s
> (my_untagged_struct) *s = (i = -7264, f = 0.0000000000000000000000000000000000000000459163468)
> (lldb)
> 
>> 
>> So to sum up: when we parse the DW_TAG_typedef in DWARFASTParserClang::ParseTypeFromDWARF(), we should return a valid TypeSP that contains a valid pointer. If that isn't happening, that is a bug. Feel free to send me the example binary and I can figure things out if you have any trouble. I wrote all of this code so I am quite familiar with it.
>> 
> 
> I've confirmed you're absolutely right that DWARFASTParserClang::ParseTypeFromDWARF returns a non-null TypeSP after the fallthrough, but with an empty struct name clang doesn't seem to be able to resolve the type, so it fails to locate the mangled function because the type name in it is wrong (_Z6myfuncP3$_0).
> 
> A colleague took a look at this today, and as a quick sanity test, threw together this hack:
> 
> --- a/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
> +++ b/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
> @@ -553,6 +553,19 @@ DWARFASTParserClang::ParseTypeFromDWARF (const SymbolContext& sc,
>                         }
>                     }
> 
> +                    {
> +                        uint32_t list_size = type_list->GetSize();
> +                        for (uint32_t i = 0; i < list_size; ++i)
> +                        {
> +                            TypeSP t = type_list->GetTypeAtIndex(i);
> +                            if (t->IsTypedef())
> +                            {
> +                               type_name_const_str = t->GetName();
> +                               type_name_cstr = t->GetName().AsCString();
> +                            }
> +                        }
> +                    }
> 
> 
> It seems to fix our problem here and expression evaluation works again for the presented case, but unfortunately, a few other tests break, which is a little frustrating. If you have time to take another look at why this might be the case, it'd be very much appreciated.
> 
> I've attached an example Mac binary of this issue in action, built with an older Apple clang++ (it's simply the test above), but the result is the same for me on Linux with upstream clang++ and g++ 5.3, so I don't think the age of the compiler is a problem here.
> 
> Thanks again
> 
> Luke
> 
>> Greg Clayton
>> 
>> 
>>> On Mar 9, 2016, at 3:54 PM, luke Drummond via lldb-dev <lldb-dev at lists.llvm.org> wrote:
>>> 
>>> Hi All
>>> 
>>> I'm hoping that someone might be able to give me some direction
>>> regarding `Type` resolution from DWARF information for functions taking
>>> anonymous structs hidden behind a typedef.
>>> 
>>> e.g.
>>> 
>>> ```
>>> typedef struct {
>>>    int i;
>>>    float f;
>>> } my_untagged_struct;
>>> 
>>> void __attribute__((noinline)) myfunc(my_untagged_struct *s)
>>> {
>>>    s->i = 0;
>>>    s->f = 3.14f;
>>> }
>>> 
>>> int main()
>>> {
>>>    my_untagged_struct s;
>>>    myfunc(&s);
>>>    return 0;
>>> }
>>> 
>>> ```
>>> 
>>> I [recently reported a
>>> bug](https://llvm.org/bugs/show_bug.cgi?id=26790) relating to the
>>> clang expression evaluator no longer being able to resolve calls to
>>> functions with arguments to typedefed anonymous structs, after a cleanup
>>> to the expression parsing code.
>>> I was completely wrong in my assumptions about the cause of the bug, and
>>> after some more digging, I think I've tracked it down to a section of
>>> code in `DWARFASTParserClang::ParseTypeFromDWARF`.
>>> 
>>> 
>>> (DWARFASTParserClang::ParseTypeFromDWARF:254)
>>> ```
>>> switch (tag)
>>> {
>>>    case DW_TAG_typedef:
>>>        // Try to parse a typedef from the DWO file first as modules
>>>        // can contain typedef'ed structures that have no names like:
>>>        //
>>>        //  typedef struct { int a; } Foo;
>>>        //
>>>        // In this case we will have a structure with no name and a
>>>        // typedef named "Foo" that points to this unnamed structure.
>>>        // The name in the typedef is the only identifier for the struct,
>>>        // so always try to get typedefs from DWO files if possible.
>>>        //
>>>        // The type_sp returned will be empty if the typedef doesn't exist
>>>        // in a DWO file, so it is cheap to call this function just to check.
>>>        //
>>>        // If we don't do this we end up creating a TypeSP that says this
>>>        // is a typedef to type 0x123 (the DW_AT_type value would be 0x123
>>>        // in the DW_TAG_typedef), and this is the unnamed structure type.
>>>        // We will have a hard time tracking down an unnamed structure
>>>        // type in the module DWO file, so we make sure we don't get into
>>>        // this situation by always resolving typedefs from the DWO file.
>>>        type_sp = ParseTypeFromDWO(die, log);
>>>        if (type_sp)
>>>            return type_sp;
>>>    LLVM_FALLTHROUGH;
>>> ```
>>> 
>>> In my case, the type information for the typedef is included within the
>>> main executable's DWARF rather than an external .dwo file (a snippet of
>>> the DWARF is included at the end of this message), and therefore the `case`
>>> for `DW_TAG_typedef` falls through as `ParseTypeFromDWO` returns a NULL
>>> value.
>>> 
>>> 
>>> As this is code I'm not familiar with, I'd appreciate it if anyone on the
>>> list could give some guidance as to the best way to resolve this
>>> issue, so that `ClangExpressionDeclMap::FindExternalVisibleDecls` can
>>> correctly resolve calls to functions taking typedef names to anonymous
>>> structs. I'm happy to take a whack at implementing this feature, but
>>> I'm a bit stuck as to how to resolve this type given the current DIE
>>> object.
>>> 
>>> Any help or guidance on where to start with this would be really
>>> helpful.
>>> 
>>> All the best
>>> 
>>> Luke
>>> 
>>> 
>>> 
>>> 
>>> --------
>>> This is a snippet from the output of llvm-dwarfdump on the above code
>>> example.
>>> 
>>> `g++ -g main.cpp && llvm-dwarfdump a.out | grep DW_TAG_typedef -A 35`
>>> --------
>>> 
>>> 0x0000005c:   DW_TAG_typedef [6]
>>>                DW_AT_name [DW_FORM_strp]	( .debug_str[0x00000069] = "my_untagged_struct")
>>>                DW_AT_decl_file [DW_FORM_data1]	("/home/luke/main.cpp")
>>>                DW_AT_decl_line [DW_FORM_data1]	(4)
>>>                DW_AT_type [DW_FORM_ref4]	(cu + 0x002d => {0x0000002d})
>>> 
>>> 0x00000067:   DW_TAG_subprogram [7] *
>>>                DW_AT_external [DW_FORM_flag_present]	(true)
>>>                DW_AT_name [DW_FORM_strp]	( .debug_str[0x00000006] = "myfunc")
>>>                DW_AT_decl_file [DW_FORM_data1]	("/home/luke/main.cpp")
>>>                DW_AT_decl_line [DW_FORM_data1]	(6)
>>>                DW_AT_linkage_name [DW_FORM_strp]	( .debug_str[0x0000005d] = "_Z6myfuncP18my_untagged_struct")
>>>                DW_AT_low_pc [DW_FORM_addr]	(0x0000000000400566)
>>>                DW_AT_high_pc [DW_FORM_data8]	(0x0000000000000026)
>>>                DW_AT_frame_base [DW_FORM_exprloc]	(<0x1> 9c )
>>>                DW_AT_Unknown_2117 [DW_FORM_flag_present]	(true)
>>>                DW_AT_sibling [DW_FORM_ref4]	(cu + 0x0095 => {0x00000095})
>>> 
>>> 0x00000088:     DW_TAG_formal_parameter [8]
>>>                  DW_AT_name [DW_FORM_string]	("s")
>>>                  DW_AT_decl_file [DW_FORM_data1]	("/home/luke/main.cpp")
>>>                  DW_AT_decl_line [DW_FORM_data1]	(6)
>>>                  DW_AT_type [DW_FORM_ref4]	(cu + 0x0095 => {0x00000095})
>>>                  DW_AT_location [DW_FORM_exprloc]	(<0x2> 91 68 )
>>> 
>>> 0x00000094:     NULL
>>> 
>>> 0x00000095:   DW_TAG_pointer_type [9]
>>>                DW_AT_byte_size [DW_FORM_data1]	(0x08)
>>>                DW_AT_type [DW_FORM_ref4]	(cu + 0x005c => {0x0000005c})
>>> 
>>> _______________________________________________
>>> lldb-dev mailing list
>>> lldb-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
>> 
> <mac-expr-anon-struct-example.tar.gz>

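The DW_TAG_typedef case Luke quoted tries the DWO file first and falls through to local parsing when it gets back an empty TypeSP. A minimal sketch of that control flow, assuming hypothetical stand-ins (`Type`, `TypeSP`, `DWOIndex`, `ParseTypedefFromDWO`, `ParseTypedef`) rather than the real lldb_private classes:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins; these are not the lldb_private::Type/TypeSP APIs.
struct Type {
  std::string name;
  std::string origin;  // "dwo" or "local", only to show which path ran
};
using TypeSP = std::shared_ptr<Type>;

// Hypothetical index of typedefs that live in a DWO/module file.
using DWOIndex = std::map<std::string, TypeSP>;

// Returns an empty TypeSP when the typedef is not in the DWO file, so the
// check is cheap and the caller can simply fall through.
TypeSP ParseTypedefFromDWO(const DWOIndex &dwo, const std::string &name) {
  auto it = dwo.find(name);
  if (it == dwo.end())
    return nullptr;
  return it->second;
}

TypeSP ParseTypedef(const DWOIndex &dwo, const std::string &name) {
  if (TypeSP type_sp = ParseTypedefFromDWO(dwo, name))
    return type_sp;  // found in the DWO file: use it as-is
  // Fallthrough path: the typedef is described in this executable's own
  // DWARF, so build a local typedef Type. It must still be non-null.
  return std::make_shared<Type>(Type{name, "local"});
}
```

As discussed in the thread, the fallthrough path is expected to return a valid, non-null TypeSP; the harder question is what name the underlying anonymous struct ends up with.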

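In C++, an unnamed class defined in a typedef declaration takes the typedef name for linkage purposes, which is consistent with GCC emitting DW_AT_linkage_name "18my_untagged_struct" on the anonymous struct at 0x0000002d and with myfunc's linkage name _Z6myfuncP18my_untagged_struct. As written, the colleague's hack copies the name of whichever typedef comes last in type_list, whether or not that typedef refers to the struct being parsed. A narrower approach only uses a typedef whose DW_AT_type points at that struct. The sketch below assumes a hypothetical DIE model (`Tag`, `DIE`, `DIEMap`, `NameForStruct`) that is far simpler than LLDB's DWARFDIE; the offsets in the test mirror Luke's dump.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical, heavily simplified DIE model (not LLDB's DWARFDIE).
enum class Tag { StructureType, Typedef, PointerType, BaseType };

struct DIE {
  Tag tag;
  std::string name;                  // empty for an anonymous struct
  std::optional<uint64_t> type_ref;  // DW_AT_type target offset, if any
};

// DIEs keyed by their offset in the compile unit.
using DIEMap = std::map<uint64_t, DIE>;

// Returns the name to use for a struct DIE: its own name if it has one,
// otherwise the name of the first typedef (lowest offset) whose DW_AT_type
// refers to it. Returns std::nullopt if no such name exists.
std::optional<std::string> NameForStruct(const DIEMap &dies,
                                         uint64_t struct_offset) {
  auto it = dies.find(struct_offset);
  if (it == dies.end() || it->second.tag != Tag::StructureType)
    return std::nullopt;
  if (!it->second.name.empty())
    return it->second.name;
  for (const auto &entry : dies) {
    const DIE &die = entry.second;
    if (die.tag == Tag::Typedef && die.type_ref == struct_offset)
      return die.name;
  }
  return std::nullopt;  // truly anonymous (the "$_0" case in this thread)
}
```

Because only typedefs that target the struct are considered, an unrelated typedef later in the compile unit cannot rename it.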
