[PATCH] D28040: Don't store the NULL DIEs in the DIE array in the DWARFUnit.

Greg Clayton via llvm-commits llvm-commits at lists.llvm.org
Thu Dec 22 11:18:56 PST 2016


> On Dec 22, 2016, at 9:59 AM, David Blaikie <dblaikie at gmail.com> wrote:
> 
> 
> 
> On Thu, Dec 22, 2016 at 9:04 AM Greg Clayton via Phabricator <reviews at reviews.llvm.org> wrote:
> clayborg added a comment.
> 
> Added inline comments and responses. New patch coming with fixes.
> 
> 
> 
> ================
> Comment at: include/llvm/DebugInfo/DWARF/DWARFDebugInfoEntry.h:39
> +  /// NULL DIEs are now removed when storing DWARFDebugInfoEntry
> +  uint32_t HasNullSibling : 1;
> 
> ----------------
> dblaikie wrote:
> > This should be a bool.
> If you change the type it won't be part of the uint32_t for HasNullSibling. I compiled a quick example:
> 
> ```
> #include <stdint.h>
> 
> struct Test {
>   uint32_t Depth : 31;
>   bool IsLastChild : 1;
> };
> 
> int main (int argc, char const *argv[])
> {
>   Test t = { 123, false };
>   if (t.Depth)
>     return 0;
>   return 1;
> }
> ```
> 
> And I got the following DWARF:
> ```
> 0x000000b0:     TAG_structure_type [8] *
>                  AT_name( "Test" )
>                  AT_byte_size( 0x04 )
>                  AT_decl_file( "/private/tmp/main.c" )
>                  AT_decl_line( 3 )
> 
> 0x000000b8:         TAG_member [9]
>                      AT_name( "Depth" )
>                      AT_type( {0x00000000000000e3} ( uint32_t ) )
>                      AT_decl_file( "/private/tmp/main.c" )
>                      AT_decl_line( 4 )
>                      AT_byte_size( 0x04 )
>                      AT_bit_size( 0x1f )
>                      AT_bit_offset( 0x01 )
>                      AT_data_member_location( +0 )
> 
> 0x000000cd:         TAG_member [9]
>                      AT_name( "IsLastChild" )
>                      AT_type( {0x00000000000000f9} ( bool ) )
>                      AT_decl_file( "/private/tmp/main.c" )
>                      AT_decl_line( 5 )
>                      AT_byte_size( 0x01 )
>                      AT_bit_size( 0x01 )
>                      AT_bit_offset( 0x00 )
>                      AT_data_member_location( +3 )
> 
> 0x000000e2:         NULL
> ```
> Note that IsLastChild has a AT_data_member_location that is +3 so it is in its own block. The type has to be the same in order to share storage. 
> 
> I think you might be misreading the DWARF (& it looks like the DWARF is a bit confusing, as well).
> 
> For starters - in both your examples, the byte_size of the struct is 4 bytes. So it seems storage is shared, or at least that using the same type doesn't improve/reduce the size of the struct - so on that basis I'd probably suggest using the bool type.
> 
> Beyond that, though. The AT_data_member_location is +3, as you say - not +4, so it is still using the top (or bottom - the counting here is a bit weird imho, but seems to work out) byte of the previous 4 byte value.
> 
> Also - we have /lots/ of similar mixed type bitfields in llvm, I think (have a grep around - it looks like it to me, at least) - so if this isn't doing the right thing I'd be /really/ surprised. 

You are correct. I misread the DWARF. I will change it to a bool.

>  
> Changing IsLastChild to a uint32_t we see things are right:
> 
> ```
> 0x000000b0:     TAG_structure_type [8] *
>                  AT_name( "Test" )
>                  AT_byte_size( 0x04 )
>                  AT_decl_file( "/private/tmp/main.c" )
>                  AT_decl_line( 3 )
> 
> 0x000000b8:         TAG_member [9]
>                      AT_name( "Depth" )
>                      AT_type( {0x00000000000000e3} ( uint32_t ) )
>                      AT_decl_file( "/private/tmp/main.c" )
>                      AT_decl_line( 4 )
>                      AT_byte_size( 0x04 )
>                      AT_bit_size( 0x1f )
>                      AT_bit_offset( 0x01 )
>                      AT_data_member_location( +0 )
> 
> 0x000000cd:         TAG_member [9]
>                      AT_name( "IsLastChild" )
>                      AT_type( {0x00000000000000e3} ( uint32_t ) )
>                      AT_decl_file( "/private/tmp/main.c" )
>                      AT_decl_line( 5 )
>                      AT_byte_size( 0x04 )
>                      AT_bit_size( 0x01 )
>                      AT_bit_offset( 0x00 )
>                      AT_data_member_location( +0 )
> 
> 0x000000e2:         NULL
> ```
> 
> 
> ================
> Comment at: lib/DebugInfo/DWARF/DWARFDie.cpp:359
>    if (!isValid())
> -    return;
> +    return -1U;
>    DataExtractor debug_info_data = U->getDebugInfoExtractor();
> ----------------
> aprantl wrote:
> > Is there a better alternative to returning a magic value?
> We should probably just assert this object is valid. Clients should check prior to calling.
> 
> 
> ================
> Comment at: lib/DebugInfo/DWARF/DWARFDie.cpp:398-402
> +          WithColor(OS, syntax::Address).get() << format("\n0x%8.8x: ", Offset);
> +          OS.indent(Indent) << "NULL\n";
> +          // Add 1 to the offset for the ULEB128 abbreviation code byte.
> +          Offset += 1;
> +        }
> ----------------
> I believe this does belong here. A DIE could claim to have children but only have a NULL inside of it, then we would fail to print the NULL if we try to do it with children at line 392.
> 
> How would we succeed to print the NULL if we aren't printing children? There would be no 'last child' & so we wouldn't print the NULL in that case either.

The NULL here isn't a NULL terminating the children, it is terminating the current sibling list. and we would want to see that NULL.

So if you tell the CU DIE to print itself and have a recurse depth of 1 for DWARF like:

0x0: CU
0x1:  A
0x2    A1
0x3:   NULL
0x4:  B
0x5:   B1 
0x6:   NULL
0x7:  NULL

you would expect to see:

0x0: CU
0x1:  A
0x4:  B
0x7:  NULL



> You're right though, we wouldn't print it in 392, but perhaps we should change the condition here from "if (Die->isLastChild())" to "if (Die->hasChildren())" ?

I think the misunderstanding here is that the NULL is actually terminating the current DIEs sibling chain, not the children's sibling chain. If we use "if (Die->hasChildren())" then we would print a NULL after every Die that has children, and since the NULL is for the current DIEs sibling, not the children, that would do the wrong thing:

0x0: CU
0x1:  A
0x2:  NULL // these would be incorrect
0x4:  B
0x5:  NULL // these would be incorrect
0x7:  NULL

> 
> Also, a test case for an empty child list would be good.

Agreed, I can add it.
 
> Also, if RecurseDepth is zero, we still need to print the NULL die since it is at the current recurse level. Given these reasons, it seems that we still need the getHasNULLSibling() property and this code belongs where it currently is.
> 
> 
> ================
> Comment at: tools/dsymutil/DwarfLinker.cpp:1799
>    if (DIE.hasChildren())
> -    for (auto Child = DIE.getFirstChild(); Child && !Child.isNULL();
> -         Child = Child.getSibling())
> +    for (auto Child = DIE.getFirstChild(); Child; Child = Child.getSibling())
>        Info.Prune &= analyzeContextInfo(Child, MyIdx, CU, CurrentDeclContext,
> ----------------
> aprantl wrote:
> > Might make sense to add a method that returns something like a SiblingIterator at some point in the future?
> I agree. A later commit we can add a function like:
> 
> ```
>   DieIteratorRange DWARFDie::Children();
> ```
> 
> So we can then so:
> 
> ```
> for (auto Child: DIE.Children()) {
> ```
> 
> 
> 
> ================
> Comment at: unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp:1043
>    EXPECT_EQ(B.getParent(), CUDie);
> -  EXPECT_EQ(Null.getParent(), CUDie);
> +  EXPECT_EQ(C.getParent(), CUDie);
> 
> ----------------
> I argue this one is needed. A was the first child of CU, B was the second child after A had no children, and C is the third child that follows B which had children. I do think all are valid tests that we can't regress on.
> 
> Just to be clear - I'm never suggesting there are test cases we "can regress on", just asking about whether the test cases pull their weight - whether they exercise sufficiently distinct/interesting codepaths that make them important tests compared to the existing coverage.

Understood.

>  
> This is testing the DWARF parser and making sure it they can track down their relations correctly. If we significantly change DWARFDebugInfoEntry at any point, we should verify everything still works.
> 
> Knowing the current algorithm to find the parent when we find the parent for A the depth decreases only as you walk backwards. The depth for B does, but it also has A at the same depth. The depth for C will go up when it sees the children for B and then go back down again, so I do believe all these tests are valid.
> 
> It looks like C is perhaps a superset - it demonstrates we can go down into children, and back up. But, yes, good point that A and B are boundary/corner cases that are probably reasonable to test for separately.
> 
> Please add comments describing what makes each of these paths interesting. (eg: "// verify an immediate parent", "// verify skipping preceding siblings", "// verify skipping preceding niblings" (luls: https://en.wiktionary.org/wiki/nibling ) - those probably aren't the perfect comments, but a rough idea)
>  

Will do.

> ================
> Comment at: unittests/DebugInfo/DWARF/DWARFDebugInfoTest.cpp:1050
>    auto B2 = B1.getSibling();
> -  EXPECT_TRUE(B2.getSibling().isNULL());
> +  EXPECT_FALSE(B2.getSibling().isValid());
> 
> ----------------
> aprantl wrote:
> > Should we introduce something like a hasSibling() method? This almost sounds like it has a corrupted sibling DIE.
> Now when we ask for a sibling and it doesn't have one we get an invalid DWARFDie back so this is the new way to check for that. I am not sure adding a hasSibling() method will make the code any clearer? If you wrote a loop using the current code, it currently looks like:
> 
> ```
>   for (auto Child = Die.getFirstChild(); Child; Child = Child.getSibling())
> ```
> 
> We the "Child;" test is using the operator bool. If we add the hasSibling() would code actually use it? Does this look clearer:
> 
> ```
>   for (auto Child = Die.getFirstChild(); Child; Child = Child.hasSibling() ? Child.getSibling() : DWARFDie())
> ```
> 
> So if we introduced it, who would actually use it? We could switch the code to use operator bool:
> 
> ```
>   EXPECT_FALSE((bool)B2.getSibling());
> ```
> 
> But I am not sure that makes things clearer.
> 
> *nod* don't think it's a big deal right now, and I think I'm OK (but see what Adrian has to say) with this null DIE being a valid return value.
> 
> It'll be all easier once iterators/ranges are added (but, no, I wouldn't add them in this commit).
>  
> 
> 
> https://reviews.llvm.org/D28040



More information about the llvm-commits mailing list