[llvm] r187213 - Add preliminary support for hashing DIEs and breaking them into

Mon Jul 29 10:37:05 PDT 2013

>>>>> +  // For each surrounding type or namespace...
>>>>
>>>> What surrounding things can we expect that aren't namespaces and types?
>>>
>>> The compile unit? But as you ask later, this is actually a quote.
>>
>> For myself it wasn't obvious what the condition was avoiding - perhaps
>> it'd be helpful. If it's only the compile_unit, that should always be
>> the top level, so shouldn't we just be able to replace this condition
>> and the next one with "if it's not a compile unit, recurse" (or loop,
>> if we do it with a loop instead) - with an assert that it should be a
>> namespace, class, or struct? That seems a bit more
>> explicit/self-documenting.
>
> Oh, also functions. Missed that one in my description to you as well
> which makes it more complicated. Sorry about that.

Hmm - skipping past function scopes? That seems strange. I would
expect that either we never ODR hash function-local types or, if we do
so, we'd want/need to include the function in the hash.

Again, I think I'd find this more legible if it were a bit more
explicit about what it's doing.

>>>> I'd sort of rather a loop than a recursion, though I realize the
>>>> recursion won't be terribly deep.
>>>>
>>>
>>> Heh. I'd originally written it that way
>>
>> Any particular reason you chose to rewrite it recursively?
>
> Made it more compact - though not terribly. I can undo it if you'd like.

I kind of would - while I'm not one to throw stones about writing
overly compact code, it is a bit unobvious when glancing at the code
that it's looping, just recursively.

>> If we can have different ODR hashes, then can we add a FIXME to remove
>> the null terminators once you're satisfied the algorithm is working
>> correctly?
>> (& if we can't have different ODR hashes, can we have a FIXME to fix
>> GCC/DWARF to ensure we don't hash them?)
>
> We can, but I'll caution you against too much reading into things on
> the ODR hashing. It was a small gcc feature that used the hashing that
> it didn't cost much additional to add, I don't want to get bogged down
> on whether or not hashing the null terminator there is the right thing
> to do. It's basically just a strict subset of the whole DIE and whole
> CU hashing and an incremental change. The hashing of the 0 byte is
> actually in the standard and I can get it changed, I'm just not really
> sure it's worth it.

In the type unit/CU hashing standard, you mean? Yeah, as we've
agreed/mentioned - the standard definition of hashing is essentially
irrelevant. At best it's a "recommended implementation", but there
doesn't seem to be any benefit to treating it as a necessary way to
compute the hash of the DIE - I really don't see much reason not to
diverge wherever we see fit.

>>> Not really, and we're definitely going to be grabbing lots of
>>> attributes with string representations as we hash whole DIEs - and
>>> unfortunately it's a sorted list. :\
>>
>> I guess you meant it's not sorted?
>
> The ordering of DIEs to hash has a particular sort. The order that
> they're added to the DIE is not sorted. We could, if we wanted, sort
> things as they're added to the DIE.

Well, once we've got the sorting in that we need for hashing, then
we'll probably be able to do the name lookup for the ODR hash (which
is "after full DIE construction" anyway - so will be at a similar
point to type/CU hashing, I assume... oh, well, a step before (since
you'll need to hash the ODR hash I guess)) we can rely on the sorting.
But, yes, might be easier just to maintain a sorted invariant rather
than a sorted and unsorted dynamic state.

>>>>> +; CHECK: DW_TAG_structure_type
>>>>> +; CHECK-NEXT: debug_str{{.*}}"bar"
>>>>> +; CHECK: DW_AT_GNU_odr_signature [DW_FORM_data8] (0x200520c0d5b90eff)
>>>>> +; CHECK: DW_TAG_namespace
>>>>> +; CHECK-NEXT: debug_str{{.*}}"echidna"
>>>>> +; CHECK: DW_TAG_namespace
>>>>> +; CHECK-NEXT: debug_str{{.*}}"capybara"
>>>>> +; CHECK: DW_TAG_namespace
>>>>> +; CHECK-NEXT: debug_str{{.*}}"mongoose"
>>>>> +; CHECK: DW_TAG_class_type
>>>>> +; CHECK-NEXT: debug_str{{.*}}"fluffy"
>>>>> +; CHECK: DW_AT_GNU_odr_signature [DW_FORM_data8]   (0x9a0124d5a0c21c52)
>>>>> +; CHECK: DW_TAG_structure_type
>>>>> +; CHECK-NEXT: debug_str{{.*}}"baz"
>>>>> +; CHECK-NOT: DW_AT_GNU_odr_signature
>>>>> +; FIXME: we may want to generate debug info for walrus, but still no hash.
>>>>
>>>> Why?
>>>
>>> It's in an anonymous namespace.
>>
>> Again, could be helpful to include that in the comment somehow. That
>> almost seems bug-worthy - we have a global variable of that type yet
>> no debug info for it? That seems strange.
>
> We may wish to emit debug info for w (and the type walrus), we
> definitely don't want to emit a hash. I'm not sure which thing you
> considered bug worthy. The former is, the latter isn't :)

Well this test can't possibly say one way or the other on the latter -
since there's no definition for the variable/type at all. So the only
bug this could demonstrate is that we don't emit the type definition
for 'walrus' (but, yes, it would help to include in the comment
"if/when we do fix PRXXXX, we still shouldn't emit an ODR hash for
it").

- David