[llvm-dev] DWARF: Should type units be referenced by signature or declaration?

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Fri Feb 3 19:15:53 PST 2017


Bunch of initially unrelated context:

* type units can be referenced in a variety of ways:
  * DW_FORM_ref_sig8 on any attribute needing to reference the type
  * DW_AT_signature on a declaration of the type
  * extra wrinkle: the declaration can be nested into the appropriate
namespace and given a name, or not
  * LLVM always does the "most expressive"/expensive thing: a full
declaration (though without a name, but with the DW_AT_signature) in the
correct namespace.
  * GCC is more selective/nuanced in its choice of representation,
depending on context.
* Types may be emitted unreferenced (LLVM's retained types list, which will
be more strongly leveraged for C++ modules + debug info in the near future)
into type units, or directly into the CU
* Types that reference addresses (pointer non-type template parameters, for
example) may not be in type units when using Fission (they have no way to
reference the address pool)
  * The LLVM implementation of this isn't terribly efficient: a flag is
lowered on the address pool, and if at any point an address is required the
flag is raised and all subsequent type creation is skipped. Once control
returns to the code responsible for creating the type unit, the flag is
examined, and if it is up, all the work is thrown out and the type is then
created in the CU.
* Type units have some overhead (2x on GCC, 1.5x on Clang when I measured a
while ago, comparing the increase in debug_types size to the reduction in
debug_info size)
* LLVM uses the mangled name of the type as the deduplication key for type
units
  * because of this, LLVM doesn't produce type units for non-public types
(eg: classes in anonymous namespaces - or unnamed enums... (this latter one
produces some wrinkles))
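
A minimal sketch of the flag-on-the-address-pool scheme described above, to
make the wasted work visible. Every name here (AddressPool, TypeDesc,
placeType, and so on) is hypothetical and is not LLVM's actual code; members
stand in for whatever pieces of the type get built.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the address pool: it only tracks the flag.
class AddressPool {
  bool Used = false;

public:
  void resetUsedFlag() { Used = false; } // lower the flag
  void noteAddressUse() { Used = true; } // raise the flag
  bool hasBeenUsed() const { return Used; }
};

// Hypothetical type description: each member may or may not need an address.
struct Member {
  std::string Name;
  bool NeedsAddress;
};
struct TypeDesc {
  std::string Name;
  std::vector<Member> Members;
};

enum class Placement { TypeUnit, CompileUnit };

struct BuildResult {
  Placement Where;
  unsigned MembersBuilt; // discarded if Where == CompileUnit
};

// Speculatively build the type as a type unit, then check the flag.
BuildResult placeType(const TypeDesc &Ty, AddressPool &Pool) {
  Pool.resetUsedFlag();
  assert(!Pool.hasBeenUsed() && "flag must start lowered");

  unsigned Built = 0;
  for (const Member &M : Ty.Members) {
    if (Pool.hasBeenUsed())
      break; // flag is up: skip all subsequent creation
    if (M.NeedsAddress)
      Pool.noteAddressUse();
    else
      ++Built;
  }

  // Back in the code that started the type unit: examine the flag.
  if (Pool.hasBeenUsed())
    return {Placement::CompileUnit, Built}; // throw the work out, redo in CU
  return {Placement::TypeUnit, Built};
}
```

Anything counted in MembersBuilt before the flag went up is work that gets
thrown away when the type falls back to the CU.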

Motivation:
* Types that are only emitted once across the program (eg: attached to a
template explicit instantiation definition or emitted due to a strong
vtable) shouldn't be put into type units so they don't pay the overhead.

Issues:
* This leads to type unit types referencing non-type unit types - what
DWARF should be used for that? a type declaration in the type unit? I
think: yes
* This issue sort of already comes up & is punted if the ODR is violated.
If an external type references an internal type, the internal type is
emitted into the type unit (& into any other TU/CU that uses it - much
duplication)
* If type units may reference other types by declaration (already true - a
type may only be available as a declaration) - why not reference all
types by declaration?
  * Is there substantial benefit to the debugger to not have to do name
resolution, but rather to match types by signature directly?
* Since type units can be emitted without a reference to them from the CU,
a consumer can't rely on reachability of the type unit reference graph, so
this should be only a performance concern, not a correctness one.
* If declarations are used selectively or pervasively, this would help with
the address pool issue too: even if a type uses an address, it would go in
the CU but types referencing that type could still remain in a TU.
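
To make the debugger-side tradeoff concrete, here is a rough sketch of the
two lookup paths a consumer might have. The TypeIndex structure and its
method names are assumptions for illustration only; real debuggers organize
their indexes differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical consumer-side record for a type definition.
struct TypeDef {
  std::string QualifiedName;
  uint64_t ByteSize;
};

// Hypothetical index: signatures only exist for types in type units, while
// names exist for any named definition (type unit or CU).
struct TypeIndex {
  std::unordered_map<uint64_t, TypeDef> BySignature;
  std::unordered_map<std::string, TypeDef> ByName;

  void addTypeUnit(uint64_t Signature, const TypeDef &D) {
    BySignature.emplace(Signature, D);
    ByName.emplace(D.QualifiedName, D);
  }

  void addCUType(const TypeDef &D) {
    assert(!D.QualifiedName.empty() && "name lookup needs a name");
    ByName.emplace(D.QualifiedName, D);
  }

  // Path 1: declaration carries DW_AT_signature, match directly.
  const TypeDef *resolveBySignature(uint64_t Signature) const {
    auto It = BySignature.find(Signature);
    return It == BySignature.end() ? nullptr : &It->second;
  }

  // Path 2: plain declaration, resolve by qualified name.
  const TypeDef *resolveByName(const std::string &QualifiedName) const {
    auto It = ByName.find(QualifiedName);
    return It == ByName.end() ? nullptr : &It->second;
  }
};
```

In this sketch the name path also finds types that live in a CU, which the
signature path cannot, at the cost of the consumer having to build the
qualified name first.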

So, barring anything else, I'm sort of inclined to just make all references
to types in type units plain declarations (oh, also, DW_AT_declaration +
DW_AT_name is smaller than DW_AT_declaration + DW_AT_signature (4 bytes
instead of 8)). Simpler implementation, possible performance loss for the
debugger (lacking the shortcut to find a type by signature instead of name
lookup) and should tidy up a bunch of oddities as well as paving the way
for improvements around types that don't need type units.
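
The size point in the parenthetical is just arithmetic. A small sketch,
assuming the name is a 4-byte DW_FORM_strp offset (32-bit DWARF) to a string
that is already in the string table, and the signature is an 8-byte
DW_FORM_ref_sig8. The function name is hypothetical and the declaration
count would have to be measured.

```cpp
#include <cassert>
#include <cstdint>

// Assumed attribute value sizes, in bytes.
const uint64_t SizeOfStrp32 = 4;  // DW_AT_name as DW_FORM_strp, 32-bit DWARF
const uint64_t SizeOfRefSig8 = 8; // DW_AT_signature as DW_FORM_ref_sig8

// Bytes saved in the referencing units if NumDecls declarations carry a
// name instead of a signature. Ignores any new string table entries.
uint64_t declarationBytesSaved(uint64_t NumDecls) {
  const uint64_t PerDecl = SizeOfRefSig8 - SizeOfStrp32;
  assert(NumDecls <= UINT64_MAX / PerDecl && "count would overflow");
  return NumDecls * PerDecl;
}
```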

Any thoughts/suggestions/(dis)recommendations?

- Dave

Bonus question: it's possible that the type-with-addresses issue could be
checked up front (the DICompositeType could be examined for all its
template parameters to see if any involve addresses of globals). That
seems a little brittle, since other uses of addresses could crop up. For
example, some IR producer could create a member function declaration in
the member list for a member function template instantiation. (Clang
doesn't do this - member function template instantiations refer to the
class as their scope, but do not appear in the member list - which keeps
types uniform across translation units.) But an up-front check could
simplify the implementation by avoiding a bunch of work that may be thrown
out (now much less work, if all the intermediate types don't need to be
thrown out too). Worth it? Other ideas?
(also: GCC doesn't implement this rule, so its Fission+type units should
have trouble resolving addresses & may end up referring to the wrong
address pool, etc)
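
For what it's worth, the up-front check might look roughly like the sketch
below. The data structures are simplified, hypothetical stand-ins and not
the DICompositeType API, and the check only looks at template parameters,
so it has exactly the brittleness mentioned above: an address reached some
other way (eg: through the member list) would be missed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified view of a type's template parameters.
enum class ParamKind { Type, Integer, GlobalAddress };

struct TemplateParam {
  ParamKind Kind;
  std::string Symbol; // only meaningful for GlobalAddress
};

struct CompositeType {
  std::string Identifier; // mangled name; empty for non-public types
  std::vector<TemplateParam> TemplateParams;
};

// Up-front check: does any template parameter involve a global's address?
bool needsAddress(const CompositeType &Ty) {
  for (const TemplateParam &P : Ty.TemplateParams) {
    if (P.Kind != ParamKind::GlobalAddress)
      continue;
    assert(!P.Symbol.empty() && "address parameter must name a global");
    return true;
  }
  return false;
}

// Decide before doing any work whether a type unit is even possible.
bool canUseTypeUnit(const CompositeType &Ty) {
  return !Ty.Identifier.empty() && !needsAddress(Ty);
}
```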