[llvm-dev] [DWARF] using simplified template names

David Blaikie via llvm-dev llvm-dev at lists.llvm.org
Mon Jun 7 16:44:08 PDT 2021


On Mon, Jun 7, 2021 at 4:29 PM Adrian Prantl <aprantl at apple.com> wrote:

>
>
> On Jun 4, 2021, at 6:33 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
> tl;dr: What if we used only the base name of templates in the DW_AT_name
> field for function and class templates (eg: "vector" instead of
> "vector<int, std::allocator<int>>")?
>
> Context:
> We (at Google) have been seeing some significant DWARF growth in binaries
> lately due to increased use of libraries like Eigen and TensorFlow that use
> expression templates.
>
> This includes some cases where the debug_str.dwo section has exceeded the
> DWARF32 limit (& the binutils dwp tool silently wrote overflowed indexes
> into the debug_str_offsets.dwo section, unfortunately - leading to
> corrupted/garbled names in backtraces) & most of the growth is from the
> demangled names of complicated/large expression templates.
>
> Options:
> One solution would be to move to DWARF64 - though that does make DWARF
> overall larger, which is an unfortunate cost that would be nice to avoid.
>
> Another might be to rely solely on linkage names (add linkage names to
> types), since mangled names generally reduce a lot of the duplication -
> though in some cases it's not a matter of duplication within a single name,
> but possibly many distinct types used as template parameters - though those
> types may also be used in other names (& mangled names have no sharing
> across names).
>
>
> Without having measured this, I find it plausible to believe that the
> DWARF DIE tree together with base names can be more compact than linkage
> names (=mangled type names) on every DIE because of the sharing within more
> complex types.
>
>
> Compression doesn't help, since the offsets are into the uncompressed data.
>
> Main idea:
> What if templates instead only encoded the base name, such as "vector"
> (rather than "vector<int, std::allocator<int>>")? The full name could still
> be reconstructed from the DW_TAG_template_type_parameters (non-type
> template parameters would be more difficult, and we'd need to add template
> parameters to template declarations - functionality we have, but is only
> enabled for SCE today).
>
> This could significantly reduce debug info size (in some worst-cases I've
> seen this lead to a 50% reduction in the uncompressed size of
> .debug_str.dwo in a dwp file, for instance - probably less exciting if the
> data was compressed - but gives a sense of the headroom available before
> this limit will be reached again).
>
> Also has the nice property that it's not a new format or encoding that
> might break existing consumers immediately (DWARF64, for instance isn't
> widely implemented to my knowledge, so many consumers would need to be
> fixed before they could parse any of it) - if a consumer doesn't know,
> it'll still see a name, just not the most fully descriptive/specific name
> it could be. For a symbolizer this is probably fairly low cost - users
> would find a simple template function name less helpful, but not totally
> useless.
>
> As it happens, it seems GDB is already built to cope with this situation -
> it can print the real name of the type and can even correctly match up two
> distinct type declarations between translation units by correctly matching
> their template parameters.
>
> GDB Example:
> a.h:
>
> template<typename T>
> struct t1 { T t = sizeof(T); };
> void f(t1<int> &p1, t1<short> *&p2);
>
> a.cpp:
>
> #include "a.h"
> int main() {
>   t1<int> v1;
>   t1<short> *v2 = nullptr;
>   t1<bool> *v3 = nullptr;
>   f(v1, v2);
> }
>
> b.cpp:
>
> #include "a.h"
> void f(t1<int> &p1, t1<short> *&p2) {
>   static t1<short> v2;
>   p2 = &v2;
> }
>
>
> // using a clang modified to produce simple template names, and
> // to include template parameters on declarations
> // (-Xclang -debug-forward-template-params)
> $ clang++ a.cpp b.cpp -g
> $ llvm-dwarfdump a.out (glossing over some details)
> DW_TAG_compile_unit
>   DW_AT_name    ("a.cpp")
>   DW_TAG_structure_type
>     DW_AT_name  ("t1")
>     DW_TAG_template_type_parameter
>       DW_AT_type        (0x00000098 "int")
>       DW_AT_name        ("T")
>     DW_TAG_member
>       DW_AT_name        ("t")
>       DW_AT_type        (0x00000098 "int")
>   DW_TAG_structure_type
>     DW_AT_name  ("t1")
>     DW_AT_declaration   (true)
>     DW_TAG_template_type_parameter
>       DW_AT_type        (0x000000e2 "short")
>       DW_AT_name        ("T")
>   DW_TAG_structure_type
>     DW_AT_name  ("t1")
>     DW_AT_declaration   (true)
>     DW_TAG_template_type_parameter
>       DW_AT_type        (0x000000fd "bool")
>       DW_AT_name        ("T")
>
> DW_TAG_compile_unit
>   DW_AT_name    ("b.cpp")
>   DW_TAG_structure_type
>     DW_AT_name  ("t1")
>     DW_TAG_template_type_parameter
>       DW_AT_type        (0x0000019e "short")
>       DW_AT_name        ("T")
>     DW_TAG_member
>       DW_AT_name        ("t")
>       DW_AT_type        (0x0000019e "short")
>   DW_TAG_structure_type
>     DW_AT_name  ("t1")
>     DW_AT_declaration   (true)
>     DW_TAG_template_type_parameter
>       DW_AT_type        (0x000001b9 "int")
>       DW_AT_name        ("T")
> $ gdb ./a.out
>
> (gdb) start
> (gdb) ptype v1
> type = struct t1<int> [with T = int] {
>     T t;
> }
> (gdb) ptype v2
> type = struct t1<short> [with T = short] {
>     T t;
> } *
> (gdb) ptype v3
> type = struct t1<bool> {
>     <incomplete type>
> } *
> (gdb) ptype v1.t
> type = int
> (gdb) ptype v2->t
> type = short
> (gdb) ptype v3->t
> There is no member named t.
>
>
>
> So in this example we have one instantiation (t1<int>) defined in the
> first CU and declared in the second, one instantiation (t1<short>) declared
> in the first and defined in the second, and a third instantiation
> (t1<bool>) declared in the first and not defined anywhere.
>
> GDB has correctly rendered the type names, despite the template parameter
> lists being absent from the DW_AT_name - and has correctly associated the
> definitions with the declarations despite the DW_AT_name being ambiguous,
> by using the DW_TAG_template_type_parameters.
>
> lldb doesn't cope with this sort of DWARF currently - it has a bunch of
> assumptions about the names of template instantiations that'll need to be
> fixed before it can consume this sort of thing.
>
>
> I'm pretty sure this will break at least some workflows in LLDB,
>

I mean I know it currently breaks lldb in a bunch of ways - when you say
"workflows" do you mean more fundamental things than bugs? (like features
that could not be built, or would be sort of fundamentally difficult to
build, with this proposed alternative format)


> but perhaps not necessarily the most useful ones. LLDB will search types
> by name in many situations, but the fact that template types can be
> formatted in many different ways and may contain whitespace makes this
> process brittle already.
>

Presumably lldb already has to deal with some ambiguity here, since those
names don't include the namespace, for instance, in the name?


> In order to support currently supported workflows we may need to implement
> a type lookup where we strip out everything but the basename in the searched
> type, then do a by-(base)name lookup, and then filter for template
> arguments. From afar this sounds doable, but we should make sure not to
> enable this debug info optimization without qualifying it in LLDB first.
>
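A minimal sketch of the "strip down, look up by base name, then filter"
lookup described above, written in C++. This is not lldb code: TypeEntry,
parseQuery, and lookupType are hypothetical names, the index is a plain
vector standing in for whatever lldb really uses, and whitespace is
normalized by the crude assumption of dropping every space. It only handles
type names (something like operator< would confuse the bracket scan).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an indexed type DIE: the simple DW_AT_name plus
// the names of the types its DW_TAG_template_type_parameter children refer to.
struct TypeEntry {
  std::string baseName;
  std::vector<std::string> templateArgs;
};

// Crude normalization (an assumption): drop all spaces so that
// "allocator<int> >" and "allocator<int>>" compare equal.
std::string stripSpaces(const std::string &s) {
  std::string out;
  for (char c : s)
    if (c != ' ')
      out += c;
  return out;
}

struct ParsedQuery {
  std::string baseName;
  bool hasArgs = false;
  std::vector<std::string> args;
};

// Split "t1<int, short>" into "t1" and {"int", "short"}, splitting only on
// commas that are not nested inside another pair of angle brackets.
ParsedQuery parseQuery(const std::string &query) {
  assert(!query.empty() && "expected a non-empty type name");
  ParsedQuery parsed;
  const std::size_t open = query.find('<');
  parsed.baseName = stripSpaces(query.substr(0, open));
  if (open == std::string::npos)
    return parsed; // no template argument list in the query
  parsed.hasArgs = true;
  int depth = 0;
  std::string current;
  for (std::size_t i = open + 1; i < query.size(); ++i) {
    const char c = query[i];
    if (c == '>' && depth == 0)
      break; // closing bracket of the outermost argument list
    if (c == ',' && depth == 0) {
      parsed.args.push_back(stripSpaces(current));
      current.clear();
      continue;
    }
    if (c == '<')
      ++depth;
    else if (c == '>')
      --depth;
    current += c;
  }
  if (!stripSpaces(current).empty())
    parsed.args.push_back(stripSpaces(current));
  return parsed;
}

// Step 1: match on the base name. Step 2: if the query carried template
// arguments, keep only entries whose parameters match them one by one.
std::vector<const TypeEntry *> lookupType(const std::vector<TypeEntry> &index,
                                          const std::string &query) {
  const ParsedQuery parsed = parseQuery(query);
  std::vector<const TypeEntry *> matches;
  for (const TypeEntry &entry : index) {
    if (entry.baseName != parsed.baseName)
      continue;
    if (parsed.hasArgs) {
      if (entry.templateArgs.size() != parsed.args.size())
        continue;
      bool same = true;
      for (std::size_t i = 0; i < parsed.args.size(); ++i)
        if (stripSpaces(entry.templateArgs[i]) != parsed.args[i])
          same = false;
      if (!same)
        continue;
    }
    matches.push_back(&entry);
  }
  return matches;
}
```

With entries for t1 [T = int] and t1 [T = short] in the index, a query for
"t1<short>" would return only the second, while a bare "t1" would return
both, which is the ambiguity the filter step exists to resolve.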

Fair - I was thinking worst case can go the other way too: Everywhere lldb
currently looks at the name, it could check if the name was "simple" (are
there DW_TAG_template_*_parameter children and no angle brackets in the DW_AT_name?)
then produce the full name string by checking the template parameter DIEs,
etc - then use that as before/without this feature. But yeah, either way -
strip everything down, or build everything up.
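
Roughly, the "build everything up" direction might look like the C++ sketch
below. NamedDie, hasSimpleTemplateName, and rebuildFullName are hypothetical
names, not lldb or LLVM APIs, and the sketch assumes the parameter type names
have already been resolved to strings (in real DWARF each one would itself
need rebuilding recursively). Non-type template parameters are left out, and
a name such as operator< would defeat the angle-bracket check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a type or function DIE.
struct NamedDie {
  std::string name;                        // DW_AT_name as emitted
  std::vector<std::string> templateParams; // resolved names of the parameter types
};

// "Simple" here means: the DIE has template parameter children, but its
// DW_AT_name carries no angle brackets.
bool hasSimpleTemplateName(const NamedDie &die) {
  return !die.templateParams.empty() &&
         die.name.find('<') == std::string::npos;
}

// Produce the fully descriptive name; names that are already full, and
// non-template names, are returned unchanged.
std::string rebuildFullName(const NamedDie &die) {
  assert(!die.name.empty() && "expected a DIE with a DW_AT_name");
  if (!hasSimpleTemplateName(die))
    return die.name;
  std::string full = die.name + "<";
  for (std::size_t i = 0; i < die.templateParams.size(); ++i) {
    if (i != 0)
      full += ", ";
    full += die.templateParams[i];
  }
  full += ">";
  return full;
}
```

Given the simple name "t1" with one parameter of type int, this yields
"t1<int>", which the existing name-based code could then consume as before.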

- Dave



>
> -- adrian
>
>
> I haven't tested a wide number of symbolizers, but I assume they'll
> generally need some work too.
>
> So... how's this sound to everyone? An idea worth pursuing?
> Concerns/questions/etc.
>
> I don't expect this to become the default for LLVM in the short term at
> least - but under a flag for those whose consumers can handle it (/maybe/
> we do it under debugger tuning for gdb, since it seems OK with it - but
> that might be a bit stronger than we want to do under the default tuning,
> since it's really broken for lldb, not just a little bit broken).
>
> - Dave
>
>
>