[PATCH] D74169: [WIP][LLD][ELF][DebugInfo] Remove obsolete debug info.

Sun Apr 5 11:44:44 PDT 2020

dblaikie added a comment.

Rather than replying point-by-point, since I think the conversation has got a bit jumbled up - and maybe it's all best left to a separate thread than in this review... but:

The initial idea for bag-of-dwarf was to allow a type unit to expose more than one DIE (so CUs could reference nested types, member functions, parameters, etc). It could also potentially let CUs export entities that are known to have one home and that other DWARF might want to reference. This is a bit more of a stretch, but you could imagine a type with a strong vtable: there's no point putting it in a type unit, so you put it in the CU. You could, if you wanted, still expose the type and its member functions as referenceable DIEs, so that other CUs could reference them directly (inter-object-file DIE references) rather than having to redeclare (though not define) the entity in that other CU.

I see your point somewhat about how type units could integrate with DWARF-aware linking, but perhaps only in a very specific way. At least as LLVM emits type units, the types should always be identical: the type unit should never contain implicit special members, member function template instantiations, etc, and the variable component of a type (those implicit special members, etc) only ever appears on the declaration of the type that references the type unit. If you knew that was the kind of input you were getting, you could rely on every type unit being identical. (Well, with GCC you can rely on type units with the same signature being identical, but the same ODR type might be in multiple distinct type units, because GCC uses exact hashing of the type unit contents, and the contents vary with implicit special members, etc.)

Anyway, if you could rely on the type units for a given ODR type being identical or at least equivalent (no implicit special members, etc), you could just use the first one you saw, drop the rest, and attach any new members you see to a type declaration the way Clang does today. Hmm, I guess that'd work with GCC's output too, just ignoring any duplicate type units even though they aren't quite equivalent: they'd be no worse than Clang's type units. Clang's never contain any of the special members and GCC's contain the used ones, so if your type unit ended up being an arbitrarily chosen one from GCC, it wouldn't be any worse than Clang's. It'd just have some, but not all, of the special members, member function templates, etc.
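
A minimal sketch of the "use the first one you saw, drop the rest" idea, in Python purely for illustration. The names `TypeUnit`, `odr_name` and `pick_canonical_type_units` are hypothetical and not anything in LLD or dsymutil, and keying on an ODR name is an assumption about how equivalent type units would be identified:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeUnit:
    signature: int       # the type unit's signature (what DW_AT_signature refers to)
    odr_name: str        # hypothetical key identifying the ODR type
    members: tuple = ()  # member DIEs in this unit (may differ across GCC units)


def pick_canonical_type_units(type_units):
    """Keep the first type unit seen per ODR type and drop the rest.

    Returns (kept_units, remap), where remap sends every input signature to
    the signature of the unit that was kept for the same ODR type.
    """
    canonical = {}  # odr_name -> first TypeUnit seen (dicts keep insertion order)
    remap = {}      # input signature -> surviving signature
    for tu in type_units:
        first = canonical.setdefault(tu.odr_name, tu)
        remap[tu.signature] = first.signature
    return list(canonical.values()), remap


# Two GCC-style units for "t1" that differ only in implicit members,
# plus an unrelated type. The signature values are made up.
units = [
    TypeUnit(0x42, "t1"),
    TypeUnit(0x43, "t1", ("t1::t1",)),
    TypeUnit(0x50, "t2"),
]
kept, remap = pick_canonical_type_units(units)
```

Here `kept` holds the first "t1" unit and the "t2" unit, and `remap` redirects signature 0x43 to 0x42, so a declaration that referenced the dropped unit would be pointed at the surviving one. Nothing is merged into the kept unit.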

Hmm, I guess there's no reason you couldn't do this without type units though, maybe, depending on how debuggers handle it. Maybe they aren't good with member function declarations in type declarations if those type declarations don't reference type units...

eg: existing debug info with type units can look something like this:

  DW_TAG_compile_unit
    DW_TAG_structure_type // DIE: t1.1
      DW_AT_name "t1"
      DW_AT_signature 0x42
      DW_AT_declaration true
    DW_TAG_variable
      DW_AT_type // -> DIE t1.1
  DW_TAG_compile_unit
    DW_TAG_structure_type // DIE: t1.2
      DW_AT_name "t1"
      DW_AT_signature 0x42
      DW_AT_declaration true
      DW_TAG_subprogram // DIE: t1::t1
        DW_AT_name "t1"
        DW_AT_declaration true
        DW_AT_artificial true
    DW_TAG_subprogram
      DW_AT_specification // -> DIE t1::t1
      // etc... 
    DW_TAG_variable
      DW_AT_type // -> DIE t1.2

  type unit, signature 0x42, type offset -> DIE t1.tu
  DW_TAG_type_unit
    DW_TAG_structure_type // DIE: t1.tu
      DW_AT_name "t1"
      // line/file/column/size/etc... 
      // let's say there are no explicit members, for the sake of simplicity

Then, without type units, you could emit something very similar:

  DW_TAG_compile_unit
    DW_TAG_structure_type // DIE: t1.1
      DW_AT_name "t1"
      // line/file/column/size/etc... 
    DW_TAG_variable
      DW_AT_type // -> DIE t1.1
  DW_TAG_compile_unit
    DW_TAG_structure_type // DIE: t1.2
      DW_AT_name "t1"
      DW_AT_signature 0x42
      DW_AT_declaration true
      DW_TAG_subprogram // DIE: t1::t1
        DW_AT_name "t1"
        DW_AT_declaration true
        DW_AT_artificial true
    DW_TAG_subprogram
      DW_AT_specification // -> DIE t1::t1
      // etc... 
    DW_TAG_variable
      DW_AT_type // -> DIE t1.2

i.e., don't go back and try to make t1.1 complete or merge all the contents into it; just tack things on to CU-local declarations whenever you come across a new member that has a definition in this translation unit. When the linker keeps the code for t1::t1 alive, keep the subprogram definition alive, which keeps the subprogram declaration alive, which keeps the t1.2 declaration alive. Otherwise skip the t1.2 declaration and refer to t1.1 directly. (Or, as a first approximation, just always keep the declaration, but I guess dsymutil already has the smarts to strip the local type definition out entirely. Does it use a declaration to reduce encoding length if there are lots of references to a type? Probably not, I guess.)
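
The keep-alive chain described here could be sketched roughly like this, again in Python for illustration only. `plan_cu` and its parameters are hypothetical, and the mangled symbol name is just an example string; a real linker would work from relocations and section liveness rather than a dict:

```python
def plan_cu(local_decl, member_defs, live_symbols, canonical_def):
    """Decide what one CU keeps for a type that already has a definition elsewhere.

    local_decl:    label of this CU's declaration DIE, e.g. "t1.2"
    member_defs:   member name -> code symbol its definition describes
    live_symbols:  symbols whose code the linker kept
    canonical_def: label of the first-seen definition DIE, e.g. "t1.1"

    Returns (kept_members, type_target). type_target is the DIE that
    DW_AT_type references in this CU should point at.
    """
    # A member's definition DIE stays only if its code stayed; that in turn
    # keeps the member's declaration and the enclosing type declaration.
    kept_members = [m for m, sym in member_defs.items() if sym in live_symbols]
    # With no live members, the local declaration is skipped and references
    # go straight to the canonical definition.
    type_target = local_decl if kept_members else canonical_def
    return kept_members, type_target


members = {"t1::t1": "_ZN2t1C1Ev"}  # example symbol name, for illustration
alive = plan_cu("t1.2", members, {"_ZN2t1C1Ev"}, "t1.1")
dead = plan_cu("t1.2", members, set(), "t1.1")
```

In the `alive` case the CU keeps t1::t1 and its variables keep pointing at t1.2; in the `dead` case nothing member-related survives and the variables are redirected to t1.1.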

This sort of complex design discussion probably merits some in-person or video chat discussions, or at least a lengthy design discussion on llvm-dev.

https://reviews.llvm.org/D74169