[PATCH] D115325: [DWARF] Fix PR51087 Extraneous enum record in DWARF with type units

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 10 11:17:40 PST 2022


dblaikie added a comment.

In D115325#3224922 <https://reviews.llvm.org/D115325#3224922>, @Orlando wrote:

>> Hrm :/ I'm not super enthusiastic about either direction. I'm starting to wonder whether this general direction is viable - that some DWARF consumers might not find types that are in type units but aren't referenced from the CUs at all.
>>
>> Do you have a measure of the debug info size improvement of this direction? If it doesn't end up counting for much, maybe it's not worth trying to create this possibly novel debug info?
>
> For non-pathological cases the saving appears to be negligible. The CTMark projects all have total debug section size savings of less than 0.1% with this patch applied when built with `-fdebug-types-section -gdwarf-5 ` in RelWithDebInfo configuration. Unfortunately, I don't actually have access to the pathological source that prompted this patch. We (@jmorse) only have a binary built with clang with what looks like a load of redundant DIEs in it and I don't think it's likely that we'll be able to get our hands on that source.

Ah, /perhaps/ from the DWARF you might be able to hypothesize about the nature of the entities & why the code might've produced DWARF like that.

>> Do you know/could you look to see if there's any case where GCC produces type units without skeleton types that reference them? If there are such cases, that'd be reassuring, and may also help inform how we should behave with pubnames etc.
>
> I've just had a look with gcc 7.5 (because it's what I have installed; I can spend more time getting/building a recent version?). Interestingly, GCC 7.5 doesn't produce CU skeleton DIEs at all when type units are enabled, even when types are used in the CU.
>
> Taking some source from my test in this patch, you can see that the TU for `Ex::Enum` (a type that //is// used) is referenced directly by the DW_AT_type by its signature:
>
>   $ cat test.cpp
>   struct Ex { enum Enum { X }; };
>   void fun() { Ex::Enum local; }
>   
>   $ gcc -c -O2 -g -fdebug-types-section -gdwarf-5  test.cpp -o test_gcc
>   $ llvm-dwarfdump -v test_gcc --name local
>   test_gcc:	file format elf64-x86-64
>   
>   0x00000052: DW_TAG_variable [11]   (0x00000035)
>                 DW_AT_name [DW_FORM_strp]	( .debug_str[0x00000028] = "local")
>                 DW_AT_decl_file [DW_FORM_data1]	("/home/och/dev/bugs/scratch/test.cpp")
>                 DW_AT_decl_line [DW_FORM_data1]	(2)
>                 DW_AT_type [DW_FORM_ref_sig8]	(0x22f6e0a3cab4ab3c)
>
> So, gcc appears to omit the CU skeleton DIE entirely when type units are used.

Ah, sorry, there are several ways GCC emits references to type units that I know of:

- referenced once: it works like you observed, `DW_FORM_ref_sig8` directly on the `DW_AT_type`
- referenced more than once: a simple type DIE at the CU level
- when members need to be emitted (eg: a member function definition): a type skeleton, which also goes in the correct namespace

But I'm not sure if GCC ever produces a type description that is otherwise unreferenced from other parts of the DWARF.

(Just for the record, here's an example that shows the 3 ways I know of that GCC will emit a reference to a type unit.)

  namespace ns {
  struct t1 { void f1(); };
  }
  using namespace ns;
  #ifdef SINGLE
  t1 v1;
  #elif DOUBLE
  t1 v1;
  t1 v2;
  #elif MEMBER
  void t1::f1() { 
  }
  #endif



> That said, llvm-dwarfdump seems unhappy when we throw pubnames into the mix, but I don't know if this is gcc bug or not (see "error" in the output below):
>
>   $ gcc -c -O2 -g -fdebug-types-section -gdwarf-5  test.cpp -o test_gcc_pubnames -gpubnames
>   $ llvm-dwarfdump -v test_gcc_pubnames
>   ...
>   .debug_pubnames contents:
>   length = 0x00000016, format = DWARF32, version = 0x0002, unit_offset = 0x00000000, unit_size = 0x00000063
>   Offset     Name
>   0x00000035 "fun"
>   
>   .debug_pubtypes contents:
>   error: name lookup table at offset 0x0 has a terminator at offset 0x1f before the expected end at 0x26
>   length = 0x00000026, format = DWARF32, version = 0x0002, unit_offset = 0x00000000, unit_size = 0x00000063
>   Offset     Name
>   0x0000002e "unsigned int"

Ah, looks like maybe GCC produces a corrupt pubtypes table /except/ in the "MEMBER" case in the above example code. (In that case it produces a duplicate entry in pubtypes, but at least they're both correct.) Given that LLVM /only/ produces the "MEMBER" style type unit reference, that seems OK for LLVM's pubnames support there. But that doesn't explain how we should create pubnames for types that don't have skeleton references from a CU.

(I'm guessing GCC tries to emit a pubtype entry from some internal intermediate object that never actually got added to the CU, because it used one of the more terse representation choices and so didn't emit the full/"real" declaration. That entry would then have empty values, which corrupts the list.)
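
A minimal sketch of why an entry with empty values would look like corruption to a consumer. It assumes the DWARF32 layout of a `.debug_pubtypes` set: a header of 4-byte unit_length, 2-byte version, 4-byte debug_info_offset and 4-byte debug_info_length, followed by (4-byte DIE offset, NUL-terminated name) pairs and a closing zero offset. An entry whose offset is zero is indistinguishable from that closing terminator. This is not llvm-dwarfdump's or GCC's code; `PubSetCheck`, `checkPubSet` and `buildPubSet` are hypothetical names, the data is synthetic, and little-endian encoding is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Read a little-endian unsigned integer of `size` bytes starting at `pos`.
static std::uint64_t readLE(const std::vector<std::uint8_t> &buf,
                            std::size_t pos, std::size_t size) {
  assert(pos + size <= buf.size());
  std::uint64_t v = 0;
  for (std::size_t i = 0; i < size; ++i)
    v |= static_cast<std::uint64_t>(buf[pos + i]) << (8 * i);
  return v;
}

struct PubSetCheck {
  bool parsed = false;          // false if the set is truncated or malformed
  std::size_t entries = 0;      // entries seen before the first zero offset
  std::size_t terminatorAt = 0; // byte offset of the first zero offset
  std::size_t expectedEnd = 0;  // byte offset where unit_length says the set ends
  bool earlyTerminator = false; // zero offset found before the expected end
};

// Hypothetical helper: walk one DWARF32 set that starts at byte 0 of `buf`.
PubSetCheck checkPubSet(const std::vector<std::uint8_t> &buf) {
  PubSetCheck r;
  const std::size_t headerSize = 4 + 2 + 4 + 4;
  if (buf.size() < headerSize)
    return r;
  const std::uint64_t end = 4 + readLE(buf, 0, 4);
  if (end > buf.size() || end < headerSize)
    return r;
  r.expectedEnd = static_cast<std::size_t>(end);
  std::size_t pos = headerSize;
  while (pos + 4 <= end) {
    if (readLE(buf, pos, 4) == 0) { // a zero offset ends the list
      r.parsed = true;
      r.terminatorAt = pos;
      r.earlyTerminator = pos + 4 < end;
      return r;
    }
    pos += 4;
    while (pos < end && buf[pos] != 0) // skip the name
      ++pos;
    if (pos >= end)
      return r; // name is not NUL-terminated within the set
    ++pos;      // skip the NUL
    ++r.entries;
  }
  return r; // reached the end without any terminator
}

// Hypothetical helper: build a synthetic set from (DIE offset, name) pairs.
std::vector<std::uint8_t>
buildPubSet(const std::vector<std::pair<std::uint32_t, std::string>> &entries) {
  std::vector<std::uint8_t> out;
  auto put = [&out](std::uint64_t v, std::size_t size) {
    for (std::size_t i = 0; i < size; ++i)
      out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
  };
  put(0, 4);    // unit_length, patched below
  put(2, 2);    // version
  put(0, 4);    // debug_info_offset
  put(0x63, 4); // debug_info_length (arbitrary for this sketch)
  for (const auto &e : entries) {
    put(e.first, 4);
    out.insert(out.end(), e.second.begin(), e.second.end());
    out.push_back(0);
  }
  put(0, 4); // the real terminator
  const std::uint32_t len = static_cast<std::uint32_t>(out.size() - 4);
  for (std::size_t i = 0; i < 4; ++i)
    out[i] = static_cast<std::uint8_t>(len >> (8 * i));
  return out;
}
```

Building a set with `{0x2e, "unsigned int"}` followed by an all-empty entry `{0, ""}` makes `checkPubSet` stop at the empty entry and set `earlyTerminator`, while the same set without the empty entry ends exactly at the length the header declares.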

>> & have you tested this sort of DWARF (TUs without skeleton type references) with GDB and LLDB, do they seem to find the types well?
>
> Just checked and LLDB (trunk) appears to be able to find types that are only described by TUs (not referenced in the CU at all). i.e. AFAICT it can handle the output of this patch. I don't have a recent GDB installed (the version I have doesn't seem to be able to handle type units at all).
>
> Looking at the DWARFv5 spec while trying to get my head round pubnames, I see that there's a v5 specific section that combines pubnames and pubtypes called debug_names. Sorry if this is a silly question - this part of DWARF is new to me - does clang support emitting debug_names?

Yeah, as @probinson mentioned, `-gdwarf-5 -gpubnames` should give you DWARFv5 `.debug_names`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115325/new/

https://reviews.llvm.org/D115325


