[Lldb-commits] [PATCH] D32167: Add support for type units (.debug_types) to LLDB in a way that is compatible with DWARF 5

Jan Kratochvil via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Tue Apr 10 14:13:48 PDT 2018

jankratochvil added a comment.

In https://reviews.llvm.org/D32167#1062618, @clayborg wrote:

> we should be using the AST importer to import the type from that file into the AST for the current DWARF file. We do have done this with -gmodules already, so DWZ shouldn't be that different.

Is AST usable for imported declarations of `DW_TAG_variable`, `DW_TAG_subprogram`, `DW_TAG_enumeration_type`+`DW_TAG_enumerator`s etc.? Still imported `DW_TAG_partial_unit` cannot contain any code or data addresses which may make it easier for AST.

Comment at: source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp:595
+      uint64_t debug_info_size = get_debug_info_data().GetByteSize();
+      data_segment.m_data.OffsetData(debug_info_size);
+    }
clayborg wrote:
> jankratochvil wrote:
> > I do not like this `DWARFDataExtractor::m_start` modification, it sort of corrupts the `DataExtractor` and various operations stop working then - such as `DWARFDataExtractor::GetByteSize()`. DWZ patch makes from current `dw_offset_t` a virtual (remapped) offset and introduces new physical file section offset which is looked up for data extraction. The file offset is represented as `DWARFFileOffset` in D40474, instead of `bool m_is_dwz;` there could be some `enum { DEBUG_INFO, DEBUG_TYPES, DWZ_DEBUG_INFO } m_where;` instead.
> This means that this diff doesn't affect all of the other DWARF code. Nothing in .debug_types will refer to anything else (not DW_FORM_ref_addr, or any external references). So this trick allows us to just treat .debug_info as if .debug_types was appended to the end since nothing in .debug_types refers to any DIE outside of its type unit. This also mirrors what will actually happen with DWARF5 when all of the data is contained inside of the .debug_info section. This allows each DIE to have a unique "ID". Any other change requires a lot of changes to the DWARF parser and logic. So I actually like this feature. We can fix the GetByteSize() if needed. Basically every object in DWARf right now must be able to make a unique 64 bit unsigned integer ID in some way that we can get back to that info and partially parse more. These are handed out as lldb::user_id_t values for types, functions and more. Each flavor of DWARF will encode what they want into here. The normal DWARF it is just the absolute offset within the .debug_info. With .debug_types we just add the size of the .debug_info to the ID. For DWARF in .o files on Darwin, we encode the compile unit index into the top 32 bits and the DIE offset into the lower, DWO does something just as DWZ will need to. DWARFFileOffset doesn't mean much if there are multiple files. We have many competing type uniquing/debug info size reduction strategies being employed here. I can't believe we have DWO, DWZ, and debug types... But we have to make them all work. We can't just use the absolute file offset because DWO used external files where the file offsets could be the same in the external .o files... Not sure how this works with DWZ or what the best option is. I will read up on DWZ so I can propose some viable options. But each new flavor of the day that gets added the DWARF parser is adding a bunch of logic and edge cases. If two technologies (DWZ + DWO, DWZ + debug_types, etc) are used together, we need to ensure they can.
> Any other change requires a lot of changes to the DWARF parser and logic. So I actually like this feature.

I agree it is a fine quick&dirty hack. Just if my DWZ support gets accepted later anyway then this `.debug_types` feature could be implemented by its framework in a clean way (as a regular DIEs remapping which is required for DWZ anyway).

> If two technologies (DWZ + DWO, DWZ + debug_types, etc) are used together, we need to ensure they can.

`DWZ + DWO` do not make sense to me.  I haven't tried to use DWZ for DWO but DWZ finds common DWARF subtrees typically across CUs so it would not find much.

`DWZ + debug_types` is explicitly supported by the DWZ tool although that is IMO for compatibility only, DWZ can make the common type references slightly smaller than debug_types. I will sure need to implement debug_types support into my DWZ-for-LLDB patchset later.


More information about the lldb-commits mailing list