[Lldb-commits] [PATCH] D32167: Add support for type units (.debug_types) to LLDB in a way that is compatible with DWARF 5

Tue Apr 10 14:46:30 PDT 2018

jankratochvil added inline comments.

================
Comment at: source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp:595
+      uint64_t debug_info_size = get_debug_info_data().GetByteSize();
+      data_segment.m_data.OffsetData(debug_info_size);
+    }
----------------
jankratochvil wrote:
> clayborg wrote:
> > jankratochvil wrote:
> > > I do not like this `DWARFDataExtractor::m_start` modification, it sort of corrupts the `DataExtractor` and various operations stop working then - such as `DWARFDataExtractor::GetByteSize()`. DWZ patch makes from current `dw_offset_t` a virtual (remapped) offset and introduces new physical file section offset which is looked up for data extraction. The file offset is represented as `DWARFFileOffset` in D40474, instead of `bool m_is_dwz;` there could be some `enum { DEBUG_INFO, DEBUG_TYPES, DWZ_DEBUG_INFO } m_where;` instead.
> > This means that this diff doesn't affect all of the other DWARF code. Nothing in .debug_types will refer to anything else (not DW_FORM_ref_addr, or any external references). So this trick allows us to just treat .debug_info as if .debug_types was appended to the end since nothing in .debug_types refers to any DIE outside of its type unit. This also mirrors what will actually happen with DWARF5 when all of the data is contained inside of the .debug_info section. This allows each DIE to have a unique "ID". Any other change requires a lot of changes to the DWARF parser and logic. So I actually like this feature. We can fix the GetByteSize() if needed. Basically every object in DWARf right now must be able to make a unique 64 bit unsigned integer ID in some way that we can get back to that info and partially parse more. These are handed out as lldb::user_id_t values for types, functions and more. Each flavor of DWARF will encode what they want into here. The normal DWARF it is just the absolute offset within the .debug_info. With .debug_types we just add the size of the .debug_info to the ID. For DWARF in .o files on Darwin, we encode the compile unit index into the top 32 bits and the DIE offset into the lower, DWO does something just as DWZ will need to. DWARFFileOffset doesn't mean much if there are multiple files. We have many competing type uniquing/debug info size reduction strategies being employed here. I can't believe we have DWO, DWZ, and debug types... But we have to make them all work. We can't just use the absolute file offset because DWO used external files where the file offsets could be the same in the external .o files... Not sure how this works with DWZ or what the best option is. I will read up on DWZ so I can propose some viable options. But each new flavor of the day that gets added the DWARF parser is adding a bunch of logic and edge cases. If two technologies (DWZ + DWO, DWZ + debug_types, etc) are used together, we need to ensure they can.
> > Any other change requires a lot of changes to the DWARF parser and logic. So I actually like this feature.
> 
> I agree it is a fine quick&dirty hack. Just if my DWZ support gets accepted later anyway then this `.debug_types` feature could be implemented by its framework in a clean way (as a regular DIEs remapping which is required for DWZ anyway).
> 
> > If two technologies (DWZ + DWO, DWZ + debug_types, etc) are used together, we need to ensure they can.
> 
> `DWZ + DWO` do not make sense to me.  I haven't tried to use DWZ for DWO but DWZ finds common DWARF subtrees typically across CUs so it would not find much.
> 
> `DWZ + debug_types` is explicitly supported by the DWZ tool although that is IMO for compatibility only, DWZ can make the common type references slightly smaller than debug_types. I will sure need to implement debug_types support into my DWZ-for-LLDB patchset later.
> 
> For DWARF in .o files on Darwin, we encode the compile unit index into the top 32 bits and the DIE offset into the lower

BTW that prevents implementing DWARF64 so I wanted to prevent using the 32/32 split. Currently Fedora/RHEL already contains files with `.debug_info` size near the 32-bit limit:
```http://ftp.muni.cz/pub/linux/fedora/linux/development/rawhide/Everything/x86_64/debug/tree/Packages/q/qt5-qtwebkit-debuginfo-5.212.0-0.20.alpha2.fc29.x86_64.rpm
/usr/lib/debug/usr/lib64/libQt5WebKit.so.5.212.0-5.212.0-0.20.alpha2.fc29.x86_64.debug
.debug_info size = 0x9bff1b08 = 2.4GiB
```
Although that is because neither DWZ (as it would run out of memory) nor `.debug_types` (as it is expected DWZ will handle the duplicities) are applied and so all CUs are separate and even if it was >4GB there would still be no need for 64-bit `DW_FORM_ref_addr`. Still we are already close to the 32-bit limit, also shifting the split for example to 40/24 would make some limit on number of CUs etc.

https://reviews.llvm.org/D32167