[Lldb-commits] [PATCH] D32167: Add support for type units (.debug_types) to LLDB in a way that is compatible with DWARF 5

Mon Apr 9 18:45:19 PDT 2018

clayborg added a comment.

Is there a link to some documentation that explains what DWZ does? I found a bunch of blurbs and man pages, but nothing useful. Nothing is found when I look for "DWZ" in either the DWARF 4 or DWARF 5 spec.

================
Comment at: source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp:595
+      uint64_t debug_info_size = get_debug_info_data().GetByteSize();
+      data_segment.m_data.OffsetData(debug_info_size);
+    }
----------------
jankratochvil wrote:
> I do not like this `DWARFDataExtractor::m_start` modification, it sort of corrupts the `DataExtractor` and various operations stop working then - such as `DWARFDataExtractor::GetByteSize()`. DWZ patch makes from current `dw_offset_t` a virtual (remapped) offset and introduces new physical file section offset which is looked up for data extraction. The file offset is represented as `DWARFFileOffset` in D40474, instead of `bool m_is_dwz;` there could be some `enum { DEBUG_INFO, DEBUG_TYPES, DWZ_DEBUG_INFO } m_where;` instead.
This means that this diff doesn't affect all of the other DWARF code. Nothing in .debug_types will refer to anything else (not DW_FORM_ref_addr, or any external references). So this trick allows us to just treat .debug_info as if .debug_types was appended to the end since nothing in .debug_types refers to any DIE outside of its type unit. This also mirrors what will actually happen with DWARF5 when all of the data is contained inside of the .debug_info section. This allows each DIE to have a unique "ID". Any other change requires a lot of changes to the DWARF parser and logic. So I actually like this feature. We can fix the GetByteSize() if needed. Basically every object in DWARf right now must be able to make a unique 64 bit unsigned integer ID in some way that we can get back to that info and partially parse more. These are handed out as lldb::user_id_t values for types, functions and more. Each flavor of DWARF will encode what they want into here. The normal DWARF it is just the absolute offset within the .debug_info. With .debug_types we just add the size of the .debug_info to the ID. For DWARF in .o files on Darwin, we encode the compile unit index into the top 32 bits and the DIE offset into the lower, DWO does something just as DWZ will need to. DWARFFileOffset doesn't mean much if there are multiple files. We have many competing type uniquing/debug info size reduction strategies being employed here. I can't believe we have DWO, DWZ, and debug types... But we have to make them all work. We can't just use the absolute file offset because DWO used external files where the file offsets could be the same in the external .o files... Not sure how this works with DWZ or what the best option is. I will read up on DWZ so I can propose some viable options. But each new flavor of the day that gets added the DWARF parser is adding a bunch of logic and edge cases. If two technologies (DWZ + DWO, DWZ + debug_types, etc) are used together, we need to ensure they can.

https://reviews.llvm.org/D32167