[Lldb-commits] [PATCH] D45170: Cleanup DWARFCompileUnit and DWARFUnit in preparation for adding DWARFTypeUnit

Tue Apr 10 08:46:55 PDT 2018

> On Apr 7, 2018, at 2:04 AM, Jan Kratochvil <jan.kratochvil at redhat.com> wrote:
> 
> On Sat, 07 Apr 2018 00:52:26 +0200, Greg Clayton wrote:
>> Take look at how LLVM does it. I believe my changes mirror that approach.
> 
> LLVM does not support partial units so there is nothing to look at there.
> 
> 
>> DWARFUnit should be the base class for anything that needs to hand out DIEs.
> 
> That's OK for partial units.
> 
> 
>> Any specializations should be inheriting from DWARFUnit, like both
>> DWARFCompileUnit and DWARFTypeUnit.
> 
> I see now the source of misunderstanding:
> 
> DWARFCompileUnit = DW_TAG_compile_unit
> DWARFTypeUnit = DW_TAG_type_unit
> DWARFPartialUnit != DW_TAG_partial_unit
>                ^^^^
> BTW DW_TAG_imported_unit is importing (="caller") tag, not a unit (="callee").
> 
> DW_TAG_partial_unit gets read in (by
> DWARFDebugInfo::ParseCompileUnitHeadersIfNeeded) as DWARFCompileUnit because
> there is no quick enough way to find the difference.  It would require reading
> the first DIE tag which means to read and decode .debug_abbrev for each unit
> being scanned.

If there is no structural difference other than the first DW_TAG, is there a reason you would need to know the difference? It would be fine to just add extra methods on DWARFCompileUnit that do partial unit kind of things if it is a partial unit? Let me know if I am missing something?

> 
> DWARFPartialUnit is used only as a remapping of DWARFCompileUnit to a new
> offset without any new data (there is stored only a new remapped offset whose
> value is only made up internally in LLDB) at the moment someone uses
> DW_TAG_imported_unit for it - at that moment we easily know that unit has to
> be DW_TAG_partial_unit.

It is remapped to a new offset just to keep LLDB's lldb::user_id_t stuff happy? A partial unit can refer to an external file on disk. The remapping is solely making a unique offset by adding an offset to the offset of the external file?

> 
> Therefore DWARFCompileUnit and DWARFTypeUnit both contain some their own data.
> But DWARFPartialUnit is just a remapping of DWARFCompileUnit (containing
> DW_TAG_partial_unit) to a new offset without any new data. Particularly
> m_die_array is not in DWARFPartialUnit.

In reading the DWZ it sounded like you can only have 1 external debug info file and that external debug info file can't itself refer to another? Is this what the DWARFPartialUnit would be for and this is the only remapping that needs to happen?

> 
> DWARFTypeUnit can be recognized easily as it is either in .debug_types
> (<=DWARF-4) or the unit header contains DW_UT_type (>=DWARF-5).
> DWARFPartialUnit (for DW_TAG_partial_unit) cannot be recognized easily first.
> Besides that one would need then some DWARFRemappedPartialUnit for what I use
> DWARFPartialUnit now.
> 
> I have implemented it according to your advice from this mail - at least
> according to how I understood it:
> 	[lldb-dev] RFC for DWZ = DW_TAG_imported_unit + DWARF-5 supplementary files
> 	https://lists.llvm.org/pipermail/lldb-dev/2017-August/012664.html
> 
> It does not try to share anything at AST level, it only shares DWARF data (and
> thus DWARFCompileUnit). Given that DWZ finds arbitrary unique DWARF subtrees
> I find more logical to decode it at DWARF (and not at AST) level.
> 
> You wrote:
> # The drawback is this won't allow sharing /tmp/shared1 or /tmp/shared2
> # between two different top level DWARF files, but it does allow one
> # clang::ASTContext to be used between all of them.
> 
> In my implementation /tmp/shared1
> (/usr/lib/debug/.dwz/coreutils-8.27-20.fc27.x86_64) is shared between multiple
> *.so files (which use DW_TAG_imported_unit) at DWARF level, also
> clang::ASTContext is shared.
> 
> 
> # SymbolFileDWARFDebugMap makes it lldb::user_id_t contain the CU index in the
> # top 32 bits, and the DIE offset within that .o file's DWARF in the bottom 32
> # bits. You could do something similar in your case where the top 32 bits is
> # the index of the DWARF file in the "dwarf[]" array that would be maintained
> # in a new SymbolFileDWARFDWZ subclass.
> 
> DW_TAG_imported_unit+DW_TAG_partial_unit can be also used for optimization of
> a single file (without /usr/lib/debug/.dwz/* file which is used exclusively
> for DW_TAG_partial_unit entries). Additionally the tags can be also used for
> recursive inclusion. I haven't found how to use the top 32 bits for that.
> I just reserve new remapped offset space for each DW_TAG_imported_unit (in the
> bottom 32 bits).
> 
> I tried first to implement dw_offset_t caller (=unit with
> DW_TAG_imported_unit) to be tracked along any dw_offset_t DIE offset but that
> would require huge changes of the DWARF parsing code everywhere.  Also it
> cannot work well given the inclusion is recursive (so we would need
> std::vector<dw_offset_t> of the callers stack).
> 
> 
> 
>> I have no idea what are in your other patches
> 
> OK, so there was a gross misunderstanding and my DWARFUnit implemented
> something very different from what you expected+approved. I am sure fine with
> that but I needed to understand first why you did that big screw up of my
> carefully coded patch.
> 
> 
>>> This is how DWARFPartialUnit works, it is only a DWARFCompileUnit remapped to
>>> new offset.  I do not see how to implement it transparently without the
>>> accessor (and without needlessly copying all the data fields many times into
>>> each DWARFPartialUnit instance).
>> 
>> What extra functions are needed for a partial unit that can't be done in
>> a DWARFCompileUnit? Seems like they both contain things, but the partial
>> unit can be referenced from compile units. 
> 
> DWARFPartialUnit is only a remapping, not really a representation of DWARF
> file data.  So DWARFPartialUnit cannot contain its own m_die_array, m_version
> and other data members you have moved back to DWARFUnit.
> 
> 
>> As we are saying, we are trying to make the layering more like LLVM's
>> layering, so that is what I meant by "fix the layering". I believe we should
>> strive for being more like LLVM so that any transition can happen without
>> major re-organization of the DWARF code later. So I would like the get the
>> ok the revert the revert if you on board with what my suggestions are in the
>> paragraph. I know this will require modifications to your patches and
>> apologize for that.
>> 
>> Let me know what you think,
> 
> I would like to know what is the approved plan / upstreaming order regarding
> all the planned changes:
> 
> (1) Your DWARFTypeUnit patch - it makes it more aligned to LLVM DWARFUnits.
> (2) My DWZ patch - it is a new feature with no counterpart in LLVM DWARFUnits.
> (3) Replacement of LLDB DWARFUnits with LLVM DWARFUnits.
> 
> I can sure work even on (3) but after half a year of work on DWZ support which
> completely blocks LLDB for Red Hat usage (as Red Hat requires "upstream first"
> to prevent heavy forks like what happened for Red Hat GDB) it makes the DWZ
> upstreaming possibility too far for me to start refactoring LLDB for (3) first
> - before upstreaming (2).
> 
> 
> Thanks,
> Jan

We need to solve (1) first so we can move onto (2). (3) can wait IMHO, but if someone really wants to tackle that it is fine.

I think I need to see any example of this DWZ so I can intelligently comment on thing. The brief description of DWZ at http://www.dwarfstd.org/ShowIssue.php?issue=120604.1&type=open <http://www.dwarfstd.org/ShowIssue.php?issue=120604.1&type=open> doesn't help me see how this is actually laid out on disk and who refers to who and how things are laid out in the DWARF itself. Can you somehow make some simple executables available to me so I can see the DWARF in the main compile unit, the partial unit and the info in the external DWZ file? I know you know what you have done and fully understand this feature, but I would like to take a look at an example so I can see how we can make both the .debug_types and your feature work using similar work arounds.

One idea that might work is to expand the DWARFDataExtractor so it can contain enough information to create a lldb::user_id_t that would work for .debug_info, .debug_types and for the alternate debug file. It would also need to be able to get us to the right debug info when a SymbolFileDWARF API was handed a lldb::user_id_t and be able to turn that back into something. The current offset hack that was used for .debug_types could be used for this purpose as well. But I really would like to see an example program that refers to an external DWZ file so I can make sure what I recommend is viable.

Greg Clayton

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/lldb-commits/attachments/20180410/43dc3cd5/attachment-0001.html>