[PATCH] D45170: Cleanup DWARFCompileUnit and DWARFUnit in preparation for adding DWARFTypeUnit

Sat Apr 7 02:04:35 PDT 2018

On Sat, 07 Apr 2018 00:52:26 +0200, Greg Clayton wrote:
> Take a look at how LLVM does it. I believe my changes mirror that approach.

LLVM does not support partial units so there is nothing to look at there.

> DWARFUnit should be the base class for anything that needs to hand out DIEs.

That's OK for partial units.

> Any specializations should be inheriting from DWARFUnit, like both
> DWARFCompileUnit and DWARFTypeUnit.

I now see the source of the misunderstanding:

DWARFCompileUnit = DW_TAG_compile_unit
DWARFTypeUnit = DW_TAG_type_unit
DWARFPartialUnit != DW_TAG_partial_unit
                ^^^^
BTW DW_TAG_imported_unit is the importing (="caller") tag, not a unit
(="callee").

DW_TAG_partial_unit gets read in (by
DWARFDebugInfo::ParseCompileUnitHeadersIfNeeded) as DWARFCompileUnit because
there is no quick enough way to tell the difference.  That would require
reading the first DIE tag, which means reading and decoding .debug_abbrev for
each unit being scanned.

DWARFPartialUnit is used only as a remapping of DWARFCompileUnit to a new
offset without any new data (only the new remapped offset is stored, and its
value is made up internally in LLDB).  It is created at the moment someone
uses DW_TAG_imported_unit for it - at that moment we easily know that the unit
has to be DW_TAG_partial_unit.

Therefore DWARFCompileUnit and DWARFTypeUnit each contain some data of their
own.  But DWARFPartialUnit is just a remapping of DWARFCompileUnit (containing
DW_TAG_partial_unit) to a new offset without any new data.  In particular,
m_die_array is not in DWARFPartialUnit.
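
To illustrate the remapping idea, here is a minimal sketch.  It is not the code
from the patch: UnitSketch, CompileUnitSketch, PartialUnitSketch, UnitData and
the accessor Data() are hypothetical names invented for this example, and
dw_offset_t is assumed to be 32 bits wide.  The point is only that the partial
unit stores a remapped offset and forwards everything else to the compile unit
it wraps.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using dw_offset_t = uint32_t; // assumption: 32-bit DWARF offsets

// Hypothetical stand-in for the parsed per-unit data (m_die_array, m_version).
struct UnitData {
  uint16_t version = 0;
  std::vector<dw_offset_t> die_offsets;
};

// Hypothetical common interface; data is reached only through an accessor.
class UnitSketch {
public:
  virtual ~UnitSketch() = default;
  virtual const UnitData &Data() const = 0;
  virtual dw_offset_t GetOffset() const = 0;
};

// Owns the data that was actually read from the file.
class CompileUnitSketch : public UnitSketch {
public:
  CompileUnitSketch(dw_offset_t file_offset, UnitData data)
      : m_offset(file_offset), m_data(std::move(data)) {}
  const UnitData &Data() const override { return m_data; }
  dw_offset_t GetOffset() const override { return m_offset; }

private:
  dw_offset_t m_offset;
  UnitData m_data;
};

// Owns nothing but the made-up offset; all data comes from the wrapped unit.
class PartialUnitSketch : public UnitSketch {
public:
  PartialUnitSketch(const CompileUnitSketch &target, dw_offset_t remapped)
      : m_target(target), m_remapped_offset(remapped) {}
  const UnitData &Data() const override { return m_target.Data(); }
  dw_offset_t GetOffset() const override { return m_remapped_offset; }

private:
  const CompileUnitSketch &m_target;
  dw_offset_t m_remapped_offset;
};
```

Several PartialUnitSketch objects can wrap the same CompileUnitSketch, one per
importing DW_TAG_imported_unit, without copying the DIE array.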

DWARFTypeUnit can be recognized easily as it is either in .debug_types
(<=DWARF-4) or the unit header contains DW_UT_type (>=DWARF-5).
DWARFPartialUnit (for DW_TAG_partial_unit) cannot be recognized that easily up
front.  Besides that, one would then need some DWARFRemappedPartialUnit for
what I currently use DWARFPartialUnit for.
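
The type unit check described above can be sketched as follows.  UnitHeader,
Section and IsTypeUnit are hypothetical names for this example, not LLDB API;
only the DW_UT_type value (0x02) comes from the DWARF 5 specification.

```cpp
#include <cassert>
#include <cstdint>

// Unit type code for type units in DWARF 5 unit headers.
constexpr uint8_t DW_UT_type = 0x02;

// Hypothetical: the section a unit header was read from.
enum class Section { DebugInfo, DebugTypes };

// Hypothetical, reduced unit header holding only the fields needed here.
struct UnitHeader {
  Section section;
  uint16_t version;
  uint8_t unit_type; // present in the header only for version >= 5
};

// Decides from the header alone, without touching .debug_abbrev or any DIE.
bool IsTypeUnit(const UnitHeader &header) {
  if (header.version <= 4)
    return header.section == Section::DebugTypes;
  return header.unit_type == DW_UT_type;
}
```

No comparable header-only test is assumed here for DW_TAG_partial_unit, which
is the asymmetry the paragraph above points out.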

I have implemented it according to your advice from this mail - at least
according to how I understood it:
	[lldb-dev] RFC for DWZ = DW_TAG_imported_unit + DWARF-5 supplementary files
	https://lists.llvm.org/pipermail/lldb-dev/2017-August/012664.html

It does not try to share anything at the AST level, it only shares DWARF data
(and thus DWARFCompileUnit).  Given that DWZ finds arbitrary unique DWARF
subtrees, I find it more logical to decode it at the DWARF (and not at the
AST) level.

You wrote:
# The drawback is this won't allow sharing /tmp/shared1 or /tmp/shared2
# between two different top level DWARF files, but it does allow one
# clang::ASTContext to be used between all of them.

In my implementation /tmp/shared1
(/usr/lib/debug/.dwz/coreutils-8.27-20.fc27.x86_64) is shared between multiple
*.so files (which use DW_TAG_imported_unit) at the DWARF level, and
clang::ASTContext is shared as well.

# SymbolFileDWARFDebugMap makes it lldb::user_id_t contain the CU index in the
# top 32 bits, and the DIE offset within that .o file's DWARF in the bottom 32
# bits. You could do something similar in your case where the top 32 bits is
# the index of the DWARF file in the "dwarf[]" array that would be maintained
# in a new SymbolFileDWARFDWZ subclass.
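
For reference, the 32/32 split described in the quoted text can be sketched
like this.  MakeUserID, GetFileIndex and GetDieOffset are hypothetical helper
names, not existing LLDB functions, and the aliases below only assume that
lldb::user_id_t is 64 bits and dw_offset_t is 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

using user_id_t = uint64_t;   // assumption: same width as lldb::user_id_t
using dw_offset_t = uint32_t; // assumption: 32-bit DWARF offsets

// Top 32 bits: index of the DWARF file; bottom 32 bits: DIE offset in it.
constexpr user_id_t MakeUserID(uint32_t file_index, dw_offset_t die_offset) {
  return (static_cast<user_id_t>(file_index) << 32) | die_offset;
}

constexpr uint32_t GetFileIndex(user_id_t uid) {
  return static_cast<uint32_t>(uid >> 32);
}

constexpr dw_offset_t GetDieOffset(user_id_t uid) {
  return static_cast<dw_offset_t>(uid & 0xffffffffu);
}
```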

DW_TAG_imported_unit+DW_TAG_partial_unit can also be used for optimization of
a single file (without a /usr/lib/debug/.dwz/* file, which is used exclusively
for DW_TAG_partial_unit entries).  Additionally, the tags can also be used for
recursive inclusion.  I haven't found a way to use the top 32 bits for that.
I just reserve a new remapped offset space for each DW_TAG_imported_unit (in
the bottom 32 bits).
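
A rough sketch of what reserving remapped offset space could look like.  This
is an illustration under assumptions, not the patch itself: RemapTable,
RemappedRange, Reserve and ToFileOffset are hypothetical names, ranges are
assumed to be handed out sequentially above the end of the real file data, and
overflow of the 32-bit space is not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using dw_offset_t = uint32_t; // assumption: 32-bit DWARF offsets

// Hypothetical record of one range reserved for one DW_TAG_imported_unit.
struct RemappedRange {
  dw_offset_t remapped_base; // made-up offset, meaningful only internally
  dw_offset_t file_base;     // real offset of the imported unit in the file
  dw_offset_t size;          // length of the imported unit in bytes
};

class RemapTable {
public:
  // Remapped ranges start right after the real data so they never collide
  // with genuine file offsets.
  explicit RemapTable(dw_offset_t file_size) : m_next(file_size) {}

  // Reserves a fresh range for one import and returns its remapped base.
  dw_offset_t Reserve(dw_offset_t file_base, dw_offset_t size) {
    const dw_offset_t base = m_next;
    m_ranges.push_back({base, file_base, size});
    m_next += size; // note: no overflow check in this sketch
    return base;
  }

  // Translates a remapped offset back to the real file offset.
  // Offsets outside every reserved range are returned unchanged.
  dw_offset_t ToFileOffset(dw_offset_t offset) const {
    for (const RemappedRange &r : m_ranges)
      if (offset >= r.remapped_base && offset - r.remapped_base < r.size)
        return r.file_base + (offset - r.remapped_base);
    return offset;
  }

private:
  dw_offset_t m_next;
  std::vector<RemappedRange> m_ranges;
};
```

Two imports of the same partial unit get two distinct ranges, so a plain
dw_offset_t is enough to tell them apart, at the cost of consuming offset
space for every import.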

I first tried to track a dw_offset_t caller (=unit with DW_TAG_imported_unit)
along with every dw_offset_t DIE offset, but that would require huge changes
to the DWARF parsing code everywhere.  It also cannot work well given that the
inclusion is recursive (so we would need a std::vector<dw_offset_t> stack of
callers).
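
For comparison, the rejected alternative would need something like the
following to identify a DIE.  DieRefWithCallers and SameDie are hypothetical
names used only for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using dw_offset_t = uint32_t; // assumption: 32-bit DWARF offsets

// Hypothetical DIE reference that carries the whole chain of importing units.
struct DieRefWithCallers {
  dw_offset_t die_offset;           // real offset of the DIE in its unit
  std::vector<dw_offset_t> callers; // importing units, outermost first
};

// The same file offset reached through different import chains is a
// different DIE from the importer's point of view.
bool SameDie(const DieRefWithCallers &a, const DieRefWithCallers &b) {
  return a.die_offset == b.die_offset && a.callers == b.callers;
}
```

Every place that passes a bare dw_offset_t around today would have to carry
such a structure instead.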

> I have no idea what are in your other patches

OK, so there was a gross misunderstanding and my DWARFUnit implemented
something very different from what you expected+approved. I am certainly fine
with that, but I first needed to understand why you made such a big screw-up
of my carefully coded patch.

> > This is how DWARFPartialUnit works, it is only a DWARFCompileUnit remapped to
> > new offset.  I do not see how to implement it transparently without the
> > accessor (and without needlessly copying all the data fields many times into
> > each DWARFPartialUnit instance).
> 
> What extra functions are needed for a partial unit that can't be done in
> a DWARFCompileUnit? Seems like they both contain things, but the partial
> unit can be referenced from compile units. 

DWARFPartialUnit is only a remapping, not really a representation of DWARF
file data.  So DWARFPartialUnit cannot contain its own m_die_array, m_version
and other data members you have moved back to DWARFUnit.

> As we are saying, we are trying to make the layering more like LLVM's
> layering, so that is what I meant by "fix the layering". I believe we should
> strive for being more like LLVM so that any transition can happen without
> major re-organization of the DWARF code later. So I would like to get the
> ok to revert the revert if you are on board with what my suggestions are in
> the paragraph. I know this will require modifications to your patches and
> apologize for that.
> 
> Let me know what you think,

I would like to know what the approved plan / upstreaming order is regarding
all the planned changes:

(1) Your DWARFTypeUnit patch - it aligns LLDB DWARFUnits more with LLVM DWARFUnits.
(2) My DWZ patch - it is a new feature with no counterpart in LLVM DWARFUnits.
(3) Replacement of LLDB DWARFUnits with LLVM DWARFUnits.

I can certainly work even on (3), but I have spent half a year on DWZ support,
the lack of which completely blocks LLDB for Red Hat usage (as Red Hat requires
"upstream first" to prevent heavy forks like what happened with Red Hat GDB).
Refactoring LLDB for (3) first would push the DWZ upstreaming too far away for
me, so I would not want to start on (3) before upstreaming (2).

Thanks,
Jan