[Lldb-commits] [PATCH] D96236: [lldb] DWZ 1/9: Pass main DWARFUnit * along DWARFDIEs

Jan Kratochvil via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Fri Sep 24 02:52:05 PDT 2021


jankratochvil added a comment.

Asking the LLDB community whether to continue upstreaming this patchset:
Its advantage is guaranteed compatibility with DWZ, which is used by {RHEL,CentOS}-{7,8}. The next version of {RHEL,CentOS} will use it as well. By my quick check, Debian 12 (Bookworm, testing) does not use DWZ but Debian Sid (unstable) does. DWZ is applied to debug info packages supplied by system vendors. It is not automatically applied to binaries compiled by the user.
The disadvantage is that it IMO complicates the LLDB codebase a lot (I haven't found a less intrusive way to implement DWZ compatibility), and the debug info size reduction from DWZ is IMO too small to be worth all those complications. For example, a much simpler DWZ decompressor (none exists today) could provide the compatibility instead. For current and future use, the DWZ optimization has IMO been superseded by `clang -flto` (I have no numbers for that claim).
My employer wants this patchset for DWZ support to be upstream in LLDB. Personally I am against this idea for the reasons in the paragraph above: complicating the LLDB codebase is not worth it for backward compatibility alone. IMO DWZ has nowadays been superseded by LTO, by the size advantages of downloading separate `*.debug` files, and by small file size differences mattering less than software engineering simplicity.
I spent 2 months measuring the effects of DWZ on Fedora/CentOS distribution size (the benchmarking code <https://git.jankratochvil.net/?p=massrebuild.git;a=tree>). For `*-debuginfo.rpm` storage size compared to `-fdebug-types-section`, I measured an average 5% size reduction in favor of DWZ, but with a standard deviation of +/-11% of the size. That means the size reduction strongly depends on which packages are chosen. For example, for the subset of Fedora packages that is in CentOS the size reduction is only 0.28%. Another example is the subset of Fedora packages I have installed, where the DWZ `*-debuginfo.rpm` is even 0.72% bigger than with `-fdebug-types-section`.
The 5% DWZ saving amounts to about 5GB of the 82GB of debug info per Fedora Linux distribution. Personally I find it all pointless, as that is the size of one 4K movie and nobody downloads all the distribution debug info anyway.
Nowadays with `-flto` I believe the size is even smaller than with `-fdebug-types-section` (and therefore the DWZ size advantage is smaller), but I have no numbers for this claim.
The average 5% size reduction on `*-debuginfo.rpm` is primarily thanks to DWZ's use of DWARF-5 "7.3.6 DWARF Supplementary Object Files". While that helps `*-debuginfo.rpm` size, with the move from downloading a whole `*-debuginfo.rpm` to downloading separate `*.debug` files (for example via debuginfod <https://sourceware.org/elfutils/Debuginfod.html>) one then has to download more data with DWZ rather than less (depending on how many files from the same package one downloads, etc.).
The problem with the current DWZ file format is that one cannot parse a `DW_UT_partial` unit without having its `DW_UT_compile` unit (the one containing the `DW_TAG_imported_unit` for that `DW_UT_partial`), because `DW_UT_partial` is missing `DW_AT_language`. Another problem is that "DWARF Supplementary Object Files" can contain not just the typical types (like `DW_UT_type`, although for DWZ they are in `DW_UT_partial`) but also variables such as `static const int variable=42;`, which do not need `DW_AT_location`. According to @clayborg, LLVM IR for types could be imported across files but not for variables (for variables we also need to know the parent `DW_UT_compile` when parsing them). All of this makes it necessary to carry `DWARFUnit *main_unit` (usually as part of a `DWARFUnitPair`) everywhere across the LLDB codebase.
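To illustrate why the main unit has to travel with every DIE, here is a minimal sketch. The types `Unit` and `UnitPair` and the function `GetLanguage` are hypothetical, simplified stand-ins invented for this example; they are not LLDB's real `DWARFUnit` or `DWARFUnitPair`. The sketch assumes a partial unit carries no `DW_AT_language`, so the language has to come from the importing compile unit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins; NOT LLDB's real classes.
enum class UnitType { Compile, Partial };

struct Unit {
  UnitType type;
  uint16_t language; // DW_AT_language value; 0 means the attribute is absent
};

// Pairs the unit that physically holds a DIE with the compile unit
// that imported it (via DW_TAG_imported_unit).
struct UnitPair {
  const Unit *unit;      // where the DIE lives, possibly a partial unit
  const Unit *main_unit; // the importing DW_UT_compile, or nullptr if unknown
};

// Returns the language to use for DIEs reached through `pair`,
// or 0 when it cannot be determined.
inline uint16_t GetLanguage(const UnitPair &pair) {
  if (pair.unit && pair.unit->language != 0)
    return pair.unit->language;      // the unit knows its own language
  if (pair.main_unit)
    return pair.main_unit->language; // fall back to the importing unit
  return 0;                          // partial unit alone: unknown
}
```

With only the partial unit in hand the lookup returns 0, which is the situation that forces the extra pointer to be passed along.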
There could be a new file format that does not use `DW_UT_partial`, with "DWARF Supplementary Object Files" containing only `DW_UT_type` units, which IMO would, together with LTO, give the same `*-debuginfo.rpm` size benefits as DWZ without the need to carry `DWARFUnit *main_unit` everywhere. But such a simplified LLDB reader would no longer be compatible with the existing debug info formats of Red Hat OSes, so Red Hat is not interested in it.
There are more effective debug info size reductions already supported by upstream LLDB. For example `SHF_COMPRESSED` (`zlib`) saves 52% (compared to the 5% of DWZ). One could also use `zstd` for faster decompression. That figure is for installed on-disk size (the other measurements here are for `*-debuginfo.rpm` size).
If the debug info size matters, LLVM could use a more optimal `DW_FORM_ref*` than just its current `DW_FORM_ref4`. This is one of the optimizations done by DWZ.
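As a rough sketch of that optimization: each CU-relative reference could use the narrowest form that fits its offset. The form codes below are the values from the DWARF specification; the helper names `SmallestFixedRefForm` and `ULEB128Size` are made up for this example and are not LLVM or DWZ APIs. A real producer would also have to iterate, since shrinking a reference shifts the offsets of later DIEs.

```cpp
#include <cassert>
#include <cstdint>

// Reference form codes as defined by the DWARF specification.
enum RefForm : uint8_t {
  DW_FORM_ref1 = 0x11,
  DW_FORM_ref2 = 0x12,
  DW_FORM_ref4 = 0x13,
  DW_FORM_ref8 = 0x14,
  DW_FORM_ref_udata = 0x15,
};

struct RefChoice {
  RefForm form;
  unsigned size; // encoded size of the reference in bytes
};

// Picks the narrowest fixed-size form able to hold a CU-relative offset.
inline RefChoice SmallestFixedRefForm(uint64_t cu_offset) {
  if (cu_offset <= 0xFFu)
    return {DW_FORM_ref1, 1};
  if (cu_offset <= 0xFFFFu)
    return {DW_FORM_ref2, 2};
  if (cu_offset <= 0xFFFFFFFFu)
    return {DW_FORM_ref4, 4};
  return {DW_FORM_ref8, 8};
}

// Bytes needed for DW_FORM_ref_udata (unsigned LEB128: 7 payload bits per byte).
inline unsigned ULEB128Size(uint64_t value) {
  unsigned n = 0;
  do {
    value >>= 7;
    ++n;
  } while (value != 0);
  return n;
}
```

For example, an offset of 0x1234 fits `DW_FORM_ref2` (2 bytes) instead of the 4 bytes of `DW_FORM_ref4`.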
I have removed all in-memory storage of `DWARFDIE` so that non-DWZ systems do not suffer a performance hit from LLDB's DWZ compatibility, which increases the size of `DWARFDIE` from 16 bytes to 24 bytes because of the new `DWARFDIE::m_cu::m_main_cu`. There still remains the question of how such an `llvm::DWARFDie` size increase would be accepted by LLVM if the LLDB->LLVM DWARF merge ever happens.
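The size change can be pictured with a simplified layout. The structs below (`DieBefore`, `UnitPairSketch`, `DieAfter`) are hypothetical stand-ins and not the real `DWARFDIE` definition; the sketch only assumes that the object holds plain pointers, which gives 16 and 24 bytes on a typical 64-bit target.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical forward declarations standing in for the real LLDB types.
struct Unit;
struct Entry;

// Before: a DIE handle is a unit pointer plus an entry pointer.
struct DieBefore {
  Unit *cu;
  Entry *die;
};

// After: the unit member becomes a pair that also names the main unit.
struct UnitPairSketch {
  Unit *cu;
  Unit *main_cu;
};

struct DieAfter {
  UnitPairSketch cu;
  Entry *die;
};

// Two pointers versus three: 16 -> 24 bytes when pointers are 8 bytes wide.
static_assert(sizeof(DieBefore) == 2 * sizeof(void *), "two pointers");
static_assert(sizeof(DieAfter) == 3 * sizeof(void *), "three pointers");
```

Any container that stores such handles by value grows by half, which is why avoiding stored handles keeps the cost away from non-DWZ users.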
Personally I believe it would be more convenient to solve the compatibility with DWZ debug info by an external "DWZ decompressor" tool which would transparently decompress the DWZ files to some cache directory. There could be a hook for such an external decompressing tool upstreamed to LLDB.
The current DWZ optimization tool <https://sourceware.org/dwz/> has these disadvantages:

- DWZ does not support `-fdebug-types-section`: for DWARF-5 it errors on `DW_UT_type`. That means one needs to build big (approx. twice as big) intermediate files before one can run DWZ, and these run out of memory and disk space on build farms when building large packages (such as LLVM).
- DWZ will give up when it runs out of memory (`--dwz-low-mem-die-limit`, `--dwz-max-die-limit`), which happens for larger packages on build farms. In such a case the debug info is extra large, as one could not even use `-fdebug-types-section`, for compatibility with DWZ in the first place. This is IMO why the DWZ standard deviation of +/-11% on the package sizes is so large.

This patchset does not yet implement DWZ optimization of `.debug_macro`. The DWARF-5 standard currently has no solution for the `.debug_names` file format for DWZ-optimized files (`.debug_names` becomes misleading/invalid after DWZ).
As I find DWZ technology superseded by LTO together with separate `*.debug` downloads, and I failed to negotiate solving the DWZ compatibility of Red Hat OSes downstream (by keeping this LLDB DWZ patchset only downstream or by writing a separate transparent DWZ decompressor instead), I have decided to protect the LLDB codebase by quitting Red Hat. Unfortunately Red Hat still wants me to upstream this patchset during my notice period, which lasts until October 29th 2021. Naturally I will not support this patchset from October 30th 2021 onwards. Red Hat currently does not have any other LLDB engineer to replace me. The patchset is copyrighted by Red Hat. I would prefer not to have my name connected with this patchset if it gets upstreamed, so one should use `git commit --author` to attribute it to Red Hat.
The whole patchset is on GitHub: <https://github.com/jankratochvil/llvm-project/tree/dwz>


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96236/new/

https://reviews.llvm.org/D96236


