[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

Wed Jun 3 14:31:20 PDT 2020

On Wed, Jun 3, 2020 at 6:34 AM Robinson, Paul <paul.robinson at sony.com> wrote:
>
> DWARF was designed in an era when COMDAT and ICF were not a thing, or at least not common, certainly not when talking about function code.  The overhead of a unit occurred only once per translation unit, so that expense was reasonably amortized.
>
>
>
> Splitting functions into their own object-file sections and making them excludable is an evolution of compiler/linker technology that DWARF has not kept up with.  The linker-friendly solutions (COMDAT DWARF) would put function-related .debug_* contributions into a section-group along with the function .text itself; this multiplies the total number of sections to deal with, regardless of the tactics used for the content of each per-function DWARF section.  The fully DWARF-conformant solution would create one partial_unit per function, with the corresponding overhead of unit headers (especially painful in the .debug_line section).  Alternatively we fragment DWARF into sections without headers and rely on the linker to make everything look right in the linked executable; this produces .o files that are not DWARF conformant (unless we can standardize this in DWARF v6) and would be a big hassle for consumers other than the linker.

"object files don't contain DWARF, but they contain stuff that the
linker will turn into DWARF" wouldn't seem like the worst thing to me
- what sort of pre-linking parsing of DWARF use cases do you have in
mind, other than for our own compiler development uses?
(notwithstanding in-object Split DWARF (where the .dwo sections would
have to be remain usable without linking) or the MachO style debug
info distribution model which is similar)

But even then, I'm not sure how viable it would be - as Fangrui
pointed out on another thread about this: ELF section overhead itself
is non-trivial ("sizeof(Elf64_Shdr) = 64.") & it would probably be
rather difficult to reconstruct header-less slice-and-dicable sections
in some cases. For type information (a reduced overhead version of
-fdebug-types-section) I could see it - but for functions, they need
to refer to addresses - preferably in the debug_addr section, and
that's accessed by index, so taking chunks out of it would break other
references to it, etc... adding the header would be expensive, and how
would the CU construct its DW_AT_ranges value if that has to be sliced
and diced? Again, some amount of linker magic might solve some of
these problems - but I think there's still a lot of overhead to making
a solution that's workable with a DWARF-agnostic linker (or even with
a DWARF aware one, but in an efficient amount of time/space where it's
not only usable for small programs, or for linking when you're
shipping a final production binary, etc)

& as always, not sure how any of this would work for Split DWARF -
just a debug_adr section that has some addresses that point to
discardable functions... if we want those addresses themselves to be
discardable (so we don't have to use a tombstone value inserted by the
linker) then they'd need to be in separate debug_addr contributions
with headers, etc - the overhead just seems too high to me in all the
ways I can look at that.

>
>
>
> Or we pay the cost of parsing, trimming, and rewriting all the DWARF in the linker.
>
> --paulr
>
>
>
> From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of James Henderson via llvm-dev
> Sent: Wednesday, June 3, 2020 3:48 AM
> To: David Blaikie <dblaikie at gmail.com>
> Cc: llvm-dev at lists.llvm.org
> Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.
>
>
>
> It makes me sad that the linker (via a library or otherwise) has to be "DWARF-aware" to be able to effectively handle --gc-sections, COMDATs, --icf etc for debug info, without leaving large blocks of data kicking around.
>
>
>
> The patching to -1 (or equivalent) is probably a good lightweight solution (though I'd love it if it could be done based on section type in the future rather than section name, but that's probably outside the realm of DWARF), as it requires only minimal understanding in the linker, but anything beyond that seems to be complicated logic that is mostly due to the structure of DWARF. Patching to -1 does feel a bit like a sticking plaster/band aid to patch over the issue rather than properly solving it too - there will still be debug data (potentially significant amounts in COMDAT-heavy objects) that the linker has to write and the debugger has to somehow know how to skip (even if it knows that -1 is special-case due to the standard being updated, it needs to get as far as the -1), which is all wasted effort.
>
>
>
> We've already seen from Alexey's prototyping, and from our own experiences with the Sony proprietary linker (which tried to rewrite .debug_line only) that deconstructing the DWARF so that it can be more optimally reassembled at link time is slow going, and will probably inevitably be however much effort is put into optimising it. For a start, given the current standards, it's impossible to know how to deconstruct it without having to parse vast amounts of DWARF, which is typically going to mean a lot more parsing work than the linker would normally have to deal with. Additionally, much of this parsing work is wasted effort, since it seems unlikely in many links that large amounts of the DWARF will be redundant. Having an option to opt-in doesn't help much there, since it just means the logic exists without most people using it, due to it not being good enough, or potentially they don't even know it exists.
>
>
>
> I don't have particularly concrete suggestions as to how to solve the structural problems with DWARF at this point. The only thing that seems obvious to me is a more "blessed" approach to fragmentation of sections, similar to what I tried with my prototype mentioned earlier in the thread, although we'd need to figure out the previously stated performance issues. Other ideas might tie into this, like somehow sharing the various table headers a bit like CIEs in .eh_frame that could be merged by the linker - each object could have separate table header sections, which are referenced by the individual .debug_* blocks, which in turn are one per function/data piece and easily discardable/merged by the linker.
>
>
>
> Just some thoughts.
>
>
>
> James
>
>
>
> On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
> <alapshin at accesssoftek.com> wrote:
> >
> > Hi David, please find my comments inside:
> >
> >
> > >>>Broad question: Do you have any specific motivation/users/etc in implementing this (if you can speak about it)?
> >
> > >>> - it might help motivate the work, understand what tradeoffs might be suitable for you/your users, etc.
> >
> > >>There are two general requirements:
> > >> 1) Remove (or clean) invalid debug info.
> >
> > >
> > >Perhaps a simpler direct solution for your immediate needs might be a much narrower,
> > >and more efficient linker-DWARF-awareness feature:
> > >
> > > With DWARFv5, rnglists present an opportunity for a DWARF linker to rewrite the ranges
> > > without parsing the rest of the DWARF. /technically/ this isn't guaranteed - rnglist entries
> > > can be referenced either directly, or by index. If all rnglists are referenced by index, then
> > > a linker could parse only the debug_rnglists section and rewrite ranges to remove any
> > > address ranges that refer to optimized-out code.
> > >
> > > This would only be correct for rnglists that had no direct references to them (that only were
> > > referenced via the indexes) - but we could either implement it with that assumption, or could
> > > add an LLVM extension attribute on the CU that would say "I promise I only referenced rnglists
> > > via rnglistx forms/indexes). If this DWARF-aware linking would have to read the CU DIE (not
> > > all the other DIEs) it /could/ also then rewrite high/low_pc if the CU wasn't using ranges...
> > > but that wouldn't come up in the function-removal case, because then you'd have ranges anyway,
> > > so no need for that.
> > >
> > > Such a DWARF-aware rnglist linking could also simplify rnglists, in cases where functions
> > > ended up being laid out next to each other, the linker could coalesce their ranges together.
> > >
> > > I imagine this could be implemented with very little overhead to linking, especially compared
> > > to the overhead of full DWARF-aware linking.
> > >
> > >Though none of this fixes Split DWARF, where the linker doesn't get a chance to see the
> > > addresses being used - but if you only want/need the CU-level ranges to be correct, this
> > > might be a viable fix, and quite efficient.
> >
> > Yes, we think about that alternative. This would resolve our problem of invalid debug info
> > and would work much faster. Thus, if we would not have good results for D74169 then we
> > will implement it. Do you think it could be useful to have this solution in upstream?
>
> A pure rnglist rewriting - I think it'd be OK to have in upstream -
> again, cost/benefit/etc would have to be weighed. I'm not sure it
> would save enough space to be particularly valuable beyond the
> correctness issue - and it doesn't completely solve the correctness
> issue for zero-address usage or low-address usage (because you could
> still have overlapping subprograms inside a CU - so if you were
> symbolizing you could use the correct rnglist to filter, but then go
> look inside the CU only to find two subprograms that had that address
> & not know which one was the correct one an which one was the
> discarded one).
>
> rnglist rewriting might be easy enough to prototype - but depends what
> you want to spend your time on, I know this whole issue has been a
> huge investment of your time already - but maybe this recent
> revitalization of the conversation around having an explicit value in
> the linker might be sufficient to address everyone's needs... *fingers
> crossed*)
>
>
> > >> 2) Optimize the DWARF size.
> >
> >
> > > Do your users care much about this? I imagine if they had significant DWARF size issues,
> > > they'd have significant link time issues and the kind of cost to link time this feature has would
> > > be prohibitive - but perhaps they're sharing linked binaries much more often than they're
> > > actually performing linking.
> >
> > Yes, they do. They also have significant link-time issues.
> > So current performance results of D74169 are not very acceptable.
> > We hope to improve it.
> >
> >
> >
> > >>The specifics which our users have:
> > >>  - embedded platform which uses 0 as start of .text section.
> > >>  - custom toolset which does not support all features yet(f.e. split dwarf).
> > >>  - tolerant of the link-time increase.
> > >>  - need a useful way to share debug builds.
> >
> >
> > > Sharing two files (executable and dwp) is significantly less useful than sharing one file?
> >
> > Probably not significantly, but yes, it looks less useful comparing to D74169.
> > Having only two files (executable and .dwp) looks significantly better than having executable and multiple .dwo files.
> > Having only one file(executable) with minimal size looks better than the two files with a bigger size.
> >
> > clang compiled with -gsplitdwarf takes 0.9G for executable and 0.9G for .dwp.
> > clang compiled with -gc-debuginfo takes only 0.76G for single executable.
> >
> >
> >
> > >>For the first point: we have a problem "Overlapping address ranges starting from 0"(D59553).
> >
> > >>We use custom solution, but the general solution like D74169 would be better here.
> >
> >
> > > If CU ranges are the only ones that need fixing, then I think the above solution might be as
> > > good/better - if more than CU ranges need fixing, then I think we might want to start talking about
> > > how to fix DWARF itself (split and non-split) to signal certain addresses point to dead code with a
> > > specific blessed value that linkers would need to implement - because with Split DWARF there's
> > > no way to solve the non-CU addresses at the linker.
> >
> > I think the worthful solution for that signal value would be LowPC > HighPC.
> > That does not require additional bits in DWARF.
> > It would be natural to skip such address ranges since they explicitly marked as invalid.
> > It could be implemented in a linker very easily. Probably, it would make sense to describe that
> > usage in DWARF standard.
> >
> > As to the addresses which are not seen by the linker(since they are in .dwo files) - yes,
> > they need to have another solution. Could you show an example of such a case, please?
> >
> >
> >
> > >>>2. Support of type units.
> >
> > >>>
> >
> > >>>>  That could be implemented further.
> >
> > >>>Enabling type units increases object size to make it easier to deduplicate at link time by a DWARF-unaware
> >
> > >>>linker. With a DWARF aware linker it'd be generally desirable not to have to add that object size overhead to
> >
> > >>>get the linking improvements.
> >
> > >>
> >
> > >>But, DWARFLinker should adequately work with type units since they are already implemented.
> >
> >
> > > Maybe - it'd be nice & all, but I don't think it's an outright necessity - if someone knows they're using
> > > a DWARF-aware linker, they'd probably not use type units in their object files. It's possible someone
> > > doesn't know for sure & maybe they have pre-canned debug object files from someone else, etc.
> >
> > I see.
> >
> > >>Another thing is that the idea behind type units has the potential to help Dwarf-aware linker to work faster.
> >
> > >>Currently, DWARFLinker analyzes context to understand whether types are the same or not.
> >
> >
> > >When you say "analyzes context" what do you mean? Usually I'd take that to mean
> > > "looks at things outside the type itself - like what namespace it's in, etc" - which, yes,
> > > it should do that, but it doesn't seem very expensive to do. But I guess you actually
> > > mean something about doing structural equivalence in some way, looking at things inside the type?
> >
> > I think it could be useful for both cases. Currently, dsymutil does only first thing
> > (look at type name, namespace name, etc..) and does not do the second thing
> > (doing structural equivalence). Analyzing type names is currently quite expensive
> > (the only search in string pool takes ~10 sec from 70 sec of overall time).
> > That is expensive because of many things should be done to work with strings:
> > parse DWARF, search and resolve relocations, compute a hash for strings,
> > put data into a string pool, create a fully qualified name(like namespace::function::name).
> > It looks like it could be optimized and finally require less time, but it still would be a noticeable
> > part of the overall time.
> >
> > If dsymutil starts to check for the structural equivalence, then the process would be even more slowly.
> > So, If instead of comparing types structure, there would be checked single hash-id - then this process
> > would also be faster.
> >
> > Thus I think using hash-id to compare types would allow to make current implementation faster and would
> > allow handling incomplete types by DWARFLinker without massive performance degradation also.
> >
> > >> But the context is known when types are generated. So, no need to spent the time analyzing it.
> >
> > >> If types could be compared without analyzing context, then Dwarf-aware linker would work faster.
> >
> > >> That is just an idea(not for immediate implementation): If types would be stored in some "type table"
> >
> > >> (instead of COMDAT section group) and could be accessed through hash-id(like type units
> >
> > >> - then it would be the solution requiring fewer bits to store but allowing to compare types
> >
> > >> by hash-id(not analysing context).
> > >> In this case, size increasing would be small. And processing time could be done faster.
> > >>
> > >> this is just an idea and could be discussed separately from the problem of integrating of D74169.
> >
> > >> >> 6. -flto=thin
> >
> > >> >>    That problem was described in this review https://reviews.llvm.org/D54747#1503720. It also exists in
> >
> > >> >> current DWARFLinker/dsymutil implementation. I think that problem should be discussed more: it could
> >
> > >> >> probably be fixed by avoiding generation of such incomplete declaration during thinlto,
> >
> > >> >> That would be costly to produce extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
> >
> > >> >> more to reduce that redundancy early on (actually removing definitions from some llvm Modules if the type
> >
> > >> >> definition is known to exist in another Module, etc)
> > >> >I don't know if it's a problem since that patch was reverted.
> >
> > >>
> >
> > >> Yes. That patch was reverted, but this patch(D74169) has the same problem.
> >
> > >> if D74169 would be applied and --gc-debuginfo used then structure type
> > >> definition would be removed.
> >
> > >> DWARFLinker could handle that case - "removing definitions from some llvm Modules if the type
> > >> definition is known to exist in another Module".
> > >> i.e. DWARFLinker could replace the declaration with the definition.
> >
> > >> But that problem could be more easily resolved when debug info is generated(probably without
> > >> significant increase of debug info size):
> >
> > >> Here we have:
> >
> > >> DW_TAG_compile_unit(0x0000000b) - compile unit containing concrete instance for function "f".
> > >> DW_TAG_compile_unit(0x00000073) - compile unit containing abstract instance root for function "f".
> > >> DW_TAG_compile_unit(0x000000c1) - compile unit containing function "f" definition.
> >
> > >> Code for function "f" was deleted. gc-debuginfo deletes compile unit DW_TAG_compile_unit(0x000000c1)
> > >> containing "f" definition (since there is no corresponding code). But it has structure "Foo" definition
> > >> DW_TAG_structure_type(0x0000011e) referenced from DW_TAG_compile_unit(0x00000073)
> > >> by declaration DW_TAG_structure_type(0x000000ae). That declaration is exactly the case when definition
> > >> was removed by thinlto and replaced with declaration.
> >
> > >> Would it cost too much if type definition would not be replaced with declaration for "abstract instance root"?
> > >> The number of concrete instances is bigger than number of abstract instance roots.
> > >> Probably, it would not be too costly to leave definition in abstract instance root?
> >
> >
> >
> > >> Alternatively, Would it cost too much if type definition would not be replaced with declaration when
> > >> declaration references type from not used function? (lto could understand that concrete function is not used).
> >
> >
> > >I don't follow this example - could you provide a small concrete test case I could reproduce?
> >
> > I would provide a test case if necessary. But it looks like this issue is finally clear, and you already commented on that.
> >
> > > Oh, I guess this is happening perhaps because ThinLTO can't know for sure that a standalone
> > > definition of 'f' won't be needed - so it produces one in case one of the inlining opportunities
> > > doesn't end up inlining. Then it turns out all calls got inlined, so the external definition wasn't needed.
> >
> > > Oh, you're suggesting that these 3 CUs got emitted into one object file during LTO, but that DWARFLinker
> > > drops a CU without any code in it - even though... So far as I know, in LTO, LLVM directly references
> > > types across units if the CUs are all emitted in the same object file. (and if they weren't in the same
> > > object file - then the abstract_origin couldn't be pointing cross-CU).
> >
> > > I guess some basic things to say:
> >
> > > With ThinLTO, the concrete/standalone function definition is emitted in case some call sites don't end up
> > > being inlined. So we know it'll be emitted (but might not be needed by the actual linker)
> > > ANy number of inline calls might exist - but we shouldn't put the type information into those, because
> > > they aren't guaranteed to emit it (if the inline function gets optimized away, there would be nothing to
> > > enforce the type being emitted) - and even if we forced the type information to be emitted into one
> > > object file that has an inline copy of the function - there's no guarantee that object file will get linked in either.
> >
> > > So, no, I don't think there's much we can do to keep the size of object files down, while guaranteeing
> > > the type information will be emitted with the usual linker semantics.
> >
> > Then dsymutil/DWARFLinker could be changed to handle that(though it would probably be not very efficient).
> > If thinlto would understand that function is not used finally(and then must not contain referenced type definition),
> > then this situation could be handled more effectively.
> >
> > Thank you, Alexey.
> >
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> LLVM Developers mailing list
> >>> llvm-dev at lists.llvm.org
> >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev