[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

Fri Jun 5 05:03:42 PDT 2020

small typo in previous letter:

>>FWIW, I think it's probably best to at least initially frame the
>>discussion around non-configurable value for the sake of reducing the
>>scope/possible surface area of the feature/users/etc. I'd probably
>>only encourage adding the user-configurable flag if/when someone has a
>>use case for it.

>I second that: "it's probably best to at least initially frame the
>discussion around non-configurable value for the sake of reducing the
>scope/possible surface area of the feature/users/etc".

>The necessity of using some different concrete value most probably
>would arise if there is a tool which uses this another value.
>Until there is a known use case, it would be better to use just:

>--dead-reloc-addend

--dead-reloc-addend=<value>

>Thank you, Alexey.


On Thu, Jun 4, 2020 at 2:31 PM Fangrui Song <maskray at google.com> wrote:
>
>
> On 2020-06-03, James Henderson via llvm-dev wrote:
> >It makes me sad that the linker (via a library or otherwise) has to be
> >"DWARF-aware" to be able to effectively handle --gc-sections, COMDATs,
> >--icf etc for debug info, without leaving large blocks of data kicking
> >around.
> >
> >The patching to -1 (or equivalent) is probably a good lightweight solution
> >(though I'd love it if it could be done based on section type in the future
> >rather than section name, but that's probably outside the realm of DWARF),
> >as it requires only minimal understanding in the linker, but anything
> >beyond that seems to be complicated logic that is mostly due to the
> >structure of DWARF. Patching to -1 does feel a bit like a sticking
> >plaster/band aid to patch over the issue rather than properly solving it
> >too - there will still be debug data (potentially significant amounts in
> >COMDAT-heavy objects) that the linker has to write and the debugger has to
> >somehow know how to skip (even if it knows that -1 is special-case due to
> >the standard being updated, it needs to get as far as the -1), which is all
> >wasted effort.
> >
> >We've already seen from Alexey's prototyping, and from our own experiences
> >with the Sony proprietary linker (which tried to rewrite .debug_line only)
> >that deconstructing the DWARF so that it can be more optimally reassembled
> >at link time is slow going, and will probably inevitably be however much
> >effort is put into optimising it. For a start, given the current standards,
> >it's impossible to know how to deconstruct it without having to parse vast
> >amounts of DWARF, which is typically going to mean a lot more parsing work
> >than the linker would normally have to deal with. Additionally, much of
> >this parsing work is wasted effort, since it seems unlikely in many links
> >that large amounts of the DWARF will be redundant. Having an option to
> >opt-in doesn't help much there, since it just means the logic exists
> >without most people using it, due to it not being good enough, or
> >potentially they don't even know it exists.
> >
> >I don't have particularly concrete suggestions as to how to solve the
> >structural problems with DWARF at this point. The only thing that seems
> >obvious to me is a more "blessed" approach to fragmentation of sections,
> >similar to what I tried with my prototype mentioned earlier in the thread,
> >although we'd need to figure out the previously stated performance issues.
> >Other ideas might tie into this, like somehow sharing the various table
> >headers a bit like CIEs in .eh_frame that could be merged by the linker -
> >each object could have separate table header sections, which are referenced
> >by the individual .debug_* blocks, which in turn are one per function/data
> >piece and easily discardable/merged by the linker.
> >
> >Just some thoughts.
> >
> >James
>
> Your proposed option --dead-reloc-addend=.debug_info=0xffffffffffffffff
> seems like a good idea.  (I'd expect it to support signed -1 and -2 for
> convenience & consistency in some other places (we sometimes use addends
> as signed values)).
>
> LLD only supports absolute relocation types (plus R_PPC64_DTPREL64 which
> can go to .debug_addr, plus R_RISCV_{ADD,SUB}*).
>
> The computed value is S + A.
> We still consider the symbolic value S as zero, but override A with the
> supplied option --dead-reloc-addend=.debug_info=-1
> I particularly like that `addend` is part of the option name.
>
> My mere complaint is that the relocation record is not dead, but rather
> its referenced symbol is dead. However, I can't think of a better
> name...
>
> Checked with Martin Storsjö, this option may be useful for other binary
> formats supporting DWARF. (binutils does not like ELF-specific options
> not called -z foobar).
> I think it is fine to add this option to LLD if GNU ld is also happy
> with the name.  I'll check with them.
>
> "There is a danger that one community won't accept an extension that
> they haven't been involved in the design process for." :) (Coutesy of Peter)
>
> The built-in rules of the linker are the following:
>
> --dead-reloc-addend=.debug_loc=-2
> --dead-reloc-addend=.debug_ranges=-2
> --dead-reloc-addend=.debug_*=-1
>
> They can be overridden.
>
> >On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev <
> >llvm-dev at lists.llvm.org> wrote:
> >
> >> On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
> >> <alapshin at accesssoftek.com> wrote:
> >> >
> >> > Hi David, please find my comments inside:
> >> >
> >> >
> >> > >>>Broad question: Do you have any specific motivation/users/etc in
> >> implementing this (if you can speak about it)?
> >> >
> >> > >>> - it might help motivate the work, understand what tradeoffs might
> >> be suitable for you/your users, etc.
> >> >
> >> > >>There are two general requirements:
> >> > >> 1) Remove (or clean) invalid debug info.
> >> >
> >> > >
> >> > >Perhaps a simpler direct solution for your immediate needs might be a
> >> much narrower,
> >> > >and more efficient linker-DWARF-awareness feature:
> >> > >
> >> > > With DWARFv5, rnglists present an opportunity for a DWARF linker to
> >> rewrite the ranges
> >> > > without parsing the rest of the DWARF. /technically/ this isn't
> >> guaranteed - rnglist entries
> >> > > can be referenced either directly, or by index. If all rnglists are
> >> referenced by index, then
> >> > > a linker could parse only the debug_rnglists section and rewrite
> >> ranges to remove any
> >> > > address ranges that refer to optimized-out code.
> >> > >
> >> > > This would only be correct for rnglists that had no direct references
> >> to them (that only were
> >> > > referenced via the indexes) - but we could either implement it with
> >> that assumption, or could
> >> > > add an LLVM extension attribute on the CU that would say "I promise I
> >> only referenced rnglists
> >> > > via rnglistx forms/indexes). If this DWARF-aware linking would have to
> >> read the CU DIE (not
> >> > > all the other DIEs) it /could/ also then rewrite high/low_pc if the CU
> >> wasn't using ranges...
> >> > > but that wouldn't come up in the function-removal case, because then
> >> you'd have ranges anyway,
> >> > > so no need for that.
> >> > >
> >> > > Such a DWARF-aware rnglist linking could also simplify rnglists, in
> >> cases where functions
> >> > > ended up being laid out next to each other, the linker could coalesce
> >> their ranges together.
> >> > >
> >> > > I imagine this could be implemented with very little overhead to
> >> linking, especially compared
> >> > > to the overhead of full DWARF-aware linking.
> >> > >
> >> > >Though none of this fixes Split DWARF, where the linker doesn't get a
> >> chance to see the
> >> > > addresses being used - but if you only want/need the CU-level ranges
> >> to be correct, this
> >> > > might be a viable fix, and quite efficient.
> >> >
> >> > Yes, we think about that alternative. This would resolve our problem of
> >> invalid debug info
> >> > and would work much faster. Thus, if we would not have good results for
> >> D74169 then we
> >> > will implement it. Do you think it could be useful to have this solution
> >> in upstream?
> >>
> >> A pure rnglist rewriting - I think it'd be OK to have in upstream -
> >> again, cost/benefit/etc would have to be weighed. I'm not sure it
> >> would save enough space to be particularly valuable beyond the
> >> correctness issue - and it doesn't completely solve the correctness
> >> issue for zero-address usage or low-address usage (because you could
> >> still have overlapping subprograms inside a CU - so if you were
> >> symbolizing you could use the correct rnglist to filter, but then go
> >> look inside the CU only to find two subprograms that had that address
> >> & not know which one was the correct one an which one was the
> >> discarded one).
> >>
> >> rnglist rewriting might be easy enough to prototype - but depends what
> >> you want to spend your time on, I know this whole issue has been a
> >> huge investment of your time already - but maybe this recent
> >> revitalization of the conversation around having an explicit value in
> >> the linker might be sufficient to address everyone's needs... *fingers
> >> crossed*)
> >>
> >>
> >> > >> 2) Optimize the DWARF size.
> >> >
> >> >
> >> > > Do your users care much about this? I imagine if they had significant
> >> DWARF size issues,
> >> > > they'd have significant link time issues and the kind of cost to link
> >> time this feature has would
> >> > > be prohibitive - but perhaps they're sharing linked binaries much more
> >> often than they're
> >> > > actually performing linking.
> >> >
> >> > Yes, they do. They also have significant link-time issues.
> >> > So current performance results of D74169 are not very acceptable.
> >> > We hope to improve it.
> >> >
> >> >
> >> >
> >> > >>The specifics which our users have:
> >> > >>  - embedded platform which uses 0 as start of .text section.
> >> > >>  - custom toolset which does not support all features yet(f.e. split
> >> dwarf).
> >> > >>  - tolerant of the link-time increase.
> >> > >>  - need a useful way to share debug builds.
> >> >
> >> >
> >> > > Sharing two files (executable and dwp) is significantly less useful
> >> than sharing one file?
> >> >
> >> > Probably not significantly, but yes, it looks less useful comparing to
> >> D74169.
> >> > Having only two files (executable and .dwp) looks significantly better
> >> than having executable and multiple .dwo files.
> >> > Having only one file(executable) with minimal size looks better than the
> >> two files with a bigger size.
> >> >
> >> > clang compiled with -gsplitdwarf takes 0.9G for executable and 0.9G for
> >> .dwp.
> >> > clang compiled with -gc-debuginfo takes only 0.76G for single executable.
> >> >
> >> >
> >> >
> >> > >>For the first point: we have a problem "Overlapping address ranges
> >> starting from 0"(D59553).
> >> >
> >> > >>We use custom solution, but the general solution like D74169 would be
> >> better here.
> >> >
> >> >
> >> > > If CU ranges are the only ones that need fixing, then I think the
> >> above solution might be as
> >> > > good/better - if more than CU ranges need fixing, then I think we
> >> might want to start talking about
> >> > > how to fix DWARF itself (split and non-split) to signal certain
> >> addresses point to dead code with a
> >> > > specific blessed value that linkers would need to implement - because
> >> with Split DWARF there's
> >> > > no way to solve the non-CU addresses at the linker.
> >> >
> >> > I think the worthful solution for that signal value would be LowPC >
> >> HighPC.
> >> > That does not require additional bits in DWARF.
> >> > It would be natural to skip such address ranges since they explicitly
> >> marked as invalid.
> >> > It could be implemented in a linker very easily. Probably, it would make
> >> sense to describe that
> >> > usage in DWARF standard.
> >> >
> >> > As to the addresses which are not seen by the linker(since they are in
> >> .dwo files) - yes,
> >> > they need to have another solution. Could you show an example of such a
> >> case, please?
> >> >
> >> >
> >> >
> >> > >>>2. Support of type units.
> >> >
> >> > >>>
> >> >
> >> > >>>>  That could be implemented further.
> >> >
> >> > >>>Enabling type units increases object size to make it easier to
> >> deduplicate at link time by a DWARF-unaware
> >> >
> >> > >>>linker. With a DWARF aware linker it'd be generally desirable not to
> >> have to add that object size overhead to
> >> >
> >> > >>>get the linking improvements.
> >> >
> >> > >>
> >> >
> >> > >>But, DWARFLinker should adequately work with type units since they are
> >> already implemented.
> >> >
> >> >
> >> > > Maybe - it'd be nice & all, but I don't think it's an outright
> >> necessity - if someone knows they're using
> >> > > a DWARF-aware linker, they'd probably not use type units in their
> >> object files. It's possible someone
> >> > > doesn't know for sure & maybe they have pre-canned debug object files
> >> from someone else, etc.
> >> >
> >> > I see.
> >> >
> >> > >>Another thing is that the idea behind type units has the potential to
> >> help Dwarf-aware linker to work faster.
> >> >
> >> > >>Currently, DWARFLinker analyzes context to understand whether types
> >> are the same or not.
> >> >
> >> >
> >> > >When you say "analyzes context" what do you mean? Usually I'd take that
> >> to mean
> >> > > "looks at things outside the type itself - like what namespace it's
> >> in, etc" - which, yes,
> >> > > it should do that, but it doesn't seem very expensive to do. But I
> >> guess you actually
> >> > > mean something about doing structural equivalence in some way, looking
> >> at things inside the type?
> >> >
> >> > I think it could be useful for both cases. Currently, dsymutil does only
> >> first thing
> >> > (look at type name, namespace name, etc..) and does not do the second
> >> thing
> >> > (doing structural equivalence). Analyzing type names is currently quite
> >> expensive
> >> > (the only search in string pool takes ~10 sec from 70 sec of overall
> >> time).
> >> > That is expensive because of many things should be done to work with
> >> strings:
> >> > parse DWARF, search and resolve relocations, compute a hash for strings,
> >> > put data into a string pool, create a fully qualified name(like
> >> namespace::function::name).
> >> > It looks like it could be optimized and finally require less time, but
> >> it still would be a noticeable
> >> > part of the overall time.
> >> >
> >> > If dsymutil starts to check for the structural equivalence, then the
> >> process would be even more slowly.
> >> > So, If instead of comparing types structure, there would be checked
> >> single hash-id - then this process
> >> > would also be faster.
> >> >
> >> > Thus I think using hash-id to compare types would allow to make current
> >> implementation faster and would
> >> > allow handling incomplete types by DWARFLinker without massive
> >> performance degradation also.
> >> >
> >> > >> But the context is known when types are generated. So, no need to
> >> spent the time analyzing it.
> >> >
> >> > >> If types could be compared without analyzing context, then
> >> Dwarf-aware linker would work faster.
> >> >
> >> > >> That is just an idea(not for immediate implementation): If types
> >> would be stored in some "type table"
> >> >
> >> > >> (instead of COMDAT section group) and could be accessed through
> >> hash-id(like type units
> >> >
> >> > >> - then it would be the solution requiring fewer bits to store but
> >> allowing to compare types
> >> >
> >> > >> by hash-id(not analysing context).
> >> > >> In this case, size increasing would be small. And processing time
> >> could be done faster.
> >> > >>
> >> > >> this is just an idea and could be discussed separately from the
> >> problem of integrating of D74169.
> >> >
> >> > >> >> 6. -flto=thin
> >> >
> >> > >> >>    That problem was described in this review
> >> https://reviews.llvm.org/D54747#1503720. It also exists in
> >> >
> >> > >> >> current DWARFLinker/dsymutil implementation. I think that problem
> >> should be discussed more: it could
> >> >
> >> > >> >> probably be fixed by avoiding generation of such incomplete
> >> declaration during thinlto,
> >> >
> >> > >> >> That would be costly to produce extra/redundant debug info in
> >> ThinLTO - actually ThinLTO could be doing
> >> >
> >> > >> >> more to reduce that redundancy early on (actually removing
> >> definitions from some llvm Modules if the type
> >> >
> >> > >> >> definition is known to exist in another Module, etc)
> >> > >> >I don't know if it's a problem since that patch was reverted.
> >> >
> >> > >>
> >> >
> >> > >> Yes. That patch was reverted, but this patch(D74169) has the same
> >> problem.
> >> >
> >> > >> if D74169 would be applied and --gc-debuginfo used then structure type
> >> > >> definition would be removed.
> >> >
> >> > >> DWARFLinker could handle that case - "removing definitions from some
> >> llvm Modules if the type
> >> > >> definition is known to exist in another Module".
> >> > >> i.e. DWARFLinker could replace the declaration with the definition.
> >> >
> >> > >> But that problem could be more easily resolved when debug info is
> >> generated(probably without
> >> > >> significant increase of debug info size):
> >> >
> >> > >> Here we have:
> >> >
> >> > >> DW_TAG_compile_unit(0x0000000b) - compile unit containing concrete
> >> instance for function "f".
> >> > >> DW_TAG_compile_unit(0x00000073) - compile unit containing abstract
> >> instance root for function "f".
> >> > >> DW_TAG_compile_unit(0x000000c1) - compile unit containing function
> >> "f" definition.
> >> >
> >> > >> Code for function "f" was deleted. gc-debuginfo deletes compile unit
> >> DW_TAG_compile_unit(0x000000c1)
> >> > >> containing "f" definition (since there is no corresponding code). But
> >> it has structure "Foo" definition
> >> > >> DW_TAG_structure_type(0x0000011e) referenced from
> >> DW_TAG_compile_unit(0x00000073)
> >> > >> by declaration DW_TAG_structure_type(0x000000ae). That declaration is
> >> exactly the case when definition
> >> > >> was removed by thinlto and replaced with declaration.
> >> >
> >> > >> Would it cost too much if type definition would not be replaced with
> >> declaration for "abstract instance root"?
> >> > >> The number of concrete instances is bigger than number of abstract
> >> instance roots.
> >> > >> Probably, it would not be too costly to leave definition in abstract
> >> instance root?
> >> >
> >> >
> >> >
> >> > >> Alternatively, Would it cost too much if type definition would not be
> >> replaced with declaration when
> >> > >> declaration references type from not used function? (lto could
> >> understand that concrete function is not used).
> >> >
> >> >
> >> > >I don't follow this example - could you provide a small concrete test
> >> case I could reproduce?
> >> >
> >> > I would provide a test case if necessary. But it looks like this issue
> >> is finally clear, and you already commented on that.
> >> >
> >> > > Oh, I guess this is happening perhaps because ThinLTO can't know for
> >> sure that a standalone
> >> > > definition of 'f' won't be needed - so it produces one in case one of
> >> the inlining opportunities
> >> > > doesn't end up inlining. Then it turns out all calls got inlined, so
> >> the external definition wasn't needed.
> >> >
> >> > > Oh, you're suggesting that these 3 CUs got emitted into one object
> >> file during LTO, but that DWARFLinker
> >> > > drops a CU without any code in it - even though... So far as I know,
> >> in LTO, LLVM directly references
> >> > > types across units if the CUs are all emitted in the same object file.
> >> (and if they weren't in the same
> >> > > object file - then the abstract_origin couldn't be pointing cross-CU).
> >> >
> >> > > I guess some basic things to say:
> >> >
> >> > > With ThinLTO, the concrete/standalone function definition is emitted
> >> in case some call sites don't end up
> >> > > being inlined. So we know it'll be emitted (but might not be needed by
> >> the actual linker)
> >> > > ANy number of inline calls might exist - but we shouldn't put the type
> >> information into those, because
> >> > > they aren't guaranteed to emit it (if the inline function gets
> >> optimized away, there would be nothing to
> >> > > enforce the type being emitted) - and even if we forced the type
> >> information to be emitted into one
> >> > > object file that has an inline copy of the function - there's no
> >> guarantee that object file will get linked in either.
> >> >
> >> > > So, no, I don't think there's much we can do to keep the size of
> >> object files down, while guaranteeing
> >> > > the type information will be emitted with the usual linker semantics.
> >> >
> >> > Then dsymutil/DWARFLinker could be changed to handle that(though it
> >> would probably be not very efficient).
> >> > If thinlto would understand that function is not used finally(and then
> >> must not contain referenced type definition),
> >> > then this situation could be handled more effectively.
> >> >
> >> > Thank you, Alexey.
> >> >
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> _______________________________________________
> >> >>> LLVM Developers mailing list
> >> >>> llvm-dev at lists.llvm.org
> >> >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >> _______________________________________________
> >> LLVM Developers mailing list
> >> llvm-dev at lists.llvm.org
> >> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >>
>
> >_______________________________________________
> >LLVM Developers mailing list
> >llvm-dev at lists.llvm.org
> >https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev