[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

Thu Jun 4 17:11:59 PDT 2020

On 2020-06-04, Robinson, Paul via llvm-dev wrote:
>+ Ben Dunbobbin, whose name I take in vain below.
>He's my local expert on weird ELF features.

Hey, I have read
https://groups.google.com/forum/#!msg/generic-abi/A-1rbP8hFCA/EDA7Sf3KBwAJ
"monolithic input section handling" from Ben:)

>> -----Original Message-----
>> From: David Blaikie <dblaikie at gmail.com>
>> Sent: Thursday, June 4, 2020 2:43 PM
>> To: Robinson, Paul <paul.robinson at sony.com>
>> Cc: jh7370.2008 at my.bristol.ac.uk; llvm-dev at lists.llvm.org
>> Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info
>> in lld.
>>
>> On Thu, Jun 4, 2020 at 8:27 AM Robinson, Paul <paul.robinson at sony.com>
>> wrote:
>> >
>> >
>> >
>> > > -----Original Message-----
>> > > From: David Blaikie <dblaikie at gmail.com>
>> > > Sent: Wednesday, June 3, 2020 5:31 PM
>> > > To: Robinson, Paul <paul.robinson at sony.com>
>> > > Cc: jh7370.2008 at my.bristol.ac.uk; llvm-dev at lists.llvm.org
>> > > Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug
>> info
>> > > in lld.
>> > >
>> > > On Wed, Jun 3, 2020 at 6:34 AM Robinson, Paul <paul.robinson at sony.com>
>> > > wrote:
>> > > >
>> > > > DWARF was designed in an era when COMDAT and ICF were not a thing,
>> or at
>> > > least not common, certainly not when talking about function code.  The
>> > > overhead of a unit occurred only once per translation unit, so that
>> > > expense was reasonably amortized.
>> > > >
>> > > >
>> > > >
>> > > > Splitting functions into their own object-file sections and making
>> them
>> > > excludable is an evolution of compiler/linker technology that DWARF
>> has
>> > > not kept up with.  The linker-friendly solutions (COMDAT DWARF) would
>> put
>> > > function-related .debug_* contributions into a section-group along
>> with
>> > > the function .text itself; this multiplies the total number of
>> sections to
>> > > deal with, regardless of the tactics used for the content of each per-
>> > > function DWARF section.  The fully DWARF-conformant solution would
>> create
>> > > one partial_unit per function, with the corresponding overhead of unit
>> > > headers (especially painful in the .debug_line section).
>> Alternatively we
>> > > fragment DWARF into sections without headers and rely on the linker to
>> > > make everything look right in the linked executable; this produces .o
>> > > files that are not DWARF conformant (unless we can standardize this in
>> > > DWARF v6) and would be a big hassle for consumers other than the
>> linker.
>> > >
>> > > "object files don't contain DWARF, but they contain stuff that the
>> > > linker will turn into DWARF" wouldn't seem like the worst thing to me
>> > > - what sort of pre-linking parsing of DWARF use cases do you have in
>> > > mind, other than for our own compiler development uses?
>> >
>> > No, that wouldn't seem like the worst thing. Obviously llvm-dwarfdump
>> > would want to be able to report what's actually happening, but indeed
>> > all the other use-cases that come to mind are not looking at .o files.
>> >
>> > > (notwithstanding in-object Split DWARF (where the .dwo sections would
>> > > have to remain usable without linking) or the MachO style debug
>> > > info distribution model which is similar)
>> >
>> > I expect Split DWARF would be incompatible with fragments.  I don't
>> > know details about MachO, but it seems likely the same is true there.
>>
>> Yep, if they're sub-contribution regions, that wouldn't play well with
>> Split DWARF. (& full contribution isolation would have the DWARF header
>> overhead, etc)
>>
>> I'd still be concerned about the ELF header overhead even of this
>> sub-contribution scheme, but could be interesting to see how it plays
>> out in practice.
>>
>> All that said, to avoid burying the lede here, I'll splice something
>> from the end up here:
>>
>> > Although the point is not to avoid tombstone values, but to do a more
>> efficient job of editing the final DWARF to omit gc'd functions; it's no
>> problem at all to use a tombstone value in .debug_addr IMO.
>>
>> But the tombstone values are Alexey's underlying issue (this ongoing
>> design discussion for over a year now) & /sort/ of mine too recently
>> (which, unfortunately, is what's reinvigorated this discussion -
>> would've been nice if I/we/someone had identified this sooner &
>> could've helped Alexey in a more timely manner): Alexey is dealing
>> with a platform where 0 is a valid address so the lld/gold strategy of
>> resolving relocations to dead code to "0+addend" creates ambiguous
>> DWARF. I'm dealing with a case of zero-length functions ("int f1() {
>> }" or "void f2() { __builtin_unreachable(); }") causing early
>> termination of DWARFv4 range lists.
>>
>> The reason for the DWARF-aware linker proposal was because the "let's
>> choose a better tombstone" discussion didn't go anywhere & people sort
>> of encouraged in this direction of "what if we didn't need a
>> tombstone/the linker fixed up the debug info instead". So if the DWARF
>> redundancy elimination doesn't address the issue of zero as a valid
>> address, it doesn't address Alexey's needs, unfortunately. :/
>
>But, upthread we had a tombstone discussion IIRC, which seemed to converge
>on "-1 except .debug_loc/.debug_ranges use -2" didn't it?  If we're still
>going on about having the linker rewriting DWARF, then the fragmenting
>idea is worth pursuing as an alternative to Alexey's current work.

+1 for "-1 except .debug_loc/.debug_ranges use -2"
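
To make the -1/-2 point concrete, here is a minimal Python sketch (a toy
model, not lld or gold code) of how a consumer walks a DWARF v4 .debug_ranges
list. In that format (0, 0) ends the list and a begin value of all-ones
selects a new base address, which is why 0 truncates the list and -1 is
unsafe there. The helper names, the addresses, and the choice to skip
inverted ranges are assumptions made for illustration.

```python
MASK = 0xFFFFFFFFFFFFFFFF  # 64-bit address space


def parse_debug_ranges(pairs):
    """Walk DWARF v4 .debug_ranges (begin, end) pairs and return usable ranges."""
    base = 0  # assumed CU base address (DW_AT_low_pc of 0)
    ranges = []
    for begin, end in pairs:
        if begin == 0 and end == 0:
            break  # end-of-list entry
        if begin == MASK:
            base = end  # base address selection entry
            continue
        if begin >= end:
            continue  # empty (or, by assumption here, inverted) range: skip
        ranges.append(((base + begin) & MASK, (base + end) & MASK))
    return ranges


def range_list(dead_value):
    """Three functions; the middle one is zero-length and was discarded.

    dead_value is what a hypothetical linker writes for both the begin and
    end relocations of the discarded function.
    """
    return [
        (0x1000, 0x1010),          # f0, live
        (dead_value, dead_value),  # f1, discarded, zero length
        (0x2000, 0x2020),          # f2, live
        (0, 0),                    # real end of list
    ]


zero = parse_debug_ranges(range_list(0))           # f2 is lost
minus1 = parse_debug_ranges(range_list(MASK))      # f2 is shifted by a bogus base
minus2 = parse_debug_ranges(range_list(MASK - 1))  # f0 and f2 both survive
```

With 0 the list ends early, with -1 the dead entry is read as a base
address selection, and with -2 the dead entry is simply an empty range.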

>>
>> That said, I super appreciate the time you've put into writing this up
>> and it is valuable & I'd love to see some (even hand-crafted assembly)
>> prototypes, maybe do some back-of-the-envelope numbers to see whether
>> the ELF header overhead would be worth it, etc.
>
>It would be nice to verify that the section-fragment idea would produce
>something that looked usable.  Hand-written assembly... would require
>research into how to specify the right section attributes, but would
>likely be less effort than trying to make LLVM do something plausible.
>
>I'll see about creating an internal task for this.

According to Peter Smith, Arm Compiler 5 splits up DWARF v3 debugging
information and puts these sections into comdat groups:

"This approach did produce significantly more debug information than gcc
  did. For small microcontroller projects this wasn't a problem. For
  larger feature phone problems we had to put a lot of work into keeping
  the linker's memory usage down as many of our customers at the time were
  using 32-bit Windows machines with a default maximum virtual memory of 2Gb."

I'd also love to see some examples (even hand-crafted assembly).

>>
>> > > But even then, I'm not sure how viable it would be - as Fangrui
>> > > pointed out on another thread about this: ELF section overhead itself
>> > > is non-trivial ("sizeof(Elf64_Shdr) = 64.") & it would probably be
>> > > rather difficult to reconstruct header-less slice-and-diceable sections
>> > > in some cases. For type information (a reduced overhead version of
>> > > -fdebug-types-section) I could see it - but for functions, they need
>> > > to refer to addresses - preferably in the debug_addr section, and
>> > > that's accessed by index, so taking chunks out of it would break other
>> > > references to it, etc... adding the header would be expensive, and how
>> > > would the CU construct its DW_AT_ranges value if that has to be sliced
>> > > and diced? Again, some amount of linker magic might solve some of
>> > > these problems - but I think there's still a lot of overhead to making
>> > > a solution that's workable with a DWARF-agnostic linker (or even with
>> > > a DWARF aware one, but in an efficient amount of time/space where it's
>> > > not only usable for small programs, or for linking when you're
>> > > shipping a final production binary, etc)
>> >
>> > The idea we have blue-skied internally would work something like this
>> > (initially explicated in terms of the .debug_info section, then seeing
>> > how that tactic applies to other sections):
>> >
>> > There's a top fragment, containing the CU header and the CU DIE itself.
>> > Linker magic makes this first in the output file.
>>
>> Quick curiosity: Is there existing linker magic for this? What does it
>> look like? I'd love to know so I can play around with hand crafted
>> prototypes/keep it in mind for such things.
>
>Ben Dunbobbin did research into this some time ago, under the auspices
>of a "COMDAT DWARF" investigation. He's part of Sony's linker team, and
>it was a discussion with that team where I became convinced that the
>fragmenting idea was feasible using existing defined ELF capabilities,
>although perhaps in ways nobody had really taken advantage of.  It
>involved section groups and/or section ordering, but somebody much more
>familiar with ELF than I am would have to explain it. I've cc'd Ben.
>
>Regarding my discussion with our linker team:
>They asked me whether it was feasible to use sections to subset the
>DWARF, and I described the functional need (top & bottom fragments,
>arbitrary stuff in between) and they thought the ELF section-group
>and/or section-ordering features would be able to provide that.
>
>I'm not aware that anyone actually tried prototyping that.  The work
>that James did (mentioned upthread) IIRC was using COMDAT and full
>units with unit headers.  My fading memory suggests the discussion
>described just above was after that.
>
>>
>> (basically the ability for an object file to say "here's the start and
>> end of my contribution to this section, and some bits that /can/ go in
>> the middle, but you can drop them if you like")
>>
>> > Types also go here; certainly base types, and other file-scope types
>> > can be included here or put into type units.  (Type units aren't
>> > fragmented, they are their own thing same as always.)
>>
>> Separately, it might be worth considering putting types in such a
>> thing - but, yes, the "How do you reference them when they might be in
>> your unit or someone else's unit", etc, would have to be figured out.
>> I guess using an external symbol might be the solution there - again,
>> with a better understanding of the ^ mentioned linker magic, I'd
>> probably play around with hand crafting some examples just to see how
>> this could work.
>>
>> > There's a matching bottom fragment, which is just the terminating NULL
>> > for the CU DIE; linker magic makes this last in the output file.
>>
>> Last of all the contributions from this object file, not last in the
>> whole output file, right? (please excuse the pedantry, just double
>> checking)
>
>The object file would (loosely speaking) have a ".debug_info.first",
>some number of ".debug_info.excludable-middle", and a ".debug_info.last"
>which would all be glommed together in first-middle-last order in the
>output .debug_info section.  I believe I was told that this would be
>per-object-file, otherwise yeah it wouldn't work at all.
>
>This is why we need input from somebody who actually knows ELF.  😊

We probably have to reuse the ".debug_info" string, which is already an entry
in .strtab; otherwise the section name strings themselves can cost quite a lot.
In assembly this requires the "unique" section syntax, which has been
implemented in LLVM for a while but is relatively new in binutils (future 2.35).

(Mostly https://sourceware.org/pipermail/binutils/2020-May/111361.html )
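
As a rough illustration of the first/middle/last layout Paul describes, here
is a toy Python model (not an ELF linker; the dict layout, group names, and
fragment bytes are all made up). It concatenates each object's fragments in
order, drops middle fragments whose group was discarded, and patches the
4-byte DWARF32 unit_length at the start of each top fragment. A real linker
would presumably compute the length via a relocation instead.

```python
import struct


def link_debug_info(objects, live_groups):
    """Concatenate per-object fragments and patch each unit_length."""
    out = bytearray()
    for obj in objects:
        start = len(out)
        out += obj["first"]  # CU header + CU DIE
        for group, fragment in obj["middle"]:
            if group in live_groups:  # discarded groups contribute nothing
                out += fragment
        out += obj["last"]  # terminating NULL for the CU DIE
        # unit_length excludes the 4-byte length field itself (DWARF32).
        struct.pack_into("<I", out, start, len(out) - start - 4)
    return bytes(out)


# Toy objects: a 4-byte length placeholder plus 7 stand-in bytes for the rest
# of the header and the CU DIE, some function fragments, and a NULL byte.
obj_a = {
    "first": b"\x00\x00\x00\x00" + b"HDR+CU!",
    "middle": [("f1", b"<f1>"), ("f2", b"<f2>")],
    "last": b"\x00",
}
obj_b = {
    "first": b"\x00\x00\x00\x00" + b"HDR+CU!",
    "middle": [("g1", b"<g1>")],
    "last": b"\x00",
}
linked = link_debug_info([obj_a, obj_b], live_groups={"f1", "g1"})
```

Each object stays a self-contained unit in the output, which matches the
"per-object-file" ordering requirement above.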

>>
>> > Each function has its own fragment, which is in the same link-group
>> > (COMDAT or whatever) as the function's .text section; that way, if the
>> > function is discarded, so is the .debug_info fragment.  Offhand I can't
>> > think of any cases (other than DW_AT_specification, addressed below) of
>> > references to a subprogram DIE from elsewhere,
>>
>> The call_site DWARF would want to refer to a subprogram DIE, but that
>> could be handled by (first pass) having a declaration subprogram in
>> the initial fragment that the call_site could refer to using the usual
>> assembler-resolved CU-relative offset. Of course that'd mean a bunch
>> of (probably the bigger part) of the function's DWARF footprint
>> wouldn't be deduplicated, but would address this part of the address
>> tombstone issue (if not using debug_addr) & reduce some of the DWARF -
>> the addresses are pretty big (if you're not pooling them), etc.
>
>Ah, forgot about call_site.  Yeah referring to a declaration should work.
>
>>
>> > so it should be fine to
>> > discard the entire function fragment as needed.  Linker magic puts all
>> > function fragments between the top and bottom fragments, in some
>> > indeterminate order.  Each function fragment is the usual complete
>> > subtree, rooted in DW_TAG_subprogram.
>>
>> Rooted at the top level (well, below the DW_TAG_compile_unit) DIE, as
>> you mention later - namespace, or whatever else.
>
>Right, each fragment would be a complete subtree that would ordinarily
>be a direct child of DW_TAG_compile_unit.  With whatever DIE it needed.
>
>>
>> >  References to types are either
>> > to type units as normal, or to types in the top fragment.  Note that
>> > these references do not require relocations; type units are by signature
>> > as always, and for types in the top fragment, the offsets into the top
>> > fragment are known at compile time.
>> >
>> > Inlined functions are described as part of the function they have been
>> > inlined into, being children of the function DIE.  DW_AT_specification
>> > refers to the abstract declaration which is in its own fragment (or the
>> > top fragment, but that keeps the declaration from being elided if all
>> > references go away).
>>
>> Yep, this overlaps with the call_site stuff I mentioned earlier - same
>> ideas. Either top fragment, or its own fragment. Keeping its own
>> fragment alive, and figuring out how to reference it (depending on
>> fragment layout/elision) would require some work, but I think it's
>> do-able. Might even be do-able so it can be deduplicated across CUs
>> (use a sec_offset form, use a linker-resolved relocation to it) - this
>> infrastructure would overlap with type deduplication without type
>> units too.
>>
>> Though linker resolved relocations add more bytes...
>>
>> > If functions are inside namespaces, each function fragment will need
>> > to have namespace DIEs around the function DIE.  This adds overhead
>> > but it's pretty small.
>> >
>> > I hand-wave filling in the CU header's unit length.  I'd expect a
>> > relocation with a reference to the bottom fragment should be able to
>> > compute the correct value.
>>
>> *nod*
>>
>> > That's the story for .debug_info; what about other sections?
>> >
>> > Sections referenced by index from .debug_info can't be fragmented;
>> > this would be: .debug_abbrev, .debug_addr, .debug_str_offsets.
>> >
>> > .debug_str doesn't need to be fragmented, linkers DTRT already.
>>
>> (linkers deduplicate debug_str - but can they be made to remove
>> unreferenced strings too? in that case we'd have an interesting
>> tradeoff of maybe using FORM_strp rather than strx - if we wanted the
>> linker to be able to drop strings from dropped function definitions,
>> etc)
>
>Future refinements are quite possible!
>
>>
>> > .debug_macro contents are not tied to functions and won't be fragmented.
>> >
>> > .debug_loclists and .debug_rnglists should be fragmentable the same
>> > way as .debug_info; they exist only as extensions of .debug_info, and
>> > the range list for the CU itself is merely a concatenated set of
>> > contributions from each constituent function, so that should Just Work
>> > (although it won't be optimal, adjacent ranges won't be coalesced).
>>
>> At least the way we currently emit loclists and rnglists is by using
>> an index (the header of loclists and rnglists has an index to offset
>> mapping) - like strx, this would make it hard/impossible for a
>> DWARF-agnostic linker to see through to find out which indexes were
>> actually used. We could potentially not use the loclistx/rnglistx
>> forms/indexes from fragments - instead using sec_offsets that would
>> make them relocatable/removable/etc. (so long as all the index-based
>> referenced lists came in the debug_loclist/debug_rnglist header
>> fragment)
>
>Ah, I hadn't looked at how we do those lists.  But sounds solvable.
>
>>
>> > I believe the same is true for .debug_loc and .debug_ranges, although
>> > I haven't checked.
>>
>> Yep, those ones are easier - there's no contribution header, they can
>> only be referenced via sec_offset, so slicing and dicing them is
>> cheap.
>>
>> But the tombstone problem still exists for the CU's debug_ranges -
>> though /maybe/ it could be carefully constructed from fragments...
>> that's going to be a /lot/ of sections in the end though.
>>
>> > .debug_aranges is functionally equivalent to the CU rangelist.
>>
>> Yup. (as we've touched on before, we don't use aranges at Google -
>> instead relying on CU's ranges which are just a little more expensive
>> to retrieve - but no need to duplicate the data in both places - if
>> consumers really find the aranges worthwhile to avoid parsing a few
>> attributes on the CU DIE, perhaps a future spec could let
>> debug_aranges reference a range list? so that aranges and the CU could
>> share the same data?)
>>
>> > .debug_line can work the same way as .debug_info but is worth a word.
>> > The top fragment has the header, including the directory/file lists
>> > because those are referenced by index.  DW_LNE_define_file can't be
>> > used.  Each function has a fragment containing the sequence for that
>> > function, starting with set_address and ending with end_sequence.
>> > The bottom fragment is empty, existing only to allow the length to
>> > be computed.
>>
>> Yep - can't remove dead file and directory names, unfortunately - and
>> the line table's pretty compact, so not sure it'd be a great savings
>> (especially compared to the ELF section overhead - at the object file
>> size at least (though probably a small win for linked executable
>> size)). Chances are those strings (now in debug_line_str) would be
>> used /somewhere/ in the program, so linker string deduplication would
>> get most of the wins - just dead offset entries in the line table
>> header.
>
>Sony does squeeze out the sequences for dead functions; I think it's
>not a huge win, in terms of total debug info size, but the .debug_line
>section does not let you skip dead sequences; you still have to parse
>the whole thing.  Our debugger guys were pleased at not having to
>spend time doing something that useless.  (Yeah it does mean the
>linker has to parse the whole .debug_line section; but our theory is
>that you probably run the debugger more than you run the linker, and
>in any case you do it interactively, so debugger load time is probably
>more annoying than some fractional increase in build/link time.)
>
>The dir/file tables can't be squeezed, but one expects it's not a
>huge cost with .debug_line_str having lots of deduplication
>opportunities.
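
For anyone who wants to hand-craft a per-function .debug_line fragment, a
minimal Python sketch of the byte layout follows. It assumes a 64-bit
little-endian target and a minimum instruction length of 1, and it emits no
actual rows (a real sequence would have copy/special opcodes in between), so
treat it as a shape illustration only. The function names are hypothetical.

```python
import struct


def uleb128(value):
    """Encode a non-negative integer as ULEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def line_sequence(address, length):
    """One function's line-program fragment: set_address ... end_sequence."""
    # Extended opcode: 0x00, ULEB128 length (opcode byte + 8-byte address),
    # then DW_LNE_set_address (0x02) and the address operand.
    set_address = b"\x00" + uleb128(1 + 8) + b"\x02" + struct.pack("<Q", address)
    advance_pc = b"\x02" + uleb128(length)  # DW_LNS_advance_pc
    end_sequence = b"\x00\x01\x01"          # DW_LNE_end_sequence
    return set_address + advance_pc + end_sequence


fragment = line_sequence(0x1000, 0x10)
```

Because each fragment starts with set_address and ends with end_sequence, it
does not depend on state left behind by a neighbouring fragment, so dropping
one does not disturb the others.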
>>
>> > .debug_line_str is a string section and requires nothing special.
>> >
>> > .debug_names ... haven't looked at it but I suspect either it doesn't
>> > survive or it has to be generated post-link (or by the linker).
>>
>> Generally you're going to want a DWARF-aware linker for debug_names,
>> same as gdb-index, etc.
>>
>> > .debug_frame I *think* can be fragmented, but I haven't taken the
>> > time to look at it to make sure.
>> >
>> > Those are all the sections I see in DWARF v5 Appendix B.
>> >
>> > So that's the blue-sky vision of linker-magic COMDAT DWARF, which
>> > took me about an hour to write down just now.  There is certainly
>> > a non-trivial overhead in terms of ELF sections; in the general
>> > case we would have 5 per-function fragments (for .debug_info,
>> > .debug_line, .debug_rnglists, .debug_loclists, .debug_aranges).
>> >
>> > Not small, but then other features in the works are using huge
>> > quantities of ELF sections too (section-per-basic-block).
>>
>> That work's being scoped to be fairly selective about which basic
>> blocks it puts in unique sections - just those that are especially
>> performance sensitive, so the cost isn't as high as you might
>> otherwise imagine. Adding 5 new sections per function would be
>> probably a significantly larger growth than anything else I'm aware
>> of, but I haven't run the numbers by any means.
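
A back-of-the-envelope version of that number, as a small Python sketch. It
counts only the section header (sizeof(Elf64_Shdr) = 64) plus one 32-bit
member index in an already-existing section group for each of the 5
per-function fragments. Relocation sections and section name strings are
deliberately ignored, and the function count is a made-up input, so this is
a lower bound under those assumptions rather than a measurement.

```python
import struct

# Elf64_Shdr fields: sh_name, sh_type (u32); sh_flags, sh_addr, sh_offset,
# sh_size (u64); sh_link, sh_info (u32); sh_addralign, sh_entsize (u64).
ELF64_SHDR_SIZE = struct.calcsize("<IIQQQQIIQQ")
GROUP_MEMBER_SIZE = 4  # one 32-bit section index per SHT_GROUP member

# .debug_info, .debug_line, .debug_rnglists, .debug_loclists, .debug_aranges
FRAGMENT_SECTIONS = 5


def fragment_overhead(num_functions, sections_per_function=FRAGMENT_SECTIONS):
    """Bytes of ELF bookkeeping added by per-function debug fragments."""
    per_function = sections_per_function * (ELF64_SHDR_SIZE + GROUP_MEMBER_SIZE)
    return num_functions * per_function


per_function_bytes = fragment_overhead(1)
ten_thousand_functions = fragment_overhead(10_000)
```

That works out to 340 bytes per function before any relocation sections,
which is the figure the fragment contents would have to beat.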
>
>Doing it for *every* function would be the worst case, for when
>you're trying to squeeze everything (gc + icf).  We could likely
>get wins if we did it just for the functions that today end up in
>a COMDAT section (inline functions, template instantiations) which
>previous research has found to be pretty significant (and major
>motivation for the Program Repository work that we've previously
>described at a Dev Meeting, https://llvm.org/devmtg/2016-11/#talk22)
>
>>
>> Thanks again for the write up!
>
>NP, it was fun to trot out this stuff.
>--paulr
>
>>
>> - Dave
>>
>> > > & as always, not sure how any of this would work for Split DWARF -
>> > > just a debug_addr section that has some addresses that point to
>> > > discardable functions... if we want those addresses themselves to be
>> > > discardable (so we don't have to use a tombstone value inserted by the
>> > > linker) then they'd need to be in separate debug_addr contributions
>> > > with headers, etc - the overhead just seems too high to me in all the
>> > > ways I can look at that.
>> >
>> > Yeah I think .dwo sections can't take advantage of fragmenting, and
>> > .debug_addr is referenced by index so it can't be fragmented.  Although
>> > the point is not to avoid tombstone values, but to do a more efficient
>> > job of editing the final DWARF to omit gc'd functions; it's no problem
>> > at all to use a tombstone value in .debug_addr IMO.
>> > --paulr
>> >
>> > >
>> > > >
>> > > >
>> > > >
>> > > > Or we pay the cost of parsing, trimming, and rewriting all the DWARF
>> in
>> > > the linker.
>> > > >
>> > > > --paulr
>> > > >
>> > > >
>> > > >
>> > > > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of James
>> > > Henderson via llvm-dev
>> > > > Sent: Wednesday, June 3, 2020 3:48 AM
>> > > > To: David Blaikie <dblaikie at gmail.com>
>> > > > Cc: llvm-dev at lists.llvm.org
>> > > > Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete
>> debug
>> > > info in lld.
>> > > >
>> > > >
>> > > >
>> > > > It makes me sad that the linker (via a library or otherwise) has to
>> be
>> > > "DWARF-aware" to be able to effectively handle --gc-sections, COMDATs,
>> --
>> > > icf etc for debug info, without leaving large blocks of data kicking
>> > > around.
>> > > >
>> > > >
>> > > >
>> > > > The patching to -1 (or equivalent) is probably a good lightweight
>> > > solution (though I'd love it if it could be done based on section type
>> in
>> > > the future rather than section name, but that's probably outside the
>> realm
>> > > of DWARF), as it requires only minimal understanding in the linker,
>> but
>> > > anything beyond that seems to be complicated logic that is mostly due
>> to
>> > > the structure of DWARF. Patching to -1 does feel a bit like a sticking
>> > > plaster/band aid to patch over the issue rather than properly solving
>> it
>> > > too - there will still be debug data (potentially significant amounts
>> in
>> > > COMDAT-heavy objects) that the linker has to write and the debugger
>> has to
>> > > somehow know how to skip (even if it knows that -1 is special-case due
>> to
>> > > the standard being updated, it needs to get as far as the -1), which
>> is
>> > > all wasted effort.
>> > > >
>> > > >
>> > > >
>> > > > We've already seen from Alexey's prototyping, and from our own
>> > > experiences with the Sony proprietary linker (which tried to rewrite
>> > > .debug_line only) that deconstructing the DWARF so that it can be more
>> > > optimally reassembled at link time is slow going, and will probably
>> > > inevitably be however much effort is put into optimising it. For a
>> start,
>> > > given the current standards, it's impossible to know how to
>> deconstruct it
>> > > without having to parse vast amounts of DWARF, which is typically
>> going to
>> > > mean a lot more parsing work than the linker would normally have to
>> deal
>> > > with. Additionally, much of this parsing work is wasted effort, since
>> it
>> > > seems unlikely in many links that large amounts of the DWARF will be
>> > > redundant. Having an option to opt-in doesn't help much there, since
>> it
>> > > just means the logic exists without most people using it, due to it
>> not
>> > > being good enough, or potentially they don't even know it exists.
>> > > >
>> > > >
>> > > >
>> > > > I don't have particularly concrete suggestions as to how to solve
>> the
>> > > structural problems with DWARF at this point. The only thing that
>> seems
>> > > obvious to me is a more "blessed" approach to fragmentation of
>> sections,
>> > > similar to what I tried with my prototype mentioned earlier in the
>> thread,
>> > > although we'd need to figure out the previously stated performance
>> issues.
>> > > Other ideas might tie into this, like somehow sharing the various
>> table
>> > > headers a bit like CIEs in .eh_frame that could be merged by the
>> linker -
>> > > each object could have separate table header sections, which are
>> > > referenced by the individual .debug_* blocks, which in turn are one
>> per
>> > > function/data piece and easily discardable/merged by the linker.
>> > > >
>> > > >
>> > > >
>> > > > Just some thoughts.
>> > > >
>> > > >
>> > > >
>> > > > James
>> > > >
>> > > >
>> > > >
>> > > > On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev <llvm-
>> > > dev at lists.llvm.org> wrote:
>> > > >
>> > > > On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
>> > > > <alapshin at accesssoftek.com> wrote:
>> > > > >
>> > > > > Hi David, please find my comments inside:
>> > > > >
>> > > > >
>> > > > > >>>Broad question: Do you have any specific motivation/users/etc
>> in
>> > > implementing this (if you can speak about it)?
>> > > > >
>> > > > > >>> - it might help motivate the work, understand what tradeoffs
>> might
>> > > be suitable for you/your users, etc.
>> > > > >
>> > > > > >>There are two general requirements:
>> > > > > >> 1) Remove (or clean) invalid debug info.
>> > > > >
>> > > > > >
>> > > > > >Perhaps a simpler direct solution for your immediate needs might
>> be a
>> > > much narrower,
>> > > > > >and more efficient linker-DWARF-awareness feature:
>> > > > > >
>> > > > > > With DWARFv5, rnglists present an opportunity for a DWARF linker
>> to
>> > > rewrite the ranges
>> > > > > > without parsing the rest of the DWARF. /technically/ this isn't
>> > > guaranteed - rnglist entries
>> > > > > > can be referenced either directly, or by index. If all rnglists
>> are
>> > > referenced by index, then
>> > > > > > a linker could parse only the debug_rnglists section and rewrite
>> > > ranges to remove any
>> > > > > > address ranges that refer to optimized-out code.
>> > > > > >
>> > > > > > This would only be correct for rnglists that had no direct
>> > > references to them (that only were
>> > > > > > referenced via the indexes) - but we could either implement it
>> with
>> > > that assumption, or could
>> > > > > > add an LLVM extension attribute on the CU that would say "I
>> promise
>> > > I only referenced rnglists
>> > > > > > via rnglistx forms/indexes"). If this DWARF-aware linking would
>> have
>> > > to read the CU DIE (not
>> > > > > > all the other DIEs) it /could/ also then rewrite high/low_pc if
>> the
>> > > CU wasn't using ranges...
>> > > > > > but that wouldn't come up in the function-removal case, because
>> then
>> > > you'd have ranges anyway,
>> > > > > > so no need for that.
>> > > > > >
>> > > > > > Such a DWARF-aware rnglist linking could also simplify rnglists,
>> in
>> > > cases where functions
>> > > > > > ended up being laid out next to each other, the linker could
>> > > coalesce their ranges together.
>> > > > > >
>> > > > > > I imagine this could be implemented with very little overhead to
>> > > linking, especially compared
>> > > > > > to the overhead of full DWARF-aware linking.
>> > > > > >
>> > > > > >Though none of this fixes Split DWARF, where the linker doesn't
>> get a
>> > > chance to see the
>> > > > > > addresses being used - but if you only want/need the CU-level
>> ranges
>> > > to be correct, this
>> > > > > > might be a viable fix, and quite efficient.
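
The range rewriting David describes can be sketched in a few lines of
Python. This toy version works on already-decoded (start, end, is_live)
tuples; how a linker would decode .debug_rnglists and learn liveness from
relocations is not modelled, and the function name is hypothetical.

```python
def rewrite_ranges(entries):
    """Drop ranges for discarded code and coalesce adjacent live ranges."""
    live = sorted(
        (start, end) for start, end, is_live in entries if is_live and start < end
    )
    merged = []
    for start, end in live:
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


cu_ranges = rewrite_ranges(
    [
        (0x1000, 0x1010, True),
        (0x0000, 0x0030, False),  # discarded function resolved to 0+addend
        (0x1010, 0x1040, True),   # laid out right after the first function
        (0x2000, 0x2020, True),
    ]
)
```

Only the range list is touched here; the subprogram DIEs inside the CU
still carry the stale addresses.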
>> > > > >
>> > > > > Yes, we have thought about that alternative. This would resolve our
>> problem
>> > > of invalid debug info
>> > > > > and would work much faster. Thus, if we would not have good
>> results
>> > > for D74169 then we
>> > > > > will implement it. Do you think it could be useful to have this
>> > > solution in upstream?
>> > > >
>> > > > A pure rnglist rewriting - I think it'd be OK to have in upstream -
>> > > > again, cost/benefit/etc would have to be weighed. I'm not sure it
>> > > > would save enough space to be particularly valuable beyond the
>> > > > correctness issue - and it doesn't completely solve the correctness
>> > > > issue for zero-address usage or low-address usage (because you could
>> > > > still have overlapping subprograms inside a CU - so if you were
>> > > > symbolizing you could use the correct rnglist to filter, but then go
>> > > > look inside the CU only to find two subprograms that had that
>> address
>> > > > & not know which one was the correct one and which one was the
>> > > > discarded one).
>> > > >
>> > > > rnglist rewriting might be easy enough to prototype - but depends
>> what
>> > > > you want to spend your time on, I know this whole issue has been a
>> > > > huge investment of your time already - but maybe this recent
>> > > > revitalization of the conversation around having an explicit value
>> in
>> > > > the linker might be sufficient to address everyone's needs...
>> *fingers
>> > > > crossed*
>> > > >
>> > > >
>> > > > > >> 2) Optimize the DWARF size.
>> > > > >
>> > > > >
>> > > > > > Do your users care much about this? I imagine if they had
>> > > significant DWARF size issues,
>> > > > > > they'd have significant link time issues and the kind of cost to
>> > > link time this feature has would
>> > > > > > be prohibitive - but perhaps they're sharing linked binaries
>> much
>> > > more often than they're
>> > > > > > actually performing linking.
>> > > > >
>> > > > > Yes, they do. They also have significant link-time issues.
>> > > > > So current performance results of D74169 are not very acceptable.
>> > > > > We hope to improve it.
>> > > > >
>> > > > >
>> > > > >
>> > > > > >>The specifics which our users have:
>> > > > > >>  - embedded platform which uses 0 as start of .text section.
>> > > > > >>  - custom toolset which does not support all features yet (e.g.
>> > > split dwarf).
>> > > > > >>  - tolerant of the link-time increase.
>> > > > > >>  - need a useful way to share debug builds.
>> > > > >
>> > > > >
>> > > > > > Sharing two files (executable and dwp) is significantly less
>> useful
>> > > than sharing one file?
>> > > > >
>> > > > > Probably not significantly, but yes, it looks less useful
>> compared to
>> > > D74169.
>> > > > > Having only two files (executable and .dwp) looks significantly
>> better
>> > > than having executable and multiple .dwo files.
>> > > > > Having only one file (executable) with minimal size looks better
>> than
>> > > the two files with a bigger size.
>> > > > >
>> > > > > clang compiled with -gsplit-dwarf takes 0.9G for executable and
>> 0.9G
>> > > for .dwp.
>> > > > > clang compiled with -gc-debuginfo takes only 0.76G for single
>> > > executable.
>> > > > >
>> > > > >
>> > > > >
>> > > > > >>For the first point: we have a problem "Overlapping address ranges starting from 0"(D59553).
>> > > > >
>> > > > > >>We use custom solution, but the general solution like D74169 would be better here.
>> > > > >
>> > > > >
>> > > > > > If CU ranges are the only ones that need fixing, then I think the above solution might be as
>> > > > > > good/better - if more than CU ranges need fixing, then I think we might want to start talking about
>> > > > > > how to fix DWARF itself (split and non-split) to signal certain addresses point to dead code with a
>> > > > > > specific blessed value that linkers would need to implement - because with Split DWARF there's
>> > > > > > no way to solve the non-CU addresses at the linker.
>> > > > >
>> > > > > I think a worthwhile choice for that signal value would be LowPC > HighPC.
>> > > > > That does not require additional bits in DWARF.
>> > > > > It would be natural to skip such address ranges since they are explicitly marked as invalid.
>> > > > > It could be implemented in a linker very easily. It would probably make sense to describe that
>> > > > > usage in the DWARF standard.
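
[Editor's sketch] The LowPC > HighPC convention proposed above can be illustrated with a few lines of Python. This is only a sketch under assumptions: the (low_pc, high_pc) tuple representation and the helper names is_dead_range and live_ranges are hypothetical and are not part of lld, DWARFLinker, or the DWARF standard.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation of one address range: (low_pc, high_pc).
AddressRange = Tuple[int, int]


def is_dead_range(low_pc: int, high_pc: int) -> bool:
    """Treat LowPC > HighPC as the 'points to dead code' marker.

    An empty range (low_pc == high_pc) is not considered dead here;
    that choice is an assumption of this sketch.
    """
    return low_pc > high_pc


def live_ranges(ranges: Iterable[AddressRange]) -> List[AddressRange]:
    """Keep only ranges that were not explicitly marked as invalid."""
    return [(lo, hi) for lo, hi in ranges if not is_dead_range(lo, hi)]


if __name__ == "__main__":
    # The middle entry stands for a garbage-collected function whose
    # range was rewritten so that LowPC > HighPC.
    ranges = [(0x1000, 0x1040), (0x1, 0x0), (0x2000, 0x2010)]
    for lo, hi in live_ranges(ranges):
        print(f"[{lo:#x}, {hi:#x})")
```

A consumer that applies such a check needs no extra attribute or flag, which is the point made above about not requiring additional bits in DWARF.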
>> > > > >
>> > > > > As for the addresses which are not seen by the linker (since they are in .dwo files) - yes,
>> > > > > they need another solution. Could you show an example of such a case, please?
>> > > > >
>> > > > >
>> > > > >
>> > > > > >>>2. Support of type units.
>> > > > >
>> > > > > >>>
>> > > > >
>> > > > > >>>>  That could be implemented further.
>> > > > >
>> > > > > >>>Enabling type units increases object size to make it easier to deduplicate at link time by a DWARF-unaware
>> > > > > >>>linker. With a DWARF aware linker it'd be generally desirable not to have to add that object size overhead to
>> > > > > >>>get the linking improvements.
>> > > > >
>> > > > > >>
>> > > > >
>> > > > > >>But, DWARFLinker should adequately work with type units since they are already implemented.
>> > > > >
>> > > > >
>> > > > > > Maybe - it'd be nice & all, but I don't think it's an outright necessity - if someone knows they're using
>> > > > > > a DWARF-aware linker, they'd probably not use type units in their object files. It's possible someone
>> > > > > > doesn't know for sure & maybe they have pre-canned debug object files from someone else, etc.
>> > > > >
>> > > > > I see.
>> > > > >
>> > > > > >>Another thing is that the idea behind type units has the potential to help Dwarf-aware linker to work faster.
>> > > > >
>> > > > > >>Currently, DWARFLinker analyzes context to understand whether types are the same or not.
>> > > > >
>> > > > >
>> > > > > >When you say "analyzes context" what do you mean? Usually I'd take that to mean
>> > > > > > "looks at things outside the type itself - like what namespace it's in, etc" - which, yes,
>> > > > > > it should do that, but it doesn't seem very expensive to do. But I guess you actually
>> > > > > > mean something about doing structural equivalence in some way, looking at things inside the type?
>> > > > >
>> > > > > I think it could be useful for both cases. Currently, dsymutil does only the first thing
>> > > > > (looking at type name, namespace name, etc.) and does not do the second thing
>> > > > > (checking structural equivalence). Analyzing type names is currently quite expensive
>> > > > > (the search in the string pool alone takes ~10 sec out of 70 sec of overall time).
>> > > > > That is expensive because many things have to be done to work with strings:
>> > > > > parse DWARF, search and resolve relocations, compute a hash for strings,
>> > > > > put data into a string pool, create a fully qualified name (like namespace::function::name).
>> > > > > It looks like it could be optimized and eventually require less time, but it would still be a noticeable
>> > > > > part of the overall time.
>> > > > >
>> > > > > If dsymutil starts to check for structural equivalence, then the process would be even slower.
>> > > > > So, if a single hash-id were checked instead of comparing type structures, this process
>> > > > > would also be faster.
>> > > > >
>> > > > > Thus I think using a hash-id to compare types would make the current implementation faster and would
>> > > > > also allow DWARFLinker to handle incomplete types without massive performance degradation.
>> > > > >
>> > > > > >> But the context is known when types are generated. So, no need to spend time analyzing it.
>> > > > >
>> > > > > >> If types could be compared without analyzing context, then a DWARF-aware linker would work faster.
>> > > > >
>> > > > > >> That is just an idea (not for immediate implementation): if types were stored in some "type table"
>> > > > > >> (instead of a COMDAT section group) and could be accessed through a hash-id (like type units),
>> > > > > >> then it would be a solution requiring fewer bits to store but allowing types to be compared
>> > > > > >> by hash-id (not analysing context).
>> > > > > >> In this case, the size increase would be small. And processing could be done faster.
>> > > > > >>
>> > > > > >> This is just an idea and could be discussed separately from the problem of integrating D74169.
>> > > > >
>> > > > > >> >> 6. -flto=thin
>> > > > >
>> > > > > >> >>    That problem was described in this review https://reviews.llvm.org/D54747#1503720 . It also exists in
>> > > > > >> >> current DWARFLinker/dsymutil implementation. I think that problem should be discussed more: it could
>> > > > > >> >> probably be fixed by avoiding generation of such incomplete declaration during thinlto,
>> > > > >
>> > > > > >> >> That would be costly to produce extra/redundant debug info in ThinLTO - actually ThinLTO could be doing
>> > > > > >> >> more to reduce that redundancy early on (actually removing definitions from some llvm Modules if the type
>> > > > > >> >> definition is known to exist in another Module, etc)
>> > > > > >> >I don't know if it's a problem since that patch was reverted.
>> > > > >
>> > > > > >>
>> > > > >
>> > > > > >> Yes. That patch was reverted, but this patch (D74169) has the same problem.
>> > > > >
>> > > > > >> If D74169 were applied and --gc-debuginfo used, then the structure type
>> > > > > >> definition would be removed.
>> > > > >
>> > > > > >> DWARFLinker could handle that case - "removing definitions from some llvm Modules if the type
>> > > > > >> definition is known to exist in another Module".
>> > > > > >> i.e. DWARFLinker could replace the declaration with the definition.
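
[Editor's sketch] "Replace the declaration with the definition" can be pictured as a lookup from each declaration to a definition of the same type found anywhere in the link. The sketch below is a rough illustration under assumptions: the TypeDie record and resolve_declarations helper are hypothetical, matching is done by name only, and the real DWARFLinker data structures are different. The offsets 0xae and 0x11e are the ones quoted in this message for the "Foo" declaration and definition.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TypeDie:
    """Hypothetical, simplified view of a DW_TAG_structure_type DIE."""
    offset: int
    name: str
    is_declaration: bool


def resolve_declarations(dies: List[TypeDie]) -> Dict[int, int]:
    """Map each declaration's offset to the offset of a same-named definition.

    Declarations with no known definition are left out of the result.
    """
    definitions: Dict[str, int] = {}
    for die in dies:
        if not die.is_declaration:
            # Keep the first definition seen for each name.
            definitions.setdefault(die.name, die.offset)
    return {
        die.offset: definitions[die.name]
        for die in dies
        if die.is_declaration and die.name in definitions
    }


if __name__ == "__main__":
    dies = [
        TypeDie(offset=0xAE, name="Foo", is_declaration=True),
        TypeDie(offset=0x11E, name="Foo", is_declaration=False),
    ]
    for decl, defn in resolve_declarations(dies).items():
        print(f"declaration {decl:#x} -> definition {defn:#x}")
```

A linker doing this would have to keep the definition alive, or move it, before dropping the compile unit that currently holds it, which is where the extra cost mentioned in this thread comes from.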
>> > > > >
>> > > > > >> But that problem could be more easily resolved when debug info is generated (probably without
>> > > > > >> a significant increase of debug info size):
>> > > > >
>> > > > > >> Here we have:
>> > > > >
>> > > > > >> DW_TAG_compile_unit(0x0000000b) - compile unit containing concrete instance for function "f".
>> > > > > >> DW_TAG_compile_unit(0x00000073) - compile unit containing abstract instance root for function "f".
>> > > > > >> DW_TAG_compile_unit(0x000000c1) - compile unit containing function "f" definition.
>> > > > >
>> > > > > >> Code for function "f" was deleted. gc-debuginfo deletes compile unit DW_TAG_compile_unit(0x000000c1)
>> > > > > >> containing "f" definition (since there is no corresponding code). But it has structure "Foo" definition
>> > > > > >> DW_TAG_structure_type(0x0000011e) referenced from DW_TAG_compile_unit(0x00000073)
>> > > > > >> by declaration DW_TAG_structure_type(0x000000ae). That declaration is exactly the case when definition
>> > > > > >> was removed by thinlto and replaced with declaration.
>> > > > >
>> > > > > >> Would it cost too much if the type definition were not replaced with a declaration for the "abstract instance root"?
>> > > > > >> The number of concrete instances is bigger than the number of abstract instance roots.
>> > > > > >> Probably, it would not be too costly to leave the definition in the abstract instance root?
>> > > > >
>> > > > >
>> > > > >
>> > > > > >> Alternatively, would it cost too much if the type definition were not replaced with a declaration when
>> > > > > >> the declaration references the type from an unused function? (LTO could understand that the concrete function is not used).
>> > > > >
>> > > > >
>> > > > > >I don't follow this example - could you provide a small concrete test case I could reproduce?
>> > > > >
>> > > > > I can provide a test case if necessary. But it looks like this issue is finally clear, and you already commented on that.
>> > > > >
>> > > > > > Oh, I guess this is happening perhaps because ThinLTO can't know for sure that a standalone
>> > > > > > definition of 'f' won't be needed - so it produces one in case one of the inlining opportunities
>> > > > > > doesn't end up inlining. Then it turns out all calls got inlined, so the external definition wasn't needed.
>> > > > >
>> > > > > > Oh, you're suggesting that these 3 CUs got emitted into one object file during LTO, but that DWARFLinker
>> > > > > > drops a CU without any code in it - even though... So far as I know, in LTO, LLVM directly references
>> > > > > > types across units if the CUs are all emitted in the same object file. (and if they weren't in the same
>> > > > > > object file - then the abstract_origin couldn't be pointing cross-CU).
>> > > > >
>> > > > > > I guess some basic things to say:
>> > > > >
>> > > > > > With ThinLTO, the concrete/standalone function definition is emitted in case some call sites don't end up
>> > > > > > being inlined. So we know it'll be emitted (but might not be needed by the actual linker)
>> > > > > > Any number of inline calls might exist - but we shouldn't put the type information into those, because
>> > > > > > they aren't guaranteed to emit it (if the inline function gets optimized away, there would be nothing to
>> > > > > > enforce the type being emitted) - and even if we forced the type information to be emitted into one
>> > > > > > object file that has an inline copy of the function - there's no guarantee that object file will get linked in either.
>> > > > >
>> > > > > > So, no, I don't think there's much we can do to keep the size of object files down, while guaranteeing
>> > > > > > the type information will be emitted with the usual linker semantics.
>> > > > >
>> > > > > Then dsymutil/DWARFLinker could be changed to handle that (though it would probably not be very efficient).
>> > > > > If ThinLTO understood that the function ends up unused (and so must not contain the referenced type definition),
>> > > > > then this situation could be handled more effectively.
>> > > > >
>> > > > > Thank you, Alexey.
>> > > > >
>> > > > >>>
>> > > > >>>
>> > > > >>>
>> > > > >>>
>> > > > >>> _______________________________________________
>> > > > >>> LLVM Developers mailing list
>> > > > >>> llvm-dev at lists.llvm.org
>> > > > >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>_______________________________________________
>LLVM Developers mailing list
>llvm-dev at lists.llvm.org
>https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev