[llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete debug info in lld.

Wed Jun 24 08:15:39 PDT 2020

Thanks for copying me in Paul! Sorry, for the late reply.

I have had a personal interest in this subject for a long time and I have
had discussions on linking DWARF with many of you in person at LLVM events.
I don't have much to add to what's upthread and James Henderson has already
answered the questions I was copied in for. However, I did want to make a
general point about ELF that I thought might be helpful having read through
the above:

It may be that changes to the ELF spec might be warranted in order to
improve DWARF linking (on ELF). Everyone may already be aware of this but
FYI: the ownership of ELF has been a problem for a while. Currently, the
ELF spec is composed of the spec here: http://www.sco.com/developers/gabi/
plus the decisions made on the list:
https://groups.google.com/forum/#!forum/generic-abi. Obviously, this is
less than ideal. However, this is hopefully changing in the near future
see: https://groups.google.com/forum/#!topic/generic-abi/9OO5vhxb00Y. This
should mean that modifications to ELF are more viable in the future.

On Tue, Jun 23, 2020 at 10:42 PM David Blaikie via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On Tue, Jun 23, 2020 at 2:19 PM Alexey Lapshin
> <alapshin at accesssoftek.com> wrote:
> >
> > >On 2020-06-22, Alexey Lapshin via llvm-dev wrote:
> > >>>> >> >Alexey> Probably we could try to make DWARF easy to parsing,
> trimming, rewriting so that full DWARF
> > >>>> >> >Alexey> parsing solution would not take too much time?
> > >>>> >> >Alexey>
> > >>>> >> >Alexey> f.e. -debug-types-section solution uses COMDAT sections
> to split and deduplicate types.
> > >>>> >> >Alexey> That solution works quite fast. It has already
> mentioned drawback with a big size
> > >>>> >> >Alexey> overhead(because of section headers/type unit headers
> sizes). But, the fact that type units
> > >>>> >> >Alexey> could be identified just by hash-id(without parsing
> type names and types hierarchies)
> > >>>> >> >Alexey> allows the linker to reject duplications quickly.
> Another thing is that the linker drops
> > >>>> >> >Alexey> duplicated COMDAT sections without any additional
> check. After duplications are deleted,
> > >>>> >> >Alexey> the debug info is still consistent.
> > >>>> >> >Alexey> There could be done DWARF aware solution working using
> the same two principles:
> > >>>> >> >Alexey> 1. compare types by hash-id.
> > >>>> >> >Alexey> 2. drop duplications without analyzing contents.
> > >>>> >> >Alexey>
> > >>>> >> >Alexey> If all types are put into a separate type table and
> have hash-id, then it would be much easier to
> > >>>> >> >Alexey> deduplicate them. The idea demonstrated here -
> https://reviews.llvm.org/P8164. (It still has a
> > >>>> >> >Alexey> questions: whether base types should be put into type
> table, whether references into type table
> > >>>> >> >Alexey> should be done by DW_AT_signature or just by offset,
> etc.. ) While handling that separate type table
> > >>>> >> >Alexey> the DWARF aware linker would check the only hash_id and
> put only one type description
> > >>>> >> >Alexey> with the same id in the final type table. It also would
> allow us to solve that -flto=thin problem -
> > >>>> >> >Alexey>
> http://lists.llvm.org/pipermail/llvm-dev/2020-May/141938.html (there is
> dsymutil example there).
> > >>>> >> >Alexey> i.e., the case when type definition would be removed
> will not occur.
> > >>>> >>
> > >>>> >> David>I think there is scope for lower-overhead type
> deduplication,
> > >>>> >> David>especially now with type units being merged into the
> debug_info
> > >>>> >> David>section. Perhaps we could drop dwo_ids and use section
> references to
> > >>>> >> David>refer to types & rely on the linker to keep those
> referenced sections
> > >>>> >> David>alive - though section references are longer than
> CU-relative
> > >>>> >> David>references. (but we need the extra length - because if the
> linker
> > >>>> >> David>deduplicates a type definition - one CU may be referencing
> a type very
> > >>>> >> David>far away, so the shorter reference might be inadequate) I
> don't think
> > >>>> >> David>the indirection through the type hash is /super/
> significant to the
> > >>>> >> David>cost - I think it's more in the duplication of many DIEs
> especially
> > >>>> >> David>for function definitions (since the type unit sig8 system
> only
> > >>>> >> David>provides a way to reference the type - not its member
> functions, their
> > >>>> >> David>parameters, etc - so all those DIEs get duplicated in any
> CU that
> > >>>> >> David>needs to provide a definition of a member function). We
> could
> > >>>> >> David>prototype cross-unit DIE references to lower the cost of
> that
> > >>>> >> David>duplication, though rumor has it that constructor based
> type homing
> > >>>> >> David>might provide enough value to obviate the need for type
> units (or at
> > >>>> >> David>least make the overhead not worthwhile - so revisiting the
> overhead to
> > >>>> >> David>reduce it might make it worthwhile again... ).
> > >>>> >>
> > >>>> >> David>Probably wouldn't be super hard to use LLVM's existing
> cross-unit DIE
> > >>>> >> David>Referencing machinery (implemented for LTO) to refer
> directly to DIEs
> > >>>> >> David>in a type unit without using the signature system... -
> hmm, that'd
> > >>>> >> David>only work if your type unit DIEs were identical? /maybe/ ?
> Not sure
> > >>>> >> David>how that'd work if you wanted to refer into a type unit,
> but the type
> > >>>> >> David>unit got deduplicated. Might be able to rely on the linker
> to preserve
> > >>>> >> David>every unique copy of the type unit that's referenced if we
> phrase
> > >>>> >> David>things carefully - so if your compiler does produce
> exactly identical
> > >>>> >> David>type units they get deduplicated and sec_refs refer to the
> uniquely
> > >>>> >> David>preserved copy - but otherwise it preserves as many
> distinct copies as
> > >>>> >> David>needed. (I don't know enough about how that works to be
> sure - but I
> > >>>> >> David>know that these linkonce/inline function deduplication
> does seem to
> > >>>> >> David>cause the DWARF to refer to the singular function if that
> function is
> > >>>> >> David>identical, and if it isn't, then you get 0 - so there's
> /something/ in
> > >>>> >> David>the linker that can adjust for deduplicating identical
> duplicates... )
> > >>>> >>
> > >>>> >> Probably I was a bit unclear: the above idea is not for types
> > >>>> >> (placed in COMDAT sections) deduplicated by the linker.
> > >>>>
> > >>>> >Yeah, I was just wondering whether it could be useful for that too.
> > >>>>
> > >>>> >> This idea goes in another direction than fragmenting dwarf
> > >>>> >> using elf sections&tricks. It seems to me that the cost of
> fragmenting is too high.
> > >>>>
> > >>>> >I tend to agree - but I'm sort of leaning towards trying to use
> object
> > >>>> >features as much as possible, then implementing just enough custom
> > >>>> >handling in the linker to recoup overhead, etc. (eg: add some kind
> of
> > >>>> >small header/brief description that makes it easy for the linker to
> > >>>> >slice-and-dice - but hopefully a domain-specific such header can
> be a
> > >>>> >bit more compact than the fully general ELF form)
> > >>>>
> > >>>> I think this indeed should be implemented and evaluated.
> > >>>> So that various approaches could be compared.
> > >>>>
> > >>>> >> It is not only the sizes of structures describing fragments but
> also the complexity
> > >>>> >> of tools that should be taught to work with fragmented DWARF.
> > >>>> >> (f.e. llvm-dwarfdump applied to object file should be able to
> read fragmented DWARF,
> > >>>> >> but applied to linked executable it should work with
> non-fragmented DWARF).
> > >>>> >> That idea is for the tool which works the same way as dsymutil
> ODR.
> > >>>> >>
> > >>>> >> I will shortly describe the idea of making DWARF be easier
> processed by dsymutil/DWARFLinker:
> > >>>> >>
> > >>>> >> The idea is to have only one "type table" per object
> file(special section .debug_types_table).
> > >>>> >> This "type table" would contain all types.
> > >>>> >> There could be a special type of reference - type_offset - that
> offset points into the type table.
> > >>>> >> Basic types could always be placed into the start of "type
> table" thus, offsets to basic types
> > >>>> >> most often would be 1 byte. There also would be a special kind
> of reference - reference inside the type.
> > >>>> >> Type units sig8 system - would not be used to reference types.
> > >>>> >>
> > >>>> >> Types deduplication is assumed to be done, not by linker
> mechanism for COMDAT,
> > >>>> >> but by a tool like dsymutil. This tool would create resulting
> .debug_types_table by putting there
> > >>>> >> types from source .debug_types_table-s. Only one copy of the
> type would be placed into the
> > >>>> >> resulting table. All references pointing to the deleted copy
> would be corrected to point
> > >>>> >> to the single copy inside "type table". (that is how dsymutil
> works currently)
> > >>>>
> > >>>> >^ that's the step that's probably a bit expensive for a general-use
> > >>>> >tool - it implies parsing all the DWARF to find those references
> and
> > >>>> >rewrite them, I think. For a high-performance solution that could
> be
> > >>>> >run by the linker I think it'd be necessary to have a solution that
> > >>>> >doesn't involve parsing all the DIEs.
> > >>>>
> > >>>> According to the current dsymutil processing,
> > >>>> exactly this process is not the most time-consuming.
> > >>>> That could be done relatively fast.
> > >>
> > >>>Fair enough - though I'd still imagine any solution that involves
> > >>>parsing all the DIEs still wouldn't be fast enough (maybe an order of
> > >>>magnitude faster than the current solution even - but that's stuill,
> > >>>what, 6 or 7x slower than linking without the feature?) for most users
> > >>>to consider it a good trade-off.
> > >>
> > >>It seems to me that even the current 6x-7x slowdown could be useful.
> > >>Users who already use dsymutil or llvm-dwp(assuming DWARFLinker
> > >>would be taught to work with a split dwarf) tools spend this time and,
> > >>in some scenarios, waste disk space by inter-mediate files.
> > >>Thus if they would use this LLD feature in its current state
> > >>- they would still receive benefits.
> > >>
> > >>Speaking of performance results - LLD is a multi-thread linker;
> > >>it handles sections in parallel. DWARFLinker generates DWARF using
> > >>AsmPrinter which is a stream - so it could make resulting DWARF only
> > >>continuously. It is not surprising that the parallel solution works
> faster.
> > >>Making DWARFLinker truly multi-threaded would probably allow us
> > >>to make slowdown to be at 2x-4x range.
> > >
> > >If it is 6x-7x slowdown (which may be optimized to 2x-4x), I wonder
> > >whether it is a good trade-off keeping it as an in-linker pass, or
> > >rather we should just use another utility compressing the output
> separately.
> > >
> > >If the slowdown is such a pain, I might not consider --gc-debuginfo a
> > >readily usable feature like --gdb-index or future --debug-names (DWARF
> > >v5 accelator table - I have a plan to add it but I am always distracted
> > >by other priorities at hand). Considering that this breaks GNU linkers,
> > >I will add the following lines to the build system
> > >
> > >   if LINKER_IS_LLD && ENABLE_GC_DEBUGINFO
> > >     add -Wl,--gc-debuginfo
> > >
> > >I don't think this is more complex than:
> > >
> > >   if ENABLE_GC_DEBUGINFO
> > >     set linker to a wrapper which optimizes the output like dwz
> > >
> > >We probably should add another -f option for specifying the linker path,
> > >like -fld-path=
> https://lists.llvm.org/pipermail/cfe-dev/2020-June/065710.html
> >
> > gc-debuginfo could be done not from linker but as a standalone
> > tool(like dsymutil,llvm-dwp), as you said. The reasons why it
> > was suggested to do from the linker:
> >
> > 1. Linker already has liveness information built and object files loaded.
> >    Thus, it would be the fastest implementation if called from the
> linker.
> >    Otherwise, there should be created and written address map while
> >    linking, there should be generated inter-mediate debug-info files.
> >    And then, the separate tool would read that map and load object files
> >    or inter-mediate debug-info files again. So processing time
> >    would become even longer.
>
> I think the suggestion would be to link as normal, then process to
> optimize (like dwz - I believe it does something like this too). It
> wouldn't need a map from the linker - it could use the existing
> tombstone values in the linked DWARF to determine what to drop. Yes,
> the processing time would be longer, for sure. (I think Fangrui's
> suggestion there is "if you're willing to take a (let's say optimized)
> 2x or more increase to link time, maybe (let's say doing the
> intermediate step adds some fairly significant chunk of overhead) 3x
> wouldn't make too much of a difference?)
>
> > 2. Linker already processes debug info: error reporting,
>
> It's important that it's only consulted in the error path - if the
> link is failing anyway, taking a little longer to fail isn't likely to
> significantly hurt the user experience (compared to the time it takes
> a human to read the message, process it/think about what it means and
> come up with a theory about how to address it). Compared to getting in
> the way of the automated path to a working executable.
>
> > --gdb-index,
> >     upcoming --debug-names.
>
> Importantly, these can be produced without parsing a lot of DWARF if
> the input contains gnu_pubnames, or debug_names (yeah, currently
> gdb_index still is quite costly (but I think more like 10s of percent
> in link time, not multiple 100s of percent) in memory and link time
> even when it's only parsing gnu_pubnames.
>
> > From the design point of view, it would be
> >     good to have a separate module - DWARFLinker - which implements
> >     all that functionality. So that there would not be additional
> separate
> >     specific linker implementation of them. Instead, already existed
> >     implementation would be called from the linker. i.e. Depending on the
> >     tasks, the linker would call either DWARFLinker.generate-gdb-index(),
> >     DWARFLinker.generate-debug-names(), DWARFLinker.gc-debuginfo().
> >
> > The idea behind gc-debuginfo was not to slowdown the linking process for
> everybody.
> > But to allow generation optimized debug-info for those who need it.
> > That is the same idea as LTO. LTO slowdowns usual compilation
> significantly,
> > but it creates a highly optimized code.
>
> Yep - I think the main difference there is going to be the size of the
> user base compared to the complexity (certainly LTO adds complexity to
> the linker - though without much alternative (well, alternative would
> be build systems being aware of this - sort of like Fangrui's
> suggestion, LTO could be a separate tool that merges LLVM bitcode
> files, then creates real object files that go to the actual native
> linker) to gain the desired performance).
>
> Fangrui: What's your assessment of the complexity of adding this
> functionality to lld? Are you concerned it'll be an ongoing
> maintenance burden on other work in lld? If not, I'd be inclined
> to/lean towards accepting this & having some room for Alexey to
> improve the performance for his own users needs (& hopefully that'll
> improve DWARFLinker functionality/performance as well), see if anyone
> else wants this link time tradeoff.
>
> - Dave
>
> >
> > Thank you, Alexey.
> >
> > >>> Anyway, I think the dsymutil approach is still valuable, and it
> > >>> would be useful to optimize it.
> > >>> Do you think it would be useful to make dsymutil/DWARFLinker truly
> multi-thread?
> > >>> (To make dsymutil/DWARFLinker able to process each object file in a
> separate thread)
> > >
> > >>Perhaps - that I'd probably leave up to the folks who are more
> > >>invested in dsymutil (Adrian Prantl et al). Maybe one day we'll get it
> > >>integrated into llvm-dwp and then I'll be interested in getting as
> > >>much performance out of it as lld - so multithreading and things would
> > >>be on the books.
> > >
> > >I think improving dsymutil is a valuable thing.
> > >Though there are several directions which might be considered
> > >to make it more robust:
> > >
> > >1. support of latest DWARF - DWARF5/DWARF64...
> > >2. implement multi-threaded execution.
> > >3. support of split DWARF.
> > >4. implement dsymutil for non-darwin platform.
> > >
> > >All of this is a massive piece of work.
> > >Our original investment was to solve two problems:
> > >
> > >1. Overlapped address ranges, which is currently close to being solved.
> Thank you for helping with that!
> > >
> > >2. Size of debug info. That still becomes an issue, but we are unsure
> whether we are ready to
> > >   invest in solving all the above 1-4 problems and how much community
> interested in it.
> > >
> > >Thank you, Alexey.
> > >
> > >>> >One way to do that would be to have a CU-local type indirection
> table.
> > >>> >DIEs reference local type numbers (like local address/string
> numbers -
> > >>> >addrx/strx/rnglistx) and that table contains either sig8 (no linker
> > >>> >fixups required) or the local type offsets you describe - the linker
> > >>> >would then only need to read this type number indirection table and
> > >>> >rewrite them to the final type numbers.
> > >>>
> > >>> Yes, that could be additionally done if this process would be
> time-consuming.
> > >>>
> > >>> David, thank you for all your comments and explanations. They are
> extremely helpful.
> > >
> > >>Sure thing - really appreciate your patience with all this - it's... a
> > >>lot of moving parts.
> > >
> > >>- Dave
> > >
> > >>
> > >> Thank you, Alexey.
> > >>
> > >> > sig8 hash-id would be used to compare types and to deduplicate them.
> > >> > It would speed up the current dsymutil context analysis.
> > >> > Types having the same hash-id could be deduplicated.
> > >> > This would allow deduplicating a more number of types than current
> dsymutil.
> > >> > Incomplete type definitions having a similar set of members are not
> deduplicated by dsymutil currently.
> > >> > In this case they would have the same hash-id.
> > >> >
> > >> > This "type table" would take less space than current "type units"
> and current ODR solution.
> > >> >
> > >> > Above is just an idea on how to help DWARF-aware linker(based on
> idea removing obsolete debug info)
> > >> > to work faster(if that is interesting).
> > >> >
> > >> > Alexey.
> > >> >
> > >> > > From: llvm-dev <llvm-dev-bounces at lists.llvm.org> On Behalf Of
> James Henderson via llvm-dev
> > >> > > Sent: Wednesday, June 3, 2020 3:48 AM
> > >> > > To: David Blaikie <dblaikie at gmail.com>
> > >> > > Cc: llvm-dev at lists.llvm.org
> > >> > > Subject: Re: [llvm-dev] [Debuginfo][DWARF][LLD] Remove obsolete
> debug info in lld.
> > >> > >
> > >> > >
> > >> > >
> > >> > > It makes me sad that the linker (via a library or otherwise) has
> to be "DWARF-aware" to be able to effectively handle --gc-sections,
> COMDATs, --icf etc for debug info, without leaving large blocks of data
> kicking around.
> > >> > >
> > >> > >
> > >> > >
> > >> > > The patching to -1 (or equivalent) is probably a good lightweight
> solution (though I'd love it if it could be done based on section type in
> the future rather than section name, but that's probably outside the realm
> of DWARF), as it requires only minimal understanding in the linker, but
> anything beyond that seems to be complicated logic that is mostly due to
> the structure of DWARF. Patching to -1 does feel a bit like a sticking
> plaster/band aid to patch over the issue rather than properly solving it
> too - there will still be debug data (potentially significant amounts in
> COMDAT-heavy objects) that the linker has to write and the debugger has to
> somehow know how to skip (even if it knows that -1 is special-case due to
> the standard being updated, it needs to get as far as the -1), which is all
> wasted effort.
> > >> > >
> > >> > >
> > >> > >
> > >> > > We've already seen from Alexey's prototyping, and from our own
> experiences with the Sony proprietary linker (which tried to rewrite
> .debug_line only) that deconstructing the DWARF so that it can be more
> optimally reassembled at link time is slow going, and will probably
> inevitably be however much effort is put into optimising it. For a start,
> given the current standards, it's impossible to know how to deconstruct it
> without having to parse vast amounts of DWARF, which is typically going to
> mean a lot more parsing work than the linker would normally have to deal
> with. Additionally, much of this parsing work is wasted effort, since it
> seems unlikely in many links that large amounts of the DWARF will be
> redundant. Having an option to opt-in doesn't help much there, since it
> just means the logic exists without most people using it, due to it not
> being good enough, or potentially they don't even know it exists.
> > >> > >
> > >> > >
> > >> > >
> > >> > > I don't have particularly concrete suggestions as to how to solve
> the structural problems with DWARF at this point. The only thing that seems
> obvious to me is a more "blessed" approach to fragmentation of sections,
> similar to what I tried with my prototype mentioned earlier in the thread,
> although we'd need to figure out the previously stated performance issues.
> Other ideas might tie into this, like somehow sharing the various table
> headers a bit like CIEs in .eh_frame that could be merged by the linker -
> each object could have separate table header sections, which are referenced
> by the individual .debug_* blocks, which in turn are one per function/data
> piece and easily discardable/merged by the linker.
> > >> > >
> > >> > >
> > >> > >
> > >> > > Just some thoughts.
> > >> > >
> > >> > >
> > >> > >
> > >> > > James
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Tue, 2 Jun 2020 at 19:24, David Blaikie via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> > >> > >
> > >> > > On Tue, May 19, 2020 at 7:17 AM Alexey Lapshin
> > >> > > <alapshin at accesssoftek.com> wrote:
> > >> > > >
> > >> > > > Hi David, please find my comments inside:
> > >> > > >
> > >> > > >
> > >> > > > >>>Broad question: Do you have any specific
> motivation/users/etc in implementing this (if you can speak about it)?
> > >> > > >
> > >> > > > >>> - it might help motivate the work, understand what
> tradeoffs might be suitable for you/your users, etc.
> > >> > > >
> > >> > > > >>There are two general requirements:
> > >> > > > >> 1) Remove (or clean) invalid debug info.
> > >> > > >
> > >> > > > >
> > >> > > > >Perhaps a simpler direct solution for your immediate needs
> might be a much narrower,
> > >> > > > >and more efficient linker-DWARF-awareness feature:
> > >> > > > >
> > >> > > > > With DWARFv5, rnglists present an opportunity for a DWARF
> linker to rewrite the ranges
> > >> > > > > without parsing the rest of the DWARF. /technically/ this
> isn't guaranteed - rnglist entries
> > >> > > > > can be referenced either directly, or by index. If all
> rnglists are referenced by index, then
> > >> > > > > a linker could parse only the debug_rnglists section and
> rewrite ranges to remove any
> > >> > > > > address ranges that refer to optimized-out code.
> > >> > > > >
> > >> > > > > This would only be correct for rnglists that had no direct
> references to them (that only were
> > >> > > > > referenced via the indexes) - but we could either implement
> it with that assumption, or could
> > >> > > > > add an LLVM extension attribute on the CU that would say "I
> promise I only referenced rnglists
> > >> > > > > via rnglistx forms/indexes). If this DWARF-aware linking
> would have to read the CU DIE (not
> > >> > > > > all the other DIEs) it /could/ also then rewrite high/low_pc
> if the CU wasn't using ranges...
> > >> > > > > but that wouldn't come up in the function-removal case,
> because then you'd have ranges anyway,
> > >> > > > > so no need for that.
> > >> > > > >
> > >> > > > > Such a DWARF-aware rnglist linking could also simplify
> rnglists, in cases where functions
> > >> > > > > ended up being laid out next to each other, the linker could
> coalesce their ranges together.
> > >> > > > >
> > >> > > > > I imagine this could be implemented with very little overhead
> to linking, especially compared
> > >> > > > > to the overhead of full DWARF-aware linking.
> > >> > > > >
> > >> > > > >Though none of this fixes Split DWARF, where the linker
> doesn't get a chance to see the
> > >> > > > > addresses being used - but if you only want/need the CU-level
> ranges to be correct, this
> > >> > > > > might be a viable fix, and quite efficient.
> > >> > > >
> > >> > > > Yes, we think about that alternative. This would resolve our
> problem of invalid debug info
> > >> > > > and would work much faster. Thus, if we would not have good
> results for D74169 then we
> > >> > > > will implement it. Do you think it could be useful to have this
> solution in upstream?
> > >> > >
> > >> > > A pure rnglist rewriting - I think it'd be OK to have in upstream
> -
> > >> > > again, cost/benefit/etc would have to be weighed. I'm not sure it
> > >> > > would save enough space to be particularly valuable beyond the
> > >> > > correctness issue - and it doesn't completely solve the
> correctness
> > >> > > issue for zero-address usage or low-address usage (because you
> could
> > >> > > still have overlapping subprograms inside a CU - so if you were
> > >> > > symbolizing you could use the correct rnglist to filter, but then
> go
> > >> > > look inside the CU only to find two subprograms that had that
> address
> > >> > > & not know which one was the correct one an which one was the
> > >> > > discarded one).
> > >> > >
> > >> > > rnglist rewriting might be easy enough to prototype - but depends
> what
> > >> > > you want to spend your time on, I know this whole issue has been a
> > >> > > huge investment of your time already - but maybe this recent
> > >> > > revitalization of the conversation around having an explicit
> value in
> > >> > > the linker might be sufficient to address everyone's needs...
> *fingers
> > >> > > crossed*)
> > >> > >
> > >> > >
> > >> > > > >> 2) Optimize the DWARF size.
> > >> > > >
> > >> > > >
> > >> > > > > Do your users care much about this? I imagine if they had
> significant DWARF size issues,
> > >> > > > > they'd have significant link time issues and the kind of cost
> to link time this feature has would
> > >> > > > > be prohibitive - but perhaps they're sharing linked binaries
> much more often than they're
> > >> > > > > actually performing linking.
> > >> > > >
> > >> > > > Yes, they do. They also have significant link-time issues.
> > >> > > > So current performance results of D74169 are not very
> acceptable.
> > >> > > > We hope to improve it.
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > >>The specifics which our users have:
> > >> > > > >>  - embedded platform which uses 0 as start of .text section.
> > >> > > > >>  - custom toolset which does not support all features
> yet(f.e. split dwarf).
> > >> > > > >>  - tolerant of the link-time increase.
> > >> > > > >>  - need a useful way to share debug builds.
> > >> > > >
> > >> > > >
> > >> > > > > Sharing two files (executable and dwp) is significantly less
> useful than sharing one file?
> > >> > > >
> > >> > > > Probably not significantly, but yes, it looks less useful
> comparing to D74169.
> > >> > > > Having only two files (executable and .dwp) looks significantly
> better than having executable and multiple .dwo files.
> > >> > > > Having only one file(executable) with minimal size looks better
> than the two files with a bigger size.
> > >> > > >
> > >> > > > clang compiled with -gsplitdwarf takes 0.9G for executable and
> 0.9G for .dwp.
> > >> > > > clang compiled with -gc-debuginfo takes only 0.76G for single
> executable.
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > >>For the first point: we have a problem "Overlapping address
> ranges starting from 0"(D59553).
> > >> > > >
> > >> > > > >>We use custom solution, but the general solution like D74169
> would be better here.
> > >> > > >
> > >> > > >
> > >> > > > > If CU ranges are the only ones that need fixing, then I think
> the above solution might be as
> > >> > > > > good/better - if more than CU ranges need fixing, then I
> think we might want to start talking about
> > >> > > > > how to fix DWARF itself (split and non-split) to signal
> certain addresses point to dead code with a
> > >> > > > > specific blessed value that linkers would need to implement -
> because with Split DWARF there's
> > >> > > > > no way to solve the non-CU addresses at the linker.
> > >> > > >
> > >> > > > I think the worthful solution for that signal value would be
> LowPC > HighPC.
> > >> > > > That does not require additional bits in DWARF.
> > >> > > > It would be natural to skip such address ranges since they
> explicitly marked as invalid.
> > >> > > > It could be implemented in a linker very easily. Probably, it
> would make sense to describe that
> > >> > > > usage in DWARF standard.
> > >> > > >
> > >> > > > As to the addresses which are not seen by the linker(since they
> are in .dwo files) - yes,
> > >> > > > they need to have another solution. Could you show an example
> of such a case, please?
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > >>>2. Support of type units.
> > >> > > >
> > >> > > > >>>
> > >> > > >
> > >> > > > >>>>  That could be implemented further.
> > >> > > >
> > >> > > > >>>Enabling type units increases object size to make it easier
> to deduplicate at link time by a DWARF-unaware
> > >> > > >
> > >> > > > >>>linker. With a DWARF aware linker it'd be generally
> desirable not to have to add that object size overhead to
> > >> > > >
> > >> > > > >>>get the linking improvements.
> > >> > > >
> > >> > > > >>
> > >> > > >
> > >> > > > >>But, DWARFLinker should adequately work with type units since
> they are already implemented.
> > >> > > >
> > >> > > >
> > >> > > > > Maybe - it'd be nice & all, but I don't think it's an
> outright necessity - if someone knows they're using
> > >> > > > > a DWARF-aware linker, they'd probably not use type units in
> their object files. It's possible someone
> > >> > > > > doesn't know for sure & maybe they have pre-canned debug
> object files from someone else, etc.
> > >> > > >
> > >> > > > I see.
> > >> > > >
> > >> > > > >>Another thing is that the idea behind type units has the
> potential to help Dwarf-aware linker to work faster.
> > >> > > >
> > >> > > > >>Currently, DWARFLinker analyzes context to understand whether
> types are the same or not.
> > >> > > >
> > >> > > >
> > >> > > > >When you say "analyzes context" what do you mean? Usually I'd
> take that to mean
> > >> > > > > "looks at things outside the type itself - like what
> namespace it's in, etc" - which, yes,
> > >> > > > > it should do that, but it doesn't seem very expensive to do.
> But I guess you actually
> > >> > > > > mean something about doing structural equivalence in some
> way, looking at things inside the type?
> > >> > > >
> > >> > > > I think it could be useful for both cases. Currently, dsymutil
> does only first thing
> > >> > > > (look at type name, namespace name, etc..) and does not do the
> second thing
> > >> > > > (doing structural equivalence). Analyzing type names is
> currently quite expensive
> > >> > > > (the only search in string pool takes ~10 sec from 70 sec of
> overall time).
> > >> > > > That is expensive because of many things should be done to work
> with strings:
> > >> > > > parse DWARF, search and resolve relocations, compute a hash for
> strings,
> > >> > > > put data into a string pool, create a fully qualified name(like
> namespace::function::name).
> > >> > > > It looks like it could be optimized and finally require less
> time, but it still would be a noticeable
> > >> > > > part of the overall time.
> > >> > > >
> > >> > > > If dsymutil starts to check for the structural equivalence,
> then the process would be even more slowly.
> > >> > > > So, If instead of comparing types structure, there would be
> checked single hash-id - then this process
> > >> > > > would also be faster.
> > >> > > >
> > >> > > > Thus I think using hash-id to compare types would allow to make
> current implementation faster and would
> > >> > > > allow handling incomplete types by DWARFLinker without massive
> performance degradation also.
> > >> > > >
> > >> > > > >> But the context is known when types are generated. So, no
> need to spent the time analyzing it.
> > >> > > >
> > >> > > > >> If types could be compared without analyzing context, then
> Dwarf-aware linker would work faster.
> > >> > > >
> > >> > > > >> That is just an idea(not for immediate implementation): If
> types would be stored in some "type table"
> > >> > > >
> > >> > > > >> (instead of COMDAT section group) and could be accessed
> through hash-id(like type units
> > >> > > >
> > >> > > > >> - then it would be the solution requiring fewer bits to
> store but allowing to compare types
> > >> > > >
> > >> > > > >> by hash-id(not analysing context).
> > >> > > > >> In this case, size increasing would be small. And processing
> time could be done faster.
> > >> > > > >>
> > >> > > > >> this is just an idea and could be discussed separately from
> the problem of integrating of D74169.
> > >> > > >
> > >> > > > >> >> 6. -flto=thin
> > >> > > >
> > >> > > > >> >>    That problem was described in this review
> https://reviews.llvm.org/D54747#1503720. It also exists in
> > >> > > >
> > >> > > > >> >> current DWARFLinker/dsymutil implementation. I think that
> problem should be discussed more: it could
> > >> > > >
> > >> > > > >> >> probably be fixed by avoiding generation of such
> incomplete declaration during thinlto,
> > >> > > >
> > >> > > > >> >> That would be costly to produce extra/redundant debug
> info in ThinLTO - actually ThinLTO could be doing
> > >> > > >
> > >> > > > >> >> more to reduce that redundancy early on (actually
> removing definitions from some llvm Modules if the type
> > >> > > >
> > >> > > > >> >> definition is known to exist in another Module, etc)
> > >> > > > >> >I don't know if it's a problem since that patch was
> reverted.
> > >> > > >
> > >> > > > >>
> > >> > > >
> > >> > > > >> Yes. That patch was reverted, but this patch(D74169) has the
> same problem.
> > >> > > >
> > >> > > > >> if D74169 would be applied and --gc-debuginfo used then
> structure type
> > >> > > > >> definition would be removed.
> > >> > > >
> > >> > > > >> DWARFLinker could handle that case - "removing definitions
> from some llvm Modules if the type
> > >> > > > >> definition is known to exist in another Module".
> > >> > > > >> i.e. DWARFLinker could replace the declaration with the
> definition.
> > >> > > >
> > >> > > > >> But that problem could be more easily resolved when debug
> info is generated(probably without
> > >> > > > >> significant increase of debug info size):
> > >> > > >
> > >> > > > >> Here we have:
> > >> > > >
> > >> > > > >> DW_TAG_compile_unit(0x0000000b) - compile unit containing
> concrete instance for function "f".
> > >> > > > >> DW_TAG_compile_unit(0x00000073) - compile unit containing
> abstract instance root for function "f".
> > >> > > > >> DW_TAG_compile_unit(0x000000c1) - compile unit containing
> function "f" definition.
> > >> > > >
> > >> > > > >> Code for function "f" was deleted. gc-debuginfo deletes
> compile unit DW_TAG_compile_unit(0x000000c1)
> > >> > > > >> containing "f" definition (since there is no corresponding
> code). But it has structure "Foo" definition
> > >> > > > >> DW_TAG_structure_type(0x0000011e) referenced from
> DW_TAG_compile_unit(0x00000073)
> > >> > > > >> by declaration DW_TAG_structure_type(0x000000ae). That
> declaration is exactly the case when definition
> > >> > > > >> was removed by thinlto and replaced with declaration.
> > >> > > >
> > >> > > > >> Would it cost too much if type definition would not be
> replaced with declaration for "abstract instance root"?
> > >> > > > >> The number of concrete instances is bigger than number of
> abstract instance roots.
> > >> > > > >> Probably, it would not be too costly to leave definition in
> abstract instance root?
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > >> Alternatively, Would it cost too much if type definition
> would not be replaced with declaration when
> > >> > > > >> declaration references type from not used function? (lto
> could understand that concrete function is not used).
> > >> > > >
> > >> > > >
> > >> > > > >I don't follow this example - could you provide a small
> concrete test case I could reproduce?
> > >> > > >
> > >> > > > I would provide a test case if necessary. But it looks like
> this issue is finally clear, and you already commented on that.
> > >> > > >
> > >> > > > > Oh, I guess this is happening perhaps because ThinLTO can't
> know for sure that a standalone
> > >> > > > > definition of 'f' won't be needed - so it produces one in
> case one of the inlining opportunities
> > >> > > > > doesn't end up inlining. Then it turns out all calls got
> inlined, so the external definition wasn't needed.
> > >> > > >
> > >> > > > > Oh, you're suggesting that these 3 CUs got emitted into one
> object file during LTO, but that DWARFLinker
> > >> > > > > drops a CU without any code in it - even though... So far as
> I know, in LTO, LLVM directly references
> > >> > > > > types across units if the CUs are all emitted in the same
> object file. (and if they weren't in the same
> > >> > > > > object file - then the abstract_origin couldn't be pointing
> cross-CU).
> > >> > > >
> > >> > > > > I guess some basic things to say:
> > >> > > >
> > >> > > > > With ThinLTO, the concrete/standalone function definition is
> emitted in case some call sites don't end up
> > >> > > > > being inlined. So we know it'll be emitted (but might not be
> needed by the actual linker)
> > >> > > > > ANy number of inline calls might exist - but we shouldn't put
> the type information into those, because
> > >> > > > > they aren't guaranteed to emit it (if the inline function
> gets optimized away, there would be nothing to
> > >> > > > > enforce the type being emitted) - and even if we forced the
> type information to be emitted into one
> > >> > > > > object file that has an inline copy of the function - there's
> no guarantee that object file will get linked in either.
> > >> > > >
> > >> > > > > So, no, I don't think there's much we can do to keep the size
> of object files down, while guaranteeing
> > >> > > > > the type information will be emitted with the usual linker
> semantics.
> > >> > > >
> > >> > > > Then dsymutil/DWARFLinker could be changed to handle
> that(though it would probably be not very efficient).
> > >> > > > If thinlto would understand that function is not used
> finally(and then must not contain referenced type definition),
> > >> > > > then this situation could be handled more effectively.
> > >> > > >
> > >> > > > Thank you, Alexey.
> > >> > > >
> > >> > > >>>
> > >> > > >>>
> > >> > > >>>
> > >> > > >>>
> > >> > > >>> _______________________________________________
> > >> > > >>> LLVM Developers mailing list
> > >> > > >>> llvm-dev at lists.llvm.org
> > >> > > >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >> > > _______________________________________________
> > >> > > LLVM Developers mailing list
> > >> > > llvm-dev at lists.llvm.org
> > >> > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >_______________________________________________
> > >LLVM Developers mailing list
> > >llvm-dev at lists.llvm.org
> > >https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200624/5dd81b60/attachment-0001.html>