[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.

Alexey Lapshin via llvm-dev llvm-dev at lists.llvm.org
Thu Oct 29 00:50:23 PDT 2020


On 28.10.2020 20:38, David Blaikie wrote:
>
>
> On Wed, Oct 28, 2020 at 6:01 AM Alexey Lapshin <avl.lapshin at gmail.com 
> <mailto:avl.lapshin at gmail.com>> wrote:
>
>
>     On 28.10.2020 01:49, David Blaikie wrote:
>>
>>
>>>
>>>>                 Without loading all CUs into memory, it would
>>>>                 require a two-pass solution: a first pass to analyze
>>>>                 which part of the DWARF relates to live code, and a
>>>>                 second pass to generate the result.
>>>>
>>>>             Not sure it'd require any more second pass than a
>>>>             "fixup" pass, which it sounds like you're saying it
>>>>             already has?
>>>
>>>             It looks like it would need an additional pass to
>>>             process inter-CU references (existing in the incoming
>>>             file) if we do not want to load all CUs into memory.
>>>
>>>         Usually inter-CU references aren't used, except in LTO - and
>>>         in LTO all the DWARF deduplication and function discarding
>>>         is already done by the IR linker anyway. (ThinLTO is a bit
>>>         different, but really we'd be better off teaching it the
>>>         extra tricks anyway (some can't be fixed in ThinLTO - like
>>>         emitting a "Home" definition of an inline function, only to
>>>         find out other ThinLTO backend/shards managed to optimize
>>>         away all uses of the function... so some cleanup may be
>>>         useful there)). It might be possible to do a more
>>>         dynamic/rolling cache - keep only the CUs with unresolved
>>>         cross-CU references alive and only keep them alive until
>>>         their cross-CU references are found/marked alive. This
>>>         should make things no worse than the traditional dsymutil
>>>         case - since cross-CU references are only
>>>         effective/generally used within a single object file (it's
>>>         possible to create relocations for them into other files -
>>>         but I know LLVM doesn't currently do this and I don't think
>>>         GCC does it) with multiple CUs anyway - so at most you'd
>>>         keep all the CUs from a single original input file alive
>>>         together.
>>         But, since it is a documented DWARF case, the tool should be
>>         ready for it (i.e. for when inter-CU
>>         references are heavily used).
>>
>>
>>     Sure - but by implementing a CU liveness window like that
>>     (keeping CUs live only so long as they need to be rather than an
>>     all-or-nothing approach) only especially quirky inputs would hit
>>     the worst case while the more normal inputs could perform better.
>
>     It is not clear what should be put into such a CU liveness window.
>     If CU100 references CU1, how could we know that we need to put CU1
>     into the window before we have processed CU100?
>
> Fair point, not just forward references to worry about but backward 
> references too. I wonder how much savings there is in the liveness 
> analysis compared to "keep one copy of everything, no matter whether 
> it's live or not", then it can be a pure forward progress situation. 
> (with the quirk that you might emit a declaration for an entity once, 
> then a definition for it later - alternatively if a declaration is 
> seen it could be skipped under the assumption that a definition will 
> follow (& use a forward ref fixup) - and if none is found, splat some 
> stub declarations into a trailing CU at the end)
That should probably be measured, but I think we would lose most of the
size reduction (since we would start keeping unreferenced data which is
currently removed), which would lead to slower performance and bigger
disk space usage.


>
>>         Moreover, llvm-dwarfutil would be the tool producing
>>         exactly such a situation. The resulting file (produced by
>>         llvm-dwarfutil) would contain a lot of
>>         inter-CU references. There are probably no practical reasons
>>         to apply llvm-dwarfutil to the same
>>         file twice, but it would be a good test for the tool.
>>
>>
>>     It'd be a good stress test, but not necessarily something that
>>     would need to perform the best because it wouldn't be a common
>>     use case.
>
>     I agree that we should not slow down the DWARFLinker in common
>     cases only because we need to support the worst cases.
>     But we also need to implement a solution which works in some
>     acceptable manner for the worst case.
>
> I think that depends on "acceptable" - correct, yes. Practical to run 
> in reasonable time/memory? Not necessarily, in my opinion.
>
>     The current solution - loading everything into memory - makes it
>     hard to use in a non-dsymutil scenario (llvm-dwarfutil).
>
> I agree it's worth exploring the non-dsymutil scenario, as you are - 
> I'm just saying we don't necessarily need to support high usability 
> (fast/low memory usage/etc) llvm-dwarfutil on an already dwarfutil'd 
> binary (but as you've pointed out, the "window" is unknowable because 
> of backward references, so this whole subthread is perhaps irrelevant).
>
>     There could be several things which could be used to decide
>     whether we need to go down a light or a heavy path:
>
>     1. If the input contains only a single CU, we do not need to
>     unload it from memory. Thus, we would not need an extra DWARF
>     loading pass.
>     2. If abbreviations from the whole input file do not contain
>     inter-CU references, then while doing liveness analysis we do not
>     need to wait until other CUs are processed.
>
> (2) Yeah, that /may/ be a good idea, cheap to test, etc. Though I'd 
> still wonder if a more general implementation strategy could be found 
> that would make it easier to get a sliding scale of efficiency 
> depending on how many inter-CU references there were, not a "if there 
> are none it's good, if there are any it's bad or otherwise very 
> different to implement".
At the current point, I do not see how that could be done.
One possibility is to preliminarily mark each CU with an IsReferenced
flag. Then we could delay cloning for such CUs (either by putting them
into the CU liveness window or by unloading them). Unreferenced CUs
could be cloned immediately. Such a solution would be more scalable and
would work well in cases when only a few inter-CU references exist.
Though it would require changes to the DWARF format.
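A minimal sketch of that deferral idea (hypothetical structures, not the
actual DWARFLinker code; it assumes a per-CU IsReferenced flag could be
recorded up front):

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: CUs known to be targets of inter-CU references
// are kept alive and cloned last; unreferenced CUs are cloned (and can
// be unloaded) immediately, in a single streaming pass.
struct CompileUnit {
  std::string Name;
  bool IsReferenced; // would need to be recorded up front (DWARF change)
};

std::vector<std::string> linkUnits(const std::vector<CompileUnit> &Units) {
  std::vector<std::string> CloneOrder;
  std::vector<const CompileUnit *> Deferred;
  for (const CompileUnit &CU : Units) {
    if (CU.IsReferenced)
      Deferred.push_back(&CU); // keep alive until referencing CUs are done
    else
      CloneOrder.push_back(CU.Name); // clone and unload right away
  }
  for (const CompileUnit *CU : Deferred)
    CloneOrder.push_back(CU->Name); // delayed cloning of referenced CUs
  return CloneOrder;
}
```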


>
>     Then that scheme would be used for worst cases:
>
>     1. for (CU : CU1...CU100) {
>          load CU.
>          analyse CU.
>          unload CU.
>        }
>     2. for (CU : CU1...CU100) {
>          load CU.
>          clone CU.
>          unload CU.
>        }
>     3. fixup forward references.
>
>     and that scheme for light cases:
>
>     1. for (CU : CU1...CU100) {
>          load CU.
>          analyse CU.
>          clone CU.
>          unload CU.
>        }
>     2. fixup forward references.
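Both schemes above end with the same forward-reference fixup step; a toy
illustration of that patching follows (hypothetical structures, not the
actual DWARFLinker code; each "DIE" is a name plus the name of one DIE
it references, and sizes are faked as 8 bytes):

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model of the fixup pass: while cloning, a reference to a DIE that
// has not been emitted yet is written as 0 and remembered; once all
// DIEs are cloned, every recorded patch is resolved to the final offset.
struct Patch {
  size_t Index;       // position in the output needing a real offset
  std::string Target; // name of the referenced DIE
};

std::vector<uint64_t> cloneWithFixups(
    const std::vector<std::pair<std::string, std::string>> &DIEs) {
  // Each pair is (DIE name, name of the DIE it references; "" if none).
  std::vector<uint64_t> Out;
  std::map<std::string, uint64_t> Offsets; // final offset of each DIE
  std::vector<Patch> Pending;
  uint64_t Offset = 0;
  for (const auto &D : DIEs) {
    Offsets[D.first] = Offset;
    auto It = Offsets.find(D.second);
    if (D.second.empty()) {
      Out.push_back(0); // no reference
    } else if (It != Offsets.end()) {
      Out.push_back(It->second); // backward reference: already known
    } else {
      Pending.push_back({Out.size(), D.second}); // forward reference
      Out.push_back(0);
    }
    Offset += 8; // pretend each DIE occupies 8 bytes
  }
  for (const Patch &P : Pending) // the final fixup pass
    Out[P.Index] = Offsets.at(P.Target);
  return Out;
}
```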
>
>>         Generally, I think we should not assume that inter-CU
>>         references would be used in a limited way.
>>
>>         Anyway, if this scheme:
>>
>>         1. analyse all CUs.
>>         2. clone all CUs.
>>
>>         works slowly, then we would need to continue with the
>>         one-pass solution and not support complex, closely coupled
>>         inputs.
>>
>>
>>     yeah, certainly seeing the data/experiments will be interesting,
>>     if you end up implementing some different strategies, etc.
>>
>>     I guess one possibility for parallel generation could be
>>     something more like Microsoft's approach with a central debug
>>     info server that compilers communicate with - not that exact
>>     model, I mean, but if you've got parallel threads generating
>>     reduced DWARF into separate object files - they could communicate
>>     with a single thread responsible for type emission - the type
>>     emitter would be given types from the separate threads and
>>     compute their size, queue them up to be streamed out to the type
>>     CU (& keep the source CU alive until that work was done) - such a
>>     central type emitter could quickly determine the size of the type
>>     to be emitted and compute future type offsets (eg: if 5 types
>>     were in the queue, it could've figured out the offset of those
>>     types already) to answer type offset queries quickly and unblock
>>     the parallel threads to continue emitting their CUs containing
>>     type references.
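The central-emitter idea above hinges on being able to answer offset
queries before types are actually streamed out; a single-threaded sketch
of that bookkeeping (hypothetical class, assuming each type's encoded
size is computable at enqueue time):

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Sketch: the emitter assigns an offset in the type CU as soon as a
// type is queued, because its encoded size is known immediately. Worker
// threads can use the returned offset for their type references without
// waiting for the type to actually be written out.
class TypeEmitter {
  uint64_t NextOffset = 0;
  std::vector<std::pair<std::string, uint64_t>> Queue; // (type, offset)

public:
  // Returns the final offset of the type in the type CU.
  uint64_t enqueue(const std::string &TypeName, uint64_t EncodedSize) {
    uint64_t Off = NextOffset; // offset is known before emission happens
    Queue.push_back({TypeName, Off});
    NextOffset += EncodedSize;
    return Off;
  }
  size_t pendingTypes() const { return Queue.size(); }
};
```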
>
>     Yes, thank you. I will think about it.
>
>     Alexey.
>
>>
>>     - Dave
>