[llvm-dev] [Proposal][Debuginfo] dsymutil-like tool for ELF.
Alexey Lapshin via llvm-dev
llvm-dev at lists.llvm.org
Thu Oct 29 00:50:23 PDT 2020
On 28.10.2020 20:38, David Blaikie wrote:
>
>
> On Wed, Oct 28, 2020 at 6:01 AM Alexey Lapshin <avl.lapshin at gmail.com
> <mailto:avl.lapshin at gmail.com>> wrote:
>
>
> On 28.10.2020 01:49, David Blaikie wrote:
>>
>>
>>>
>>>> Without loading all CUs into memory, it would
>>>> require a two-pass solution: a first pass to analyze
>>>> which parts of the DWARF relate to live code, and then a
>>>> second pass to generate the result.
>>>>
>>>> Not sure it'd require any more second pass than a
>>>> "fixup" pass, which it sounds like you're saying it
>>>> already has?
>>>
>>> It looks like it would need an additional pass to
>>> process inter-CU references (existing in the incoming file) if
>>> we do not want to load all CUs into memory.
>>>
>>> Usually inter-CU references aren't used, except in LTO - and
>>> in LTO all the DWARF deduplication and function discarding
>>> is already done by the IR linker anyway. (ThinLTO is a bit
>>> different, but really we'd be better off teaching it the
>>> extra tricks anyway (some can't be fixed in ThinLTO - like
>>> emitting a "Home" definition of an inline function, only to
>>> find out other ThinLTO backend/shards managed to optimize
>>> away all uses of the function... so some cleanup may be
>>> useful there)). It might be possible to do a more
>>> dynamic/rolling cache - keep only the CUs with unresolved
>>> cross-CU references alive and only keep them alive until
>>> their cross-CU references are found/marked alive. This
>>> should make things no worse than the traditional dsymutil
>>> case - since cross-CU references are only
>>> effective/generally used within a single object file (it's
>>> possible to create relocations for them into other files -
>>> but I know LLVM doesn't currently do this and I don't think
>>> GCC does it) with multiple CUs anyway - so at most you'd
>>> keep all the CUs from a single original input file alive
>>> together.
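
As an illustration of the rolling cache being discussed - a sketch under the assumption that a cheap preliminary scan records, for each CU, the last CU that references it (all names and types here are invented for illustration, not actual DWARFLinker API):

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>

// Invented model: each CU lists the CUs it references, by index.
struct CU {
  std::vector<int> Refs;
};

// Peak number of CUs resident at once under a rolling cache that
// unloads a CU as soon as no later CU still references it. Computing
// LastUse needs a pre-scan, which is exactly the catch raised in the
// thread: backward references make the window unknowable without one.
int peakResidentCUs(const std::vector<CU> &CUs) {
  int N = (int)CUs.size();
  std::vector<int> LastUse(N);
  for (int I = 0; I < N; ++I) {
    LastUse[I] = I; // a CU is live at least while being processed
    for (int R : CUs[I].Refs)
      LastUse[R] = std::max(LastUse[R], I); // referenced CU stays longer
  }
  std::set<int> Live;
  int Peak = 0;
  for (int I = 0; I < N; ++I) {
    Live.insert(I); // load CU I
    Peak = std::max(Peak, (int)Live.size());
    for (auto It = Live.begin(); It != Live.end();) // unload dead CUs
      It = (LastUse[*It] <= I) ? Live.erase(It) : std::next(It);
  }
  return Peak;
}
```

With no inter-CU references at most one CU is ever resident; if the last CU referenced every earlier one, all of them would pile up, matching the all-or-nothing worst case.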
>> But since it is a case documented by the DWARF standard, the tool
>> should be ready for it (i.e. when inter-CU
>> references are heavily used).
>>
>>
>> Sure - but by implementing a CU liveness window like that
>> (keeping CUs live only so long as they need to be rather than an
>> all-or-nothing approach) only especially quirky inputs would hit
>> the worst case while the more normal inputs could perform better.
>
> It is not clear what should be put into such a CU liveness window. If
> CU100 references CU1 - how could we know that we need to put CU1
> into the CU liveness window before we have processed CU100?
>
> Fair point, not just forward references to worry about but backward
> references too. I wonder how much savings there is in the liveness
> analysis compared to "keep one copy of everything, no matter whether
> it's live or not", then it can be a pure forward progress situation.
> (with the quirk that you might emit a declaration for an entity once,
> then a definition for it later - alternatively if a declaration is
> seen it could be skipped under the assumption that a definition will
> follow (& use a forward ref fixup) - and if none is found, splat some
> stub declarations into a trailing CU at the end)
That should probably be measured, but I think we would lose most of the
size reduction
(since we would start keeping unreferenced data which is currently removed).
That would lead to slower performance and bigger disk space usage.
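
For reference, the "forward ref fixup" idea mentioned above might look roughly like this (a sketch with invented names, not actual DWARFLinker code): references to not-yet-seen definitions are recorded and patched once the definition's offset is known, and anything never defined gets a stub appended at the end.

```cpp
#include <map>
#include <string>
#include <vector>

// Invented model: Out holds emitted "DIE" offsets; -1 marks a
// reference whose target has not been emitted yet.
struct Emitter {
  std::vector<int> Out;
  std::map<std::string, int> Defined;             // name -> offset
  std::map<std::string, std::vector<int>> Fixups; // name -> slots to patch

  void define(const std::string &Name) {
    Defined[Name] = (int)Out.size();
    Out.push_back((int)Out.size()); // stand-in for the real DIE bytes
    for (int Slot : Fixups[Name])   // resolve pending forward refs
      Out[Slot] = Defined[Name];
    Fixups.erase(Name);
  }

  void refer(const std::string &Name) {
    auto It = Defined.find(Name);
    if (It != Defined.end()) { Out.push_back(It->second); return; }
    Fixups[Name].push_back((int)Out.size()); // patch this slot later
    Out.push_back(-1);
  }

  // Splat stub declarations for anything never defined, as in the
  // "trailing CU" idea above.
  void finish() {
    auto Pending = Fixups;
    for (auto &P : Pending)
      define(P.first);
  }
};
```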
>
>> Moreover, llvm-dwarfutil would be the tool producing
>> exactly such a situation. The resulting file (produced by
>> llvm-dwarfutil) would contain a lot of
>> inter-CU references. Probably, there is no practical reason
>> to apply llvm-dwarfutil to the same
>> file twice, but it would be a good test for the tool.
>>
>>
>> It'd be a good stress test, but not necessarily something that
>> would need to perform the best because it wouldn't be a common
>> use case.
>
> I agree that we should not slow down the DWARFLinker in common
> cases only because we need to support the worst cases.
> But we also need to implement a solution which works in an
> acceptable manner for the worst case.
>
> I think that depends on "acceptable" - correct, yes. Practical to run
> in reasonable time/memory? Not necessarily, in my opinion.
>
> The current solution - loading everything into memory - makes it
> hard to use in a non-dsymutil scenario (llvm-dwarfutil).
>
> I agree it's worth exploring the non-dsymutil scenario, as you are -
> I'm just saying we don't necessarily need to support high usability
> (fast/low memory usage/etc) llvm-dwarfutil on an already dwarfutil'd
> binary (but as you've pointed out, the "window" is unknowable because
> of backward references, so this whole subthread is perhaps irrelevant).
>
> There could be several things which could be used to decide
> whether we need to go on a light or heavy path:
>
> 1. If the input contains only a single CU, we do not need to unload
> it from memory. Thus, we would not need to do an extra DWARF
> loading pass.
> 2. If the abbreviations from the whole input file do not contain
> inter-CU references, then while doing liveness analysis we do not
> need to wait until other CUs are processed.
>
> (2) Yeah, that /may/ be a good idea, cheap to test, etc. Though I'd
> still wonder if a more general implementation strategy could be found
> that would make it easier to get a sliding scale of efficiency
> depending on how many inter-CU references there were, not a "if there
> are none it's good, if there are any it's bad or otherwise very
> different to implement".
At this point, I do not see how that could be done.
One possibility is to preliminarily mark each CU with an IsReferenced flag.
Then we could delay cloning for such CUs (either by putting them into
the CU liveness window or by unloading them).
CUs that are not referenced could be cloned immediately. Such a solution
would be more scalable and work well in cases where only a few inter-CU
references exist, though it requires changes to the DWARF format.
>
> Then this scheme would be used for the worst cases:
>
> 1. for (CU : CU1...CU100) {
>        load CU.
>        analyse CU.
>        unload CU.
>    }
> 2. for (CU : CU1...CU100) {
>        load CU.
>        clone CU.
>        unload CU.
>    }
> 3. fixup forward references.
>
> and this scheme for the light cases:
>
> 1. for (CU : CU1...CU100) {
>        load CU.
>        analyse CU.
>        clone CU.
>        unload CU.
>    }
> 2. fixup forward references.
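
The dispatch between the two schemes above could be sketched like this (invented types, just to make the control flow concrete):

```cpp
#include <string>
#include <vector>

// Invented model: all we need to know per CU is whether it uses
// inter-CU references.
struct CU {
  bool HasInterCURefs = false;
};

// If any CU in the file uses inter-CU references, run the heavy
// two-pass scheme (analyse all, then clone all); otherwise analyse and
// clone each CU in a single streaming pass. Both paths end with a
// forward-reference fixup step.
std::vector<std::string> plan(const std::vector<CU> &CUs) {
  bool Heavy = false;
  for (const CU &C : CUs)
    Heavy |= C.HasInterCURefs;
  std::vector<std::string> Steps;
  if (Heavy) {
    for (size_t I = 0; I < CUs.size(); ++I)
      Steps.push_back("analyse " + std::to_string(I));
    for (size_t I = 0; I < CUs.size(); ++I)
      Steps.push_back("clone " + std::to_string(I));
  } else {
    for (size_t I = 0; I < CUs.size(); ++I) {
      Steps.push_back("analyse " + std::to_string(I));
      Steps.push_back("clone " + std::to_string(I));
    }
  }
  Steps.push_back("fixup forward references");
  return Steps;
}
```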
>
>> Generally, I think we should not assume that inter-CU
>> references would be used in a limited way.
>>
>> Anyway, if this scheme:
>>
>> 1. analyse all CUs.
>> 2. clone all CUs.
>>
>> works too slowly, then we would need to continue with the one-pass
>> solution and not support complex, closely coupled inputs.
>>
>>
>> yeah, certainly seeing the data/experiments will be interesting,
>> if you end up implementing some different strategies, etc.
>>
>> I guess one possibility for parallel generation could be
>> something more like Microsoft's approach with a central debug
>> info server that compilers communicate with - not that exact
>> model, I mean, but if you've got parallel threads generating
>> reduced DWARF into separate object files - they could communicate
>> with a single thread responsible for type emission - the type
>> emitter would be given types from the separate threads and
>> compute their size, queue them up to be streamed out to the type
>> CU (& keep the source CU alive until that work was done) - such a
>> central type emitter could quickly determine the size of the type
>> to be emitted and compute future type offsets (eg: if 5 types
>> were in the queue, it could've figured out the offset of those
>> types already) to answer type offset queries quickly and unblock
>> the parallel threads to continue emitting their CUs containing
>> type references.
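
The key property of that central type emitter - being able to hand back a final offset as soon as a type's size is known, before its bytes are actually streamed - might be sketched like this (single-threaded for clarity; all names invented, and a real version would serve these queries from a dedicated thread or under a lock):

```cpp
#include <map>
#include <string>

// Invented sketch: offsets in the type CU are assigned eagerly as
// prefix sums of type sizes, so producer threads get a stable answer
// immediately even though emission happens later.
class TypeEmitter {
  std::map<std::string, unsigned> Offsets; // type name -> offset in type CU
  unsigned NextOffset = 0;

public:
  // Returns the final offset of Name in the type CU, reserving space
  // on first sight and deduplicating on later requests.
  unsigned getOrAssignOffset(const std::string &Name, unsigned Size) {
    auto [It, Inserted] = Offsets.try_emplace(Name, NextOffset);
    if (Inserted)
      NextOffset += Size; // reserve space; bytes are emitted later
    return It->second;
  }
};
```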
>
> Yes, thank you. I will think about it.
>
> Alexey.
>
>>
>> - Dave
>