[PATCH] D96035: [WIP][dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Mar 5 12:27:09 PST 2021


dblaikie added a comment.

In D96035#2587154 <https://reviews.llvm.org/D96035#2587154>, @avl wrote:

>> Sounds like your proposal would require that too - or require reloading the CUs twice? I think either strategy (keeping them loaded, or reloading them) could be used for either of our proposed directions (creating a separate unit, or using the existing units)?
>
> No. My solution does not require keeping all CUs in memory or reloading them twice (for the purpose of ODR type deduplication).
>
> It loads a CU, analyzes its types, creates a list of type references, removes the types, emits DWARF for that CU, and unloads the CU.
>
> It does not need to load that CU again or keep the input DWARF of that CU in memory.
> It only needs to fix up the remembered type references in the generated DWARF after all CUs are processed and the artificial CU is built.
> It keeps all types in memory, but that requires less space than keeping all CUs.

Yeah, having to keep the types in memory is the bit I'm getting at - but yes, it's not all CUs. If they're being unloaded/reloaded anyway, though, it's less clear that preserving the existing output behavior would be so costly. That said, I don't personally have a problem with creating a synthetic/arbitrary CU to put all the types in either, but other folks (the Apple folks with a vested interest in dsymutil's behavior) might disagree.
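
To make sure I'm following the per-CU flow you're describing, here's roughly how I picture it - just a sketch with made-up names, not the actual DWARFLinker API:

  // Rough sketch only - hypothetical names, not the actual DWARFLinker API.
  #include <cstdint>
  #include <map>
  #include <string>
  #include <vector>

  struct TypeDIE { std::string Name; std::vector<uint8_t> Bytes; };

  struct UnitOutput {
    std::vector<uint8_t> DebugInfo;                // emitted DWARF for this CU
    std::map<uint64_t, std::string> TypeRefFixups; // output offset -> referenced type name
  };

  struct TypeTable {
    std::map<std::string, TypeDIE> Types;          // deduplicated type DIEs (ODR)
    std::map<std::string, uint64_t> FinalOffsets;  // known once the artificial CU is emitted
  };

  // Per unit, one pass: load the input CU, move its type DIEs into `Types`,
  // emit the rest, remember where type references were written, then unload.
  UnitOutput linkOneUnit(/* InputCU &CU, */ TypeTable &Types);

  // After all units: emit the artificial CU from `Types` (filling FinalOffsets),
  // then patch every recorded fixup in every UnitOutput.
  void emitTypeCUAndPatchRefs(TypeTable &Types, std::vector<UnitOutput> &Units);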

>> "types moved into the artificial CU" means what exactly? Copying them type DIE tree into some intermediate, in-memory representation?
>
> yes.
>
>> Ah, yes, though keeping them in memory may be expensive - it might be cheaper to rewalk the units, record the offsets of the DIEs, and reload them in some way.
>
> I think it would not be cheaper to load a CU from disk again, but we can run an experiment and select the more effective solution; i.e. the artificial-CU solution allows us to have that choice.
>
> If we implemented a solution which loads all CUs into memory or reloads them from disk, then we would not have that choice.
>
>> Yes, if we're merging types it becomes somewhat more complicated - again, we could do two passes (without necessarily keeping all CUs in memory): quickly skim the CUs and, for each type, record the type name and member list - then merge those lists, keeping the type record for the lowest-indexed CU and a merged member list that notes which offset/CU each member comes from. This is essentially the same data you'd have to keep in memory (either fully in memory, or some kind of skeleton data that refers to the CUs/input file itself, so it can be reparsed as needed) for the "emit a separate unit full of types at the end" approach.
>>  And in any case, you couldn't emit the units in parallel in the first pass, because you wouldn't know which offsets to write them to, right? (The sizes of the units will be changing during this process.)
>
> Not exactly right. This prototype uses exactly the one-pass scheme (which loads/parses/handles/unloads each CU only once). It processes units in parallel, generates resulting DWARF
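
To be concrete about the "skeleton data" I mentioned above, I'm imagining per-type records along these lines - hypothetical types, purely for illustration:

  #include <cstdint>
  #include <map>
  #include <string>
  #include <vector>

  struct MemberRef {
    std::string Name;
    unsigned CUIndex;    // input CU this member declaration came from
    uint64_t DIEOffset;  // offset of that member DIE in the input file
  };

  struct TypeSkeleton {
    unsigned DefiningCU;            // lowest-indexed CU that defines the type
    uint64_t DefinitionOffset;      // offset of the definition DIE in that CU
    std::vector<MemberRef> Members; // merged member list across all CUs
  };

  // The skim pass fills this map; merging keeps the record from the lowest
  // CU index and appends members only seen in later CUs. Emission can then
  // reparse just the recorded offsets rather than keeping whole CUs loaded.
  using TypeIndex = std::map<std::string, TypeSkeleton>;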

By "generating resulting DWARF" I guess that has to be buffered in memory? (how could two CUs be written to the final output file at the same time? You wouldn't know where to write to because you wouldn't know how big the previous unit would be if you haven't finished processing it)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96035/new/

https://reviews.llvm.org/D96035


