[PATCH] D96035: [WIP][dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.

Alexey Lapshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Feb 25 02:28:00 PST 2021


avl added a comment.

> Sounds like your proposal would require that too - or require reloading the CUs twice? I think either strategy (keeping them loaded, or reloading them) could be used for either of our proposed directions (creating a separate unit, or using the existing units)?

No. My solution does not require keeping all CUs in memory or reloading them twice (for the purpose of ODR type deduplication).

It loads a CU, analyzes its types, creates a list of type references, removes the types, emits DWARF for that CU, and then unloads the CU.

It does not need to load that CU again or keep the input DWARF of that CU in memory.
It only needs to fix up the remembered type references in the generated DWARF after all CUs are processed and the artificial CU is built.
It keeps all types in memory, but that requires less space than keeping all CUs.
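To illustrate the idea, here is a minimal sketch in plain C++ (not the patch's actual data structures; all names are hypothetical): while a CU is being emitted, each reference to a removed type is recorded together with the output offset of its placeholder, and once the artificial type CU is laid out those placeholders are patched in place.

  #include <cstdint>
  #include <string>
  #include <unordered_map>
  #include <vector>

  // One remembered reference: where in the output stream the attribute lives,
  // and which deduplicated type (keyed by its ODR name) it should finally point at.
  struct PendingTypeRef {
    uint64_t OutputOffset;   // offset of the DW_FORM_ref_addr value to patch
    std::string TypeOdrName; // key of the deduplicated type
  };

  int main() {
    std::vector<uint8_t> OutputDebugInfo(1024, 0); // emitted .debug_info bytes
    std::vector<PendingTypeRef> Pending;

    // Phase 1: while emitting each CU, a placeholder is written and the
    // reference is remembered instead of being resolved immediately.
    Pending.push_back({/*OutputOffset=*/0x40, "_ZTS3Foo"});

    // Phase 2: after all CUs are emitted and the artificial type CU is built,
    // every type name maps to a final offset, so the placeholders are patched.
    std::unordered_map<std::string, uint64_t> FinalTypeOffset = {
        {"_ZTS3Foo", 0x20f0}};

    for (const PendingTypeRef &Ref : Pending) {
      uint64_t Target = FinalTypeOffset.at(Ref.TypeOdrName);
      // Write the 4-byte DWARF32 ref_addr value in little-endian order.
      for (int I = 0; I < 4; ++I)
        OutputDebugInfo[Ref.OutputOffset + I] = (Target >> (8 * I)) & 0xff;
    }
    return 0;
  }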

> "types moved into the artificial CU" means what exactly? Copying them type DIE tree into some intermediate, in-memory representation?

Yes.

> Ah, yes, though keeping them in memory may be expensive - it might be cheaper to rewalk the units/know the offsets to the DIEs and reload them in some way.

I think it would not be cheaper to load a CU from disk again, but we can do an experiment and select the more effective solution; i.e. the solution with an artificial CU leaves us that choice.

If we implemented a solution that loads all CUs into memory, or reloads them from disk, then we would not have that choice.

> Yes, if we're merging types it becomes somewhat more complicated - again, could do a two pass (without necessarily keeping all CUs in memory) - go quickly skim the CUs, for each type, record the type name and member list - then merge those lists, keeping the type record for the lowest indexed CU, and a merged member list that mentions which offset/CU each member comes from. This is essentially the same data you'd have to keep in memory (either fully in memory, or some kind of skeleton data that refers to the CUs/input file itself, so it can be reparsed as needed) for the "emit a separate unit full of types at the end" approach.
>  And in any case, you couldn't emit the units in parallel in the first pass, because you wouldn't know which offsets to write them to, right? (because the size of units will be changing during this process)

Not exactly. This prototype uses exactly that one-pass scheme (it loads/parses/handles/unloads each CU only once). It processes units in parallel and generates the resulting DWARF together with the list of references that must be patched. After all CUs are processed and the final sizes are known, an additional pass writes the correct offsets. That additional pass is _much_ cheaper than loading/parsing/handling/unloading the CUs again.
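As a rough illustration of why that additional pass is cheap (assumed structures, not the DWARFLinker code): once every CU's emitted size is known, the final start offsets are a simple prefix sum, and each remembered reference is patched with one lookup and one write.

  #include <cstdint>
  #include <cstdio>
  #include <vector>

  struct EmittedUnit { uint64_t Size; uint64_t FinalStart = 0; };

  int main() {
    std::vector<EmittedUnit> Units = {{0x120}, {0x3a0}, {0x80}};

    // Compute the final placement of each CU in .debug_info (prefix sum of sizes).
    uint64_t Offset = 0;
    for (EmittedUnit &U : Units) {
      U.FinalStart = Offset;
      Offset += U.Size;
    }

    // Patching a remembered cross-CU reference is then just
    // Units[TargetCU].FinalStart + the offset within that CU.
    uint64_t TargetCU = 2, OffsetInCU = 0x10;
    std::printf("patched value: 0x%llx\n",
                (unsigned long long)(Units[TargetCU].FinalStart + OffsetInCU));
    return 0;
  }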

> So I'm not sure where the parallelism comes into your scheme (& even in my scheme it'd be a bit non-trivial - I guess you'd have to record all the DIE references in each unit that might become cross-unit references (so you know you'll have to change their encoded size) and all the types (so you know whether that'll get bigger (if members are merged in) or smaller (if the type is removed in favor of referencing a type in another unit) - not sure there's a huge difference in performance/complexity between the two, perhaps.

Right. That is the scheme (remember cross-unit references and patch them later) used by this patch (except that it does not do type merging and does not have a separate CU keeping the types). This scheme allows each CU to be processed separately, in parallel. It gives us a more than 2x performance improvement, and it could theoretically be improved further if the inter-thread dependencies were reduced.
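A simplified sketch of that per-CU parallelism (using standard C++ futures rather than whatever threading mechanism the patch actually uses; processOneCU is a placeholder for the real per-CU work): each worker loads, analyzes, emits, and unloads one CU independently, and only the final gathering/patching step is serial.

  #include <cstdint>
  #include <future>
  #include <vector>

  struct CUResult {
    std::vector<uint8_t> Bytes;              // emitted DWARF for this CU
    std::vector<uint64_t> PendingRefOffsets; // references to patch later
  };

  // Placeholder for the real per-CU work: load, analyze types, emit, unload.
  static CUResult processOneCU(unsigned Index) {
    CUResult R;
    R.Bytes.resize(64 + Index * 16, 0); // pretend each CU emits some bytes
    return R;
  }

  int main() {
    const unsigned NumCUs = 8;
    std::vector<std::future<CUResult>> Futures;
    for (unsigned I = 0; I < NumCUs; ++I)
      Futures.push_back(std::async(std::launch::async, processOneCU, I));

    // Serial tail: gather results, compute final offsets, patch references.
    uint64_t TotalSize = 0;
    for (auto &F : Futures) {
      CUResult R = F.get();
      TotalSize += R.Bytes.size();
      // ...patch R.PendingRefOffsets once all final offsets are known...
    }
    return (int)(TotalSize == 0);
  }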


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96035/new/

https://reviews.llvm.org/D96035


