[PATCH] D96035: [WIP][dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.

David Blaikie via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 24 16:05:01 PST 2021


dblaikie added a comment.

In D96035#2584686 <https://reviews.llvm.org/D96035#2584686>, @avl wrote:

>> Comparing your proposal - avoiding nondeterminism by sorting the types by name in the new type-holding CU - we could do something similar, but instead sort the CUs a type appears in to pick which of those existing locations should be the canonical home. (or, indeed, could sort by the original linear visitation order)
>> eg: multithreaded walk over each CU to find each type (this applies to both your proposal and what I'm suggesting here, I think) - then, rather than sorting by name and putting them in a new CU, sort by the "index" of the CU the type appeared in (where the index is the order the CUs would've been visited in under the current linear algorithm), then place/preserve the canonical type in the CU with the lowest index where the type appears?
>> Then the second pass emits the CUs, consulting this list of type homes to determine whether each type should be emitted in the CU, or reference a copy emitted elsewhere.
>
> My understanding is that this way assumes all CUs must be loaded into memory and requires an extra pass, i.e.:

Sounds like your proposal would require that too - or require reloading the CUs twice? I think either strategy (keeping them loaded, or reloading them) could be used for either of our proposed directions (creating a separate unit, or using the existing units)?
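To make the "type homes" idea concrete, here's a rough sketch (hypothetical names and layout, not the actual DWARFLinker API) of what that indexed map might look like:

    // Hypothetical sketch: pick the canonical home of each type as the
    // lowest-indexed CU it appears in, where the index is the order the
    // CUs would've been visited in by the current linear algorithm.
    #include <cstdint>
    #include <map>
    #include <string>

    struct TypeHome {
      std::size_t CUIndex;     // CU chosen as the canonical home
      std::uint64_t DIEOffset; // offset of the type DIE within that CU
    };

    // Keyed by fully qualified type name. With a multithreaded walk this
    // would need locking, or per-thread maps merged deterministically.
    std::map<std::string, TypeHome> TypeHomes;

    void recordType(const std::string &Name, std::size_t CUIndex,
                    std::uint64_t DIEOffset) {
      auto It = TypeHomes.find(Name);
      if (It == TypeHomes.end() || CUIndex < It->second.CUIndex)
        TypeHomes[Name] = TypeHome{CUIndex, DIEOffset};
    }

The second pass would then consult TypeHomes, per CU, to decide whether to emit each type or a reference to its canonical home.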

> 1. The first pass enumerates, in a multithreaded manner, all object files and all compile units, and creates an indexed map (the list of type homes). (As a result, all CUs from all object files are loaded into memory at the same time. The indexed map is also in memory.)
>
> 2. The second pass enumerates, in a multithreaded manner, all object files and all compile units, and emits bodies (consulting the list of type homes to determine whether each type should be emitted in the CU, or reference a copy emitted elsewhere).
>
> 3. Patch sizes/offsets/references after individual CU bodies are glued into the resulting file.
>
> The scheme implemented in this patch, which might also be done with an additional compile unit keeping the types, visits each CU only once, after which it can be unloaded from memory:
>
> 1. The first pass enumerates, in a multithreaded manner, all object files and all compile units. Each CU is loaded, analyzed for types (building a list of attributes referencing types), emitted, its types moved into the artificial CU, and then unloaded.

"types moved into the artificial CU" means what exactly? Copying them type DIE tree into some intermediate, in-memory representation?

Ah, yes, though keeping them in memory may be expensive - it might be cheaper to rewalk the units (knowing the offsets of the DIEs) and reload them in some way.
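Something along these lines, perhaps - a minimal sketch (hypothetical, not existing DWARFLinker types) of keeping just a locator per type DIE instead of a cloned tree:

    #include <cstdint>
    #include <vector>

    // Instead of cloning whole type DIE trees into memory, remember just
    // enough to find and reparse each one from the input later.
    struct DIELocator {
      std::size_t ObjectFileIndex; // which input object file
      std::uint64_t CUOffset;      // offset of the CU in .debug_info
      std::uint64_t DIEOffset;     // offset of the type DIE within the CU
    };

    // The artificial type CU could then be built at the end by reloading
    // each DIE through its locator, rather than holding full clones.
    std::vector<DIELocator> PendingTypeDIEs;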

> 2. Emit the artificial CU. (After the first pass is finished, all CUs are unloaded from memory except the artificial one.)
>
> 3. Patch sizes/offsets/references after individual CU bodies are glued into the resulting file. (At this stage, type references from already-emitted CU bodies would be patched to the proper offsets inside the artificial CU.)
>
> This scheme does not need two passes, and it does not need to load all CUs into memory at the same time.
>
>> (as for type merging - that might also be possible with the scheme I'm proposing - if we're rewriting DIEs anyway, it seems plausible we could add new child DIEs to canonical type DIEs)
>
> That would be one more reason to keep CUs in memory. When we rewrite a canonical DIE in some CU, we would need to gather all its children from all other CUs.
>
> The artificial CU (from the scheme with a separate CU keeping the types) would already hold the merged type, so it does not require keeping the source CUs.

Yes, if we're merging types it becomes somewhat more complicated - again, we could do two passes (without necessarily keeping all CUs in memory): quickly skim the CUs and, for each type, record the type name and member list - then merge those lists, keeping the type record for the lowest-indexed CU plus a merged member list that notes which offset/CU each member comes from. This is essentially the same data you'd have to keep in memory (either fully in memory, or some kind of skeleton data that refers to the CUs/input file itself, so it can be reparsed as needed) for the "emit a separate unit full of types at the end" approach.
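A rough sketch of that skim-pass data (again with hypothetical names, assuming members can be keyed by name):

    #include <cstdint>
    #include <map>
    #include <string>

    struct MemberRef {
      std::size_t CUIndex;     // CU the member DIE comes from
      std::uint64_t DIEOffset; // offset of the member DIE in that CU
    };

    struct MergedType {
      std::size_t HomeCUIndex;     // lowest-indexed CU containing the type
      std::uint64_t HomeDIEOffset; // the canonical type DIE there
      std::map<std::string, MemberRef> Members; // merged member list
    };

    std::map<std::string, MergedType> MergedTypes; // keyed by type name

    void mergeMember(MergedType &T, const std::string &MemberName,
                     std::size_t CUIndex, std::uint64_t DIEOffset) {
      // First definition wins; later CUs only contribute members that the
      // lower-indexed CUs didn't have.
      T.Members.try_emplace(MemberName, MemberRef{CUIndex, DIEOffset});
    }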

And in any case, you couldn't emit the units in parallel in the first pass, because you wouldn't know which offsets to write them to, right? (because the size of units will be changing during this process)

So I'm not sure where the parallelism comes into your scheme (& even in my scheme it'd be a bit non-trivial - I guess you'd have to record all the DIE references in each unit that might become cross-unit references (so you know you'll have to change their encoded size) and all the types (so you know whether each unit will get bigger (if members are merged in) or smaller (if a type is removed in favor of referencing a type in another unit))). Not sure there's a huge difference in performance/complexity between the two, perhaps.
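For what it's worth, a minimal sketch (hypothetical) of the per-unit bookkeeping that recording step might need:

    #include <cstdint>
    #include <vector>

    // References whose encoding (and hence the unit's size) can't be
    // finalized until type homes are known - e.g. an intra-unit reference
    // that may become a cross-unit one.
    struct PendingRef {
      std::uint64_t AttrOffset; // where the reference is encoded
      std::uint64_t TargetDIE;  // the DIE it points at, pre-layout
      bool MayBecomeCrossUnit;  // form/size may change if it does
    };

    // One list per unit; unit sizes (and so output offsets) can only be
    // fixed after deciding which types stay, shrink, or grow.
    std::vector<std::vector<PendingRef>> PerUnitPendingRefs;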


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96035/new/

https://reviews.llvm.org/D96035


