[PATCH] D96035: [WIP][dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.

Tue Feb 23 16:00:36 PST 2021

dblaikie added a comment.

In D96035#2583116 <https://reviews.llvm.org/D96035#2583116>, @avl wrote:

>> I'm not sure why that would necessarily be better/faster - it'd still require two passes, right? One to collect the types, then another to emit the unit with the types and the units referencing that?
>
>> If it requires two passes, what about one pass that decides which of the type definitions to keep in the unit that defined them, and which to remove/make a reference to the kept ones? That could then potentially produce the same kind of (& possibly exactly the same) output as is the case today, rather than introducing a new CU?
>
> I don't think I understood the idea. The current processing looks similar to the description above.
> For a single compile unit, we do the declaration context analysis step, which decides which type definitions to keep in the unit that defined them and which to remove or replace with a reference to the kept ones. Later we emit the body of the compile unit based on the results of that step.
>
> When we decide which type definitions to keep in the unit, we use the ODR uniquing algorithm, which sets the first type definition encountered as canonical and uses it later for type references in other compile units.
>
> But we do not have a fixed order of compile units; they are processed in parallel. If both CU1 and CU2 have the same type definition, then depending on the actual processing order, the canonical type definition might be set for CU1 or CU2. How could we avoid that non-determinism using an additional pass?

Compared with your proposal (avoiding nondeterminism by sorting the types by name in the new type-holding CU), we could do something similar, but instead sort the CUs a type appears in to pick which of those existing locations becomes the canonical home (or, indeed, sort by the original linear visitation order).

e.g.: do a multithreaded walk over each CU and find each type (this applies to both your proposal and what I'm suggesting here, I think). Then, rather than sorting by name and putting the types in a new CU, sort by the "index" of the CU the type appeared in (where the index is the order in which the CUs would have been visited by the current linear algorithm), and place/preserve the canonical type in the CU with the lowest index where the type appears.

Then the second pass goes and emits the CUs, consulting this list of type homes to determine whether each type should be emitted in that CU or should reference a copy emitted elsewhere.
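
To make the two passes concrete, here is a rough sketch in Python rather than in DWARFLinker terms. Everything in it is hypothetical: a CU is modelled as a pair of (index, list of type names), and `collect_types`, `choose_type_homes` and `emit_cu` are illustrative names, not existing dsymutil functions. The point is only that taking the minimum CU index per type gives the same answer regardless of which thread finishes first.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_types(cu):
    """Walk one CU (hypothetical model: (index, [type names])) and
    report every type it defines together with the CU's index."""
    index, type_names = cu
    return [(name, index) for name in type_names]


def choose_type_homes(cus, workers=4):
    """Pass 1: walk the CUs in parallel, then give each type the lowest
    CU index it appears in. Taking a minimum does not depend on the
    order in which the per-CU results arrive."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_cu = list(pool.map(collect_types, cus))
    homes = {}
    for pairs in per_cu:
        for name, index in pairs:
            if name not in homes or index < homes[name]:
                homes[name] = index
    return homes


def emit_cu(cu, homes):
    """Pass 2: keep a definition only in its home CU; everywhere else
    emit a reference to the home CU instead."""
    index, type_names = cu
    return [
        ("define", name) if homes[name] == index else ("ref", name, homes[name])
        for name in type_names
    ]
```

With CUs `(0, ["A", "B"])`, `(1, ["B", "C"])` and `(2, ["A", "C"])`, the homes come out as A and B in CU 0 and C in CU 1, which is what a linear first-seen walk over the CUs in index order would have chosen.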

I think there might be merit to the approach you're suggesting too - but I'm less sure it's necessary to significantly alter the output scheme to achieve the benefits of parallelism.

(As for type merging: that might also be possible with the scheme I'm proposing. If we're rewriting DIEs anyway, it seems plausible we could add new child DIEs to the canonical type DIEs.)
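
A very rough sketch of what that merging might look like, again in Python and under made-up assumptions: each definition of a type is modelled as (CU index, list of child names), children are identified by name only, and `merge_type_children` is a hypothetical helper. Real DIE merging would have to compare far more than names.

```python
def merge_type_children(definitions):
    """Merge the children of all definitions of one type.

    definitions: list of (cu_index, [child names]) pairs (hypothetical
    model). Definitions are visited in CU index order so the merged
    child list is deterministic; a child already seen is not added again.
    """
    merged = []
    seen = set()
    for _, children in sorted(definitions, key=lambda d: d[0]):
        for child in children:
            if child not in seen:
                seen.add(child)
                merged.append(child)
    return merged
```

Visiting definitions in CU index order keeps the canonical definition's children first and appends only the children it was missing.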

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D96035