[PATCH] D96035: [WIP][dsymutil][DWARFlinker] implement separate multi-thread processing for compile units.

Tue Feb 23 14:00:44 PST 2021

avl added a comment.

> I'm not sure why that would necessarily be better/faster - it'd still require two passes, right? One to collect the types, then another to emit the unit with the types and the units referencing that?

> If it requires two passes, what about one pass that decides which of the type definitions to keep in the unit that defined them, and which to remove/make a reference to the kept ones? That could then potentially produce the same kind of (& possibly exactly the same) output as is the case today, rather than introducing a new CU?

I think I did not understand the idea. The current processing looks similar to the above description.
For a single compile unit, we run the declaration context analysis step, which decides which of the type definitions to keep in the unit that defined them and which to remove or replace with a reference to the kept ones. Later we emit the body of the compile unit based on the results of that analysis step.

When we decide which of the type definitions to keep in the unit, we use the ODR uniquing algorithm, which marks the first type definition encountered as canonical and uses it later for type references from other compile units.

But we do not have a fixed order of compile units: they are processed in parallel. If both CU1 and CU2 contain the same type definition, then depending on the actual processing order, the canonical type definition might be taken from either CU1 or CU2. How could an additional pass avoid that non-determinism?
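
To make the concern concrete, here is a minimal sketch of a "first definition wins" rule. It is not the DWARFLinker code; `TypeDef` and `pickCanonical` are hypothetical names used only for illustration, and the arrival order stands in for whatever order the threads happen to reach the shared context.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a type definition found in some compile unit.
struct TypeDef {
  std::string Name; // ODR-unique type name
  int CUIndex;      // index of the unit containing this definition
};

// First-seen-wins: the first definition that arrives for a given name
// becomes canonical; later definitions with the same name are ignored.
std::map<std::string, int>
pickCanonical(const std::vector<TypeDef> &ArrivalOrder) {
  std::map<std::string, int> Canonical;
  for (const TypeDef &T : ArrivalOrder)
    Canonical.emplace(T.Name, T.CUIndex); // no-op if Name is already present
  return Canonical;
}
```

Feeding the same two definitions of `Foo` in the order CU1, CU2 makes CU1 canonical, while the order CU2, CU1 makes CU2 canonical, so the output depends on thread scheduling.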

As for the solution with an artificial CU, it has several advantages over the current ODR algorithm:

1. Since DWARFLinker generates the artificial CU itself, it can generate it in a deterministic way. Regardless of the actual order in which types are processed, we can always emit them in the same form (sorted by name, or by some other stable key).

2. The current ODR algorithm makes CU processing depend on a shared resource, the ODR declaration context. That slows down parallel execution (threads wait for each other). If we generated a separate artificial CU for types, the current CU could be processed without waiting for the ODR declaration context, which might increase effective CPU utilization.

3. Since we generate the types ourselves, it becomes possible to implement type merging, which might save a significant amount of space.
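
A rough sketch of how points 1 and 2 could fit together, assuming each CU is processed in isolation and only reports the names of the types it saw. `UnitTypes` and `buildTypeUnit` are hypothetical names, and real type DIEs are reduced to strings here; this is not the patch's implementation.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical per-unit result: the type names collected while processing
// one CU on its own, without touching a shared ODR declaration context.
using UnitTypes = std::vector<std::string>;

// Merge the per-unit results into the contents of the artificial type unit.
// std::set deduplicates and orders by name, so the result does not depend
// on the order in which the units finished.
std::vector<std::string>
buildTypeUnit(const std::vector<UnitTypes> &Finished) {
  std::set<std::string> Merged;
  for (const UnitTypes &Types : Finished)
    Merged.insert(Types.begin(), Types.end());
  return std::vector<std::string>(Merged.begin(), Merged.end());
}
```

With this shape, the only synchronization point is the final merge, and passing the finished units in any order yields the same sorted list of types.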

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96035/new/

https://reviews.llvm.org/D96035