[PATCH] ValueMapper: Rewrite Mapper::mapMetadata without recursion

Tue Apr 5 11:39:24 PDT 2016

> On Apr 3, 2016, at 10:51 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> 
> This commit completely rewrites Mapper::mapMetadata (the implementation
> of llvm::MapMetadata) using an iterative algorithm.  The guts of the new
> algorithm are in MDNodeMapper::map, the entry function in a new class.
> 
> Previously, Mapper::mapMetadata performed a recursive exploration of the
> graph with eager "just in case there's a reason" malloc traffic.
> 
> The new algorithm has these benefits:
> 
>  - New nodes and temporaries are not created eagerly.
>  - Uniquing cycles are not duplicated (see new unit test).
>  - No recursion.
> 
> Given a node to map, it does this:
> 
> 1. Use a worklist to perform a post-order traversal of the transitively
>    referenced unmapped nodes.
> 
> 2. Track which nodes will change operands, and which will have new
>    addresses in the mapped scheme.  Propagate the changes through the
>    POT until fixed point, to pick up uniquing cycles that need to
>    change.
> 
> 3. Map all the distinct nodes without touching their operands.  If
>    RF_MoveDistinctMetadata, they get mapped to themselves; otherwise,
>    they get mapped to clones.
> 
> 4. Map the uniqued nodes (bottom-up), lazily creating temporaries for
>    forward references as needed.
> 
> 5. Remap the operands of the distinct nodes.
> 
> MapMetadata barely shows up in the profiles of llvm-link I've looked at
> with the new algorithm.
> 
> It would be nice to break the strange remaining recursion on the Value
> side: MapValue => materializeInitFor => RemapInstruction => MapValue.  I
> think we could do this by somehow leaking (at least an opaque version
> of) the "Mapper" context into the API, perhaps by sending a callback
> function to materializeInitFor.
> 
> 
> <0001-ValueMapper-Rewrite-Mapper-mapMetadata-without-re-v1.patch>

The CPU profiling does not show a great improvement (~10-15%), because to many map query I think.
The memory profile is better with close to 50% less persistent memory, 33% less transient memory, but also a lot less transient memory allocations. 
On my example there were 2.5M allocations before and now there are only 266k allocations! A lot let stressful for the memory subsystem :)

On the code itself now, MDNodeMapper::map() is fairly clear but for the interaction between createPOT() and propagateChangedOperands(). 
The former says it is doing "a traversal of the unmapped subgraph, checking for changed operands”, which make it surprising that you need propagateChangedOperands() afterward. 

Below just two nits:

+  assert(!(M.Flags & RF_NoModuleLevelChanges) &&
+         "MDNodeMapper::map assumes module-level changes");

I think the string is wrong, it should be: "MDNodeMapper::map assumes NO module-level changes”.

+   /// \post Data::HasChangedAddress == RF_MoveDistinctMDs if N.isDistinct().

But the code is:

+   D.HasChangedAddress = !(M.Flags & RF_MoveDistinctMDs) && N.isDistinct();

Which does not seem to match the comment, I think the comment is wrong and should be:

+   /// \post Data::HasChangedAddress == !RF_MoveDistinctMDs if N.isDistinct().

— 
Mehdi