[llvm] r255909 - [ThinLTO] Metadata linking for imported functions

Mon Jan 4 13:19:15 PST 2016

> On 2015-Dec-29, at 18:00, Teresa Johnson <tejohnson at google.com> wrote:
> 
> On Wed, Dec 23, 2015 at 8:34 AM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
>> 
>>> On 2015-Dec-17, at 12:14, Teresa Johnson via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>> 
>>> Author: tejohnson
>>> Date: Thu Dec 17 11:14:09 2015
>>> New Revision: 255909
>>> 
>>> URL: http://llvm.org/viewvc/llvm-project?rev=255909&view=rev
>>> Log:
>>> [ThinLTO] Metadata linking for imported functions
>>> 
>>> Summary:
>>> Second patch split out from http://reviews.llvm.org/D14752.
>>> 
>>> Maps metadata as a post-pass from each module when importing complete,
>>> suturing up final metadata to the temporary metadata left on the
>>> imported instructions.
>>> 
>>> This entails saving the mapping from bitcode value id to temporary
>>> metadata in the importing pass, and from bitcode value id to final
>>> metadata during the metadata linking postpass.
>>> 
>>> Depends on D14825.
>>> 
>>> Reviewers: dexonsmith, joker.eph
>>> 
>>> Subscribers: davidxl, llvm-commits, joker.eph
>>> 
>>> Differential Revision: http://reviews.llvm.org/D14838
>>> 
>>> Added:
>>>   llvm/trunk/test/Linker/Inputs/thinlto_funcimport_debug.ll
>>>   llvm/trunk/test/Linker/thinlto_funcimport_debug.ll
>>>   llvm/trunk/test/Transforms/FunctionImport/Inputs/funcimport_debug.ll
>>>   llvm/trunk/test/Transforms/FunctionImport/funcimport_debug.ll
>>> Modified:
>>>   llvm/trunk/include/llvm/IR/GVMaterializer.h
>>>   llvm/trunk/include/llvm/IR/Metadata.h
>>>   llvm/trunk/include/llvm/IRReader/IRReader.h
>>>   llvm/trunk/include/llvm/Linker/IRMover.h
>>>   llvm/trunk/include/llvm/Linker/Linker.h
>>>   llvm/trunk/include/llvm/Transforms/Utils/ValueMapper.h
>>>   llvm/trunk/lib/Bitcode/Reader/BitcodeReader.cpp
>>>   llvm/trunk/lib/IR/Metadata.cpp
>>>   llvm/trunk/lib/IRReader/IRReader.cpp
>>>   llvm/trunk/lib/Linker/IRMover.cpp
>>>   llvm/trunk/lib/Linker/LinkModules.cpp
>>>   llvm/trunk/lib/Transforms/IPO/FunctionImport.cpp
>>>   llvm/trunk/lib/Transforms/Utils/ValueMapper.cpp
>>>   llvm/trunk/tools/llvm-link/llvm-link.cpp
>>> 
>>> 
>>> Modified: llvm/trunk/include/llvm/IR/Metadata.h
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/IR/Metadata.h?rev=255909&r1=255908&r2=255909&view=diff
>>> ==============================================================================
>>> --- llvm/trunk/include/llvm/IR/Metadata.h (original)
>>> +++ llvm/trunk/include/llvm/IR/Metadata.h Thu Dec 17 11:14:09 2015
>>> @@ -832,10 +832,11 @@ public:
>>>  /// \brief Resolve cycles.
>>>  ///
>>>  /// Once all forward declarations have been resolved, force cycles to be
>>> -  /// resolved.
>>> +  /// resolved. If \p MDMaterialized is true, then any temporary metadata
>>> +  /// is ignored, otherwise it asserts when encountering temporary metadata.
>> 
>> Please expand a little on the semantic difference between `true` and
>> `false` here.  In the `false` case, you're blocking identical `MDNode`s
>> from getting uniqued together.
> 
> Looks like I got the comment backwards. We ignore temp metadata when
> MDMaterialized==false, otherwise we assert when seeing temp metadata
> here (the default behavior). I don't think there is any other semantic
> difference - the flag only affects temporary MD, which we shouldn't
> normally see (which is why the default is the old behavior of
> asserting on temp metadata).

It's subtle, but I think it's important.  Consider the following.
--
The LLVMContext has:
    !1 = !{!2}
    !2 = !{}

a.bc contains:
    !1 = !{!2}
    !2 = !{}
--
The full-loader would read this in, and merge !1 and !2 together.
--
Let's say the lazy-loader chooses *only* to read !1 from a.bc,
leaving !2 as a temporary.  Now LLVMContext has:
    !1 = !{!2}
    !2 = !{}
    !3 = unresolved !{!4} // !1 from a.bc (with RAUW support)
    !4 = temporary !{}    // !2 from a.bc

Lazy-loader calls !3->resolveCycles(), giving:
    !1 = !{!2}
    !2 = !{}
    !3 = !{!4}         // !1 from a.bc (no RAUW support)
    !4 = temporary !{} // !2 from a.bc

Eventually, we read in !4.  !3 hits a uniquing collision, but
doesn't have RAUW support (since `resolveCycles()` dropped it).
End result:
    !1 = !{!2}
    !2 = !{}
    !3 = distinct !{!2} // !1 from a.bc
--
If `resolveCycles()` hadn't been called, we'd end up with:
    !1 = !{!2}
    !2 = !{}
since !3 could resolve the uniquing collision using RAUW.
--

The new operation completely violates the spirit of
`resolveCycles()`, which (before your patch) was saying:
    "I promise there are no more temporaries, so remaining
    unresolved nodes must be part of a cycle."
The `assert()` confirms that everything is in a cycle.

With your new flag set, this has changed to:
    "I don't care about uniquing the remaining nodes, so drop
     RAUW support for them."

Maybe something like this would make sense:
--
void resolveCycles()
{ resolveRecursivelyImpl(/* AllowTemps */ false); }
void resolveNonTemporaries()
{ resolveRecursivelyImpl(/* AllowTemps */ true); }

private:
// Old resolveCycles() implementation.
void resolveRecursivelyImpl(bool AllowTemps);
--
(Do you have a better name for `resolveNonTemporaries()`?)

With the API split it might be easier to document the semantic
differences (just by describing the use case for each).

For `resolveCycles()`, please make clear (if it isn't already?)
that we're promising there aren't any temporaries, and thus
unresolved nodes are part of cycles and no longer need RAUW
support.

For `resolveNonTemporaries()`, please make clear that this is a
destructive operation that blocks uniquing of metadata nodes.