[llvm] r255909 - [ThinLTO] Metadata linking for imported functions

Mon Jan 11 06:34:14 PST 2016

On Sun, Jan 10, 2016 at 1:23 PM, Duncan P. N. Exon Smith
<dexonsmith at apple.com> wrote:
>
>> On 2016-Jan-08, at 11:21, Teresa Johnson <tejohnson at google.com> wrote:
>>
>> On Tue, Jan 5, 2016 at 1:49 PM, Teresa Johnson <tejohnson at google.com> wrote:
>>> On Tue, Jan 5, 2016 at 12:13 PM, Duncan P. N. Exon Smith
>>> <dexonsmith at apple.com> wrote:
>>>>
>>>>> On 2016-Jan-05, at 11:38, Teresa Johnson <tejohnson at google.com> wrote:
>>>>>
>>>>> On Tue, Jan 5, 2016 at 9:07 AM, Teresa Johnson <tejohnson at google.com> wrote:
>>>>>> On Mon, Jan 4, 2016 at 1:19 PM, Duncan P. N. Exon Smith
>>>>>> <dexonsmith at apple.com> wrote:
>>>>>>>
>>>>>>>> On 2015-Dec-29, at 18:00, Teresa Johnson <tejohnson at google.com> wrote:
>>>>>>>>
>>>>>>>> On Wed, Dec 23, 2015 at 8:34 AM, Duncan P. N. Exon Smith
>>>>>>>> <dexonsmith at apple.com> wrote:
>>>>>>>>>
>>>>>>>>>> On 2015-Dec-17, at 12:14, Teresa Johnson via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Author: tejohnson
>>>>>>>>>> Date: Thu Dec 17 11:14:09 2015
>>>>>>>>>> New Revision: 255909
>>>>>>>>>>
>>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=255909&view=rev
>>>>>>>>>> Log:
>>>>>>>>>> [ThinLTO] Metadata linking for imported functions
>>>>>>>>>>
>>>>>>>>>> Summary:
>>>>>>>>>> Second patch split out from http://reviews.llvm.org/D14752.
>>>>>>>>>>
>>>>>>>>>> Maps metadata as a post-pass from each module when importing complete,
>>>>>>>>>> suturing up final metadata to the temporary metadata left on the
>>>>>>>>>> imported instructions.
>>>>>>>>>>
>>>>>>>>>> This entails saving the mapping from bitcode value id to temporary
>>>>>>>>>> metadata in the importing pass, and from bitcode value id to final
>>>>>>>>>> metadata during the metadata linking postpass.
>>>>>>>>>>
>>>>>>>>>> Depends on D14825.
>>>>>>>>>>
>>>>>>>>>> Reviewers: dexonsmith, joker.eph
>>>>>>>>>>
>>>>>>>>>> Subscribers: davidxl, llvm-commits, joker.eph
>>>>>>>>>>
>>>>>>>>>> Differential Revision: http://reviews.llvm.org/D14838
>>>>>>>>>>
>>>>>>>>>> Added:
>>>>>>>>>> llvm/trunk/test/Linker/Inputs/thinlto_funcimport_debug.ll
>>>>>>>>>> llvm/trunk/test/Linker/thinlto_funcimport_debug.ll
>>>>>>>>>> llvm/trunk/test/Transforms/FunctionImport/Inputs/funcimport_debug.ll
>>>>>>>>>> llvm/trunk/test/Transforms/FunctionImport/funcimport_debug.ll
>>>>>>>>>> Modified:
>>>>>>>>>> llvm/trunk/include/llvm/IR/GVMaterializer.h
>>>>>>>>>> llvm/trunk/include/llvm/IR/Metadata.h
>>>>>>>>>> llvm/trunk/include/llvm/IRReader/IRReader.h
>>>>>>>>>> llvm/trunk/include/llvm/Linker/IRMover.h
>>>>>>>>>> llvm/trunk/include/llvm/Linker/Linker.h
>>>>>>>>>> llvm/trunk/include/llvm/Transforms/Utils/ValueMapper.h
>>>>>>>>>> llvm/trunk/lib/Bitcode/Reader/BitcodeReader.cpp
>>>>>>>>>> llvm/trunk/lib/IR/Metadata.cpp
>>>>>>>>>> llvm/trunk/lib/IRReader/IRReader.cpp
>>>>>>>>>> llvm/trunk/lib/Linker/IRMover.cpp
>>>>>>>>>> llvm/trunk/lib/Linker/LinkModules.cpp
>>>>>>>>>> llvm/trunk/lib/Transforms/IPO/FunctionImport.cpp
>>>>>>>>>> llvm/trunk/lib/Transforms/Utils/ValueMapper.cpp
>>>>>>>>>> llvm/trunk/tools/llvm-link/llvm-link.cpp
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Modified: llvm/trunk/include/llvm/IR/Metadata.h
>>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/IR/Metadata.h?rev=255909&r1=255908&r2=255909&view=diff
>>>>>>>>>> ==============================================================================
>>>>>>>>>> --- llvm/trunk/include/llvm/IR/Metadata.h (original)
>>>>>>>>>> +++ llvm/trunk/include/llvm/IR/Metadata.h Thu Dec 17 11:14:09 2015
>>>>>>>>>> @@ -832,10 +832,11 @@ public:
>>>>>>>>>> /// \brief Resolve cycles.
>>>>>>>>>> ///
>>>>>>>>>> /// Once all forward declarations have been resolved, force cycles to be
>>>>>>>>>> -  /// resolved.
>>>>>>>>>> +  /// resolved. If \p MDMaterialized is true, then any temporary metadata
>>>>>>>>>> +  /// is ignored, otherwise it asserts when encountering temporary metadata.
>>>>>>>>>
>>>>>>>>> Please expand a little on the semantic difference between `true` and
>>>>>>>>> `false` here.  In the `false` case, you're blocking identical `MDNode`s
>>>>>>>>> from getting uniqued together.
>>>>>>>>
>>>>>>>> Looks like I got the comment backwards. We ignore temp metadata when
>>>>>>>> MDMaterialized==false, otherwise we assert when seeing temp metadata
>>>>>>>> here (the default behavior). I don't think there is any other semantic
>>>>>>>> difference - the flag only affects temporary MD, which we shouldn't
>>>>>>>> normally see (which is why the default is the old behavior of
>>>>>>>> asserting on temp metadata).
>>>>>>>
>>>>>>> It's subtle, but I think it's important.  Consider the following.
>>>>>>
>>>>>> Thanks for walking me through this, I see the issue. Comments below.
>>>>>>
>>>>>>> --
>>>>>>> The LLVMContext has:
>>>>>>>   !1 = !{!2}
>>>>>>>   !2 = !{}
>>>>>>>
>>>>>>> a.bc contains:
>>>>>>>   !1 = !{!2}
>>>>>>>   !2 = !{}
>>>>>>> --
>>>>>>> The full-loader would read this in, and merge !1 and !2 together.
>>>>>>> --
>>>>>>> Let's say the lazy-loader chooses *only* to read !1 from a.bc,
>>>>>>> leaving !2 as a temporary.  Now LLVMContext has:
>>>>>>>   !1 = !{!2}
>>>>>>>   !2 = !{}
>>>>>>>   !3 = unresolved !{!4} // !1 from a.bc (with RAUW support)
>>>>>>>   !4 = temporary !{}    // !2 from a.bc
>>>>>>>
>>>>>>> Lazy-loader calls !3->resolveCycles(), giving:
>>>>>>>   !1 = !{!2}
>>>>>>>   !2 = !{}
>>>>>>>   !3 = !{!4}         // !1 from a.bc (no RAUW support)
>>>>>>>   !4 = temporary !{} // !2 from a.bc
>>>>>>
>>>>>> Note that the lazy metadata reading support delays reading of all
>>>>>> module-level metadata, which it later reads in a single pass. So the
>>>>>> above situation would only happen if !1 from a.bc is in a
>>>>>> function-level metadata block, and !2 is in its module-level metadata.
>>>>
>>>> Interesting.  I hadn't thought through this specific use case.  Can you
>>>> give a reduced example?
>>>
>>> I looked at my test case (test/Linker/thinlto_funcimport_debug.ll) and
>>> I see a couple of different cases where we end up with temp metadata
>>> after the function importing. One case involves llvm.dbg.value
>>> parameters (the DILocalVariable and a DIExpression), which are defined
>>> in module-level metadata. So in this case it is the top-level metadata
>>> node that is a temporary. The second case are the scope value on
>>> DILocation metadata attachments. The scope is defined in module-level
>>> metadata, but the DILocation itself is apparently a function-level
>>> metadata attachment. Here is an example that shows both cases:
>>>
>>> After importing a function, before metadata linking:
>>>
>>>         tail call void @llvm.dbg.value(metadata i32 %n, i64 0,
>>> metadata !15, metadata !16), !dbg !17
>>> ...
>>> !15 = <temporary!> !{}
>>> !16 = <temporary!> !{}
>>> !17 = !DILocation(line: 1, column: 15, scope: !14)
>>>
>>> After (module-level) metadata linking:
>>>
>>>  tail call void @llvm.dbg.value(metadata i32 %n, i64 0, metadata !15,
>>> metadata !23), !dbg !24
>>> ...
>>> !11 = distinct !DISubprogram(name: "func1", scope: !9, file: !9, line:
>>> 1, type: !12, isLocal: false, isDefinition: true, scopeLine: 1, flags:
>>> DIFlagPrototyped, isOptimized: true, variables: !14)
>>> ...
>>> !15 = !DILocalVariable(name: "n", arg: 1, scope: !11, file: !9, line:
>>> 1, type: !7)
>>> ...
>>> !23 = !DIExpression()
>>> !24 = !DILocation(line: 1, column: 15, scope: !11)
>>>
>>>>
>>>> Could/should the function-level metadata be delayed too?
>>>
>>> I don't think so, unless there is a big issue with mapping in that
>>> part when the function is imported. It is more efficient to do along
>>> with the function. And unlike module-level where we don't know what
>>> all is needed until all functions are imported, we know the function's
>>> metadata will be needed.
>>>
>>>>
>>>> Could/should we move more function-level metadata to the module-level?
>>>> (Probably not...)
>>>
>>> Not sure I know the answer to this one.
>>>
>>>>
>>>>>>> Eventually, we read in !4.  !3 hits a uniquing collision, but
>>>>>>> doesn't have RAUW support (since `resolveCycles()` dropped it).
>>>>>>> End result:
>>>>>>>   !1 = !{!2}
>>>>>>>   !2 = !{}
>>>>>>>   !3 = distinct !{!2} // !1 from a.bc
>>>>>>> --
>>>>>>> If `resolveCycles()` hadn't been called, we'd end up with:
>>>>>>>   !1 = !{!2}
>>>>>>>   !2 = !{}
>>>>>>> since !3 could resolve the uniquing collision using RAUW.
>>>>>>
>>>>>> I wonder if it would be possible to delay calling resolveCycles until
>>>>>> the module-level metadata is loaded? That might result in other
>>>>>> issues/differences though?
>>>>>
>>>>> I tried doing this - in MDNode::resolveCycles I invoke a new helper
>>>>> that checks if the MDNode or any operand recursively nested under it
>>>>> contains a temporary, and if so, resolveCycles returns without doing
>>>>> anything:
>>>>>
>>>>> -------------------
>>>>> static bool containsTemporaryImpl(MDNode *N,
>>>>>                                 SmallPtrSetImpl<MDNode *> &Visited) {
>>>>> if (!Visited.insert(N).second)
>>>>>   return false;
>>>>> if (N->isTemporary())
>>>>>   return true;
>>>>> for (auto &Op : N->operands()) {
>>>>>   auto *OpN = dyn_cast_or_null<MDNode>(Op);
>>>>>   if (OpN && containsTemporaryImpl(OpN, Visited))
>>>>
>>>> I think you wouldn't want to descend if this node `.isResolved()`.
>>>
>>> Ok
>>>
>>>>
>>>>>     return true;
>>>>> }
>>>>> return false;
>>>>> }
>>>>>
>>>>> static bool containsTemporary(MDNode *N) {
>>>>> SmallPtrSet<MDNode *, 8> Visited;
>>>>> return containsTemporaryImpl(N, Visited);
>>>>> }
>>>>>
>>>>> void MDNode::resolveCycles() {
>>>>> if (isResolved())
>>>>>   return;
>>>>>
>>>>> if (containsTemporary(this))
>>>>>   return;
>>>>> ...
>>>>> ----------------------
>>>>>
>>>>> So the resolveCycles would later be done when the module-level
>>>>> metadata is read and mapped.
>>>>
>>>> I'd prefer to keep the assertion in `resolveCycles()` to catch problems
>>>> where the caller *thinks* there are no temporaries left.  However, for
>>>> this use case you could (in theory) add helper function that returns
>>>> `bool` and guard the call to `resolveCycles()` with it.
>>>
>>> That makes sense. Do in the caller to guard the call or do it in a
>>> separate interface like resolveNonTemporaries.
>>>
>>>>
>>>>> This works in my test case (which does invoke resolveCycles with a
>>>>> temp MD node), but admittedly I don't have a lot of ThinLTO debug test
>>>>> cases. Does this seem like a reasonable approach to test further?
>>>>> Otherwise, I can just change to having the two interfaces as you
>>>>> suggested.
>>>>
>>>> I think the two interfaces might be cleaner, since in other use cases
>>>> the assertion is important.
>>>
>>> Ok, agreed.
>>>
>>>> Also, even in this use case, it's important
>>>> to drop the RAUW support to keep memory usage in check.
>>>
>>> I was hoping with the above helper we could avoid dropping the RAUW
>>> support, so that in cases like your example we would end up with less
>>> metadata at the end. If we still want to drop RAUW here, then no point
>>> in skipping the resolving (just do your original suggestion of using a
>>> separate well-documented interface). Do you think it will be important
>>> to keep the memory down this way between function importing and the
>>> metadata linking (which immediately follows after function importing
>>> is complete) though?
>>
>> Ping on this question.
>
> Best way to evaluate would be to run an -flto -g build of something
> somewhat large (clang (or even llc) should be big enough) and look at
> the memory graph.

This doesn't apply to a regular -flto build, only ThinLTO, although I
could evaluate with -flto=thin -g on something large. Can you clarify
what you mean by the memory graph? Are you referring to the
-track-memory llvm output?

>
>> I tried comparing the current implementation to the above change to
>> skip resolveCycles on anything that reaches a temporary metadata
>> (which as noted above affects the metadata mapping during function
>> importing but is ultimately resolved during the bulk metadata linking
>> that occurs after all functions are imported when the temp metadata is
>> replaced).
>>
>> I dumped out and compared the dest module just before and after the
>> metadata postpass linking. I then compared that and the compile
>> times/memory usage between the current implementation and the above
>> change for a few cpu2006 benchmarks (mcf, bzip2, perlbench and astar).
>> They all built fine with both versions of the code, and there was no
>> noticeable consistent change in compilation time or memory. There was
>> also no difference in the metadata between the two versions. I had
>> expected that delaying the resolving of nodes might have some effects
>> on the metadata (including avoiding the situation you described
>> earlier that we can end up in with the current approach, by delaying
>> dropping RAUW support).
>>
>> So I am not sure what to conclude. Does the current code's semantic
>> difference not really matter at all in practice? I.e. the potential
>> situation you described with preventing RAUW of some metadata is
>> really rare (although possible so agree that we want to change the
>> interface to document the possibility). In that case delaying the
>> resolving of cycles doesn't really matter/help? Theoretically it seems
>> like it would help in that situation, but I haven't (yet) been able to
>> provoke it.
>>
>> Perhaps I should just go with your original suggestion of adding the
>> new interface for the current behavior and documenting it, and not
>> worrying about it unless we find issues. WDYT?
>
> This seems fine to me.

Ok, will do this for now.

Thanks,
Teresa

>
>>
>> Thanks,
>> Teresa
>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Tereas
>>>>>
>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> The new operation completely violates the spirit of
>>>>>>> `resolveCycles()`, which (before your patch) was saying:
>>>>>>>   "I promise there are no more temporaries, so remaining
>>>>>>>   unresolved nodes must be part of a cycle."
>>>>>>> The `assert()` confirms that everything is in a cycle.
>>>>>>>
>>>>>>> With your new flag set, this has changed to:
>>>>>>>   "I don't care about uniquing the remaining nodes, so drop
>>>>>>>    RAUW support for them."
>>>>>>>
>>>>>>> Maybe something like this would make sense:
>>>>>>> --
>>>>>>> void resolveCycles()
>>>>>>> { resolveRecursivelyImpl(/* AllowTemps */ false); }
>>>>>>> void resolveNonTemporaries()
>>>>>>> { resolveRecursivelyImpl(/* AllowTemps */ true); }
>>>>>>>
>>>>>>> private:
>>>>>>> // Old resolveCycles() implementation.
>>>>>>> void resolveRecursivelyImpl(bool AllowTemps);
>>>>>>> --
>>>>>>> (Do you have a better name for `resolveNonTemporaries()`?)
>>>>>>
>>>>>> This sounds like a good idea. Don't have a better name suggestion,
>>>>>> will use that.
>>>>>>
>>>>>>>
>>>>>>> With the API split it might be easier to document the semantic
>>>>>>> differences (just by describing the use case for each).
>>>>>>>
>>>>>>> For `resolveCycles()`, please make clear (if it isn't already?)
>>>>>>> that we're promising there aren't any temporaries, and thus
>>>>>>> unresolved nodes are part of cycles and no longer need RAUW
>>>>>>> support.
>>>>>>>
>>>>>>> For `resolveNonTemporaries()`, please make clear that this is a
>>>>>>> destructive operation that blocks uniquing of metadata nodes.
>>>>>>>
>>>>>>
>>>>>> Will do.
>>>>>>
>>>>>> Thanks,
>>>>>> Teresa
>>>>>>
>>>>>> --
>>>>>> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>>>>
>>>
>>>
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>

-- 
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413