[llvm] r255909 - [ThinLTO] Metadata linking for imported functions

Mon Jan 11 09:14:53 PST 2016

> On 2016-Jan-11, at 06:34, Teresa Johnson <tejohnson at google.com> wrote:
> 
> On Sun, Jan 10, 2016 at 1:23 PM, Duncan P. N. Exon Smith
> <dexonsmith at apple.com> wrote:
>> 
>>> On 2016-Jan-08, at 11:21, Teresa Johnson <tejohnson at google.com> wrote:
>>> 
>>> On Tue, Jan 5, 2016 at 1:49 PM, Teresa Johnson <tejohnson at google.com> wrote:
>>>> On Tue, Jan 5, 2016 at 12:13 PM, Duncan P. N. Exon Smith
>>>> <dexonsmith at apple.com> wrote:
>>>>> 
>>>>>> On 2016-Jan-05, at 11:38, Teresa Johnson <tejohnson at google.com> wrote:
>>>>>> 
>>>>>> On Tue, Jan 5, 2016 at 9:07 AM, Teresa Johnson <tejohnson at google.com> wrote:
>>>>>>> On Mon, Jan 4, 2016 at 1:19 PM, Duncan P. N. Exon Smith
>>>>>>> <dexonsmith at apple.com> wrote:
>>>>>>>> 
>>>>>>>>> On 2015-Dec-29, at 18:00, Teresa Johnson <tejohnson at google.com> wrote:
>>>>>>>>> 
>>>>>>>>> On Wed, Dec 23, 2015 at 8:34 AM, Duncan P. N. Exon Smith
>>>>>>>>> <dexonsmith at apple.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> On 2015-Dec-17, at 12:14, Teresa Johnson via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Author: tejohnson
>>>>>>>>>>> Date: Thu Dec 17 11:14:09 2015
>>>>>>>>>>> New Revision: 255909
>>>>>>>>>>> 
>>>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=255909&view=rev
>>>>>>>>>>> Log:
>>>>>>>>>>> [ThinLTO] Metadata linking for imported functions
>>>>>>>>>>> 
>>>>>>>>>>> Summary:
>>>>>>>>>>> Second patch split out from http://reviews.llvm.org/D14752.
>>>>>>>>>>> 
>>>>>>>>>>> Maps metadata as a post-pass from each module when importing complete,
>>>>>>>>>>> suturing up final metadata to the temporary metadata left on the
>>>>>>>>>>> imported instructions.
>>>>>>>>>>> 
>>>>>>>>>>> This entails saving the mapping from bitcode value id to temporary
>>>>>>>>>>> metadata in the importing pass, and from bitcode value id to final
>>>>>>>>>>> metadata during the metadata linking postpass.
>>>>>>>>>>> 
>>>>>>>>>>> Depends on D14825.
>>>>>>>>>>> 
>>>>>>>>>>> Reviewers: dexonsmith, joker.eph
>>>>>>>>>>> 
>>>>>>>>>>> Subscribers: davidxl, llvm-commits, joker.eph
>>>>>>>>>>> 
>>>>>>>>>>> Differential Revision: http://reviews.llvm.org/D14838
>>>>>>>>>>> 
>>>>>>>>>>> Added:
>>>>>>>>>>> llvm/trunk/test/Linker/Inputs/thinlto_funcimport_debug.ll
>>>>>>>>>>> llvm/trunk/test/Linker/thinlto_funcimport_debug.ll
>>>>>>>>>>> llvm/trunk/test/Transforms/FunctionImport/Inputs/funcimport_debug.ll
>>>>>>>>>>> llvm/trunk/test/Transforms/FunctionImport/funcimport_debug.ll
>>>>>>>>>>> Modified:
>>>>>>>>>>> llvm/trunk/include/llvm/IR/GVMaterializer.h
>>>>>>>>>>> llvm/trunk/include/llvm/IR/Metadata.h
>>>>>>>>>>> llvm/trunk/include/llvm/IRReader/IRReader.h
>>>>>>>>>>> llvm/trunk/include/llvm/Linker/IRMover.h
>>>>>>>>>>> llvm/trunk/include/llvm/Linker/Linker.h
>>>>>>>>>>> llvm/trunk/include/llvm/Transforms/Utils/ValueMapper.h
>>>>>>>>>>> llvm/trunk/lib/Bitcode/Reader/BitcodeReader.cpp
>>>>>>>>>>> llvm/trunk/lib/IR/Metadata.cpp
>>>>>>>>>>> llvm/trunk/lib/IRReader/IRReader.cpp
>>>>>>>>>>> llvm/trunk/lib/Linker/IRMover.cpp
>>>>>>>>>>> llvm/trunk/lib/Linker/LinkModules.cpp
>>>>>>>>>>> llvm/trunk/lib/Transforms/IPO/FunctionImport.cpp
>>>>>>>>>>> llvm/trunk/lib/Transforms/Utils/ValueMapper.cpp
>>>>>>>>>>> llvm/trunk/tools/llvm-link/llvm-link.cpp
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Modified: llvm/trunk/include/llvm/IR/Metadata.h
>>>>>>>>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/IR/Metadata.h?rev=255909&r1=255908&r2=255909&view=diff
>>>>>>>>>>> ==============================================================================
>>>>>>>>>>> --- llvm/trunk/include/llvm/IR/Metadata.h (original)
>>>>>>>>>>> +++ llvm/trunk/include/llvm/IR/Metadata.h Thu Dec 17 11:14:09 2015
>>>>>>>>>>> @@ -832,10 +832,11 @@ public:
>>>>>>>>>>> /// \brief Resolve cycles.
>>>>>>>>>>> ///
>>>>>>>>>>> /// Once all forward declarations have been resolved, force cycles to be
>>>>>>>>>>> -  /// resolved.
>>>>>>>>>>> +  /// resolved. If \p MDMaterialized is true, then any temporary metadata
>>>>>>>>>>> +  /// is ignored, otherwise it asserts when encountering temporary metadata.
>>>>>>>>>> 
>>>>>>>>>> Please expand a little on the semantic difference between `true` and
>>>>>>>>>> `false` here.  In the `false` case, you're blocking identical `MDNode`s
>>>>>>>>>> from getting uniqued together.
>>>>>>>>> 
>>>>>>>>> Looks like I got the comment backwards. We ignore temp metadata when
>>>>>>>>> MDMaterialized==false, otherwise we assert when seeing temp metadata
>>>>>>>>> here (the default behavior). I don't think there is any other semantic
>>>>>>>>> difference - the flag only affects temporary MD, which we shouldn't
>>>>>>>>> normally see (which is why the default is the old behavior of
>>>>>>>>> asserting on temp metadata).
>>>>>>>> 
>>>>>>>> It's subtle, but I think it's important.  Consider the following.
>>>>>>> 
>>>>>>> Thanks for walking me through this, I see the issue. Comments below.
>>>>>>> 
>>>>>>>> --
>>>>>>>> The LLVMContext has:
>>>>>>>>  !1 = !{!2}
>>>>>>>>  !2 = !{}
>>>>>>>> 
>>>>>>>> a.bc contains:
>>>>>>>>  !1 = !{!2}
>>>>>>>>  !2 = !{}
>>>>>>>> --
>>>>>>>> The full-loader would read this in, and merge !1 and !2 together.
>>>>>>>> --
>>>>>>>> Let's say the lazy-loader chooses *only* to read !1 from a.bc,
>>>>>>>> leaving !2 as a temporary.  Now LLVMContext has:
>>>>>>>>  !1 = !{!2}
>>>>>>>>  !2 = !{}
>>>>>>>>  !3 = unresolved !{!4} // !1 from a.bc (with RAUW support)
>>>>>>>>  !4 = temporary !{}    // !2 from a.bc
>>>>>>>> 
>>>>>>>> Lazy-loader calls !3->resolveCycles(), giving:
>>>>>>>>  !1 = !{!2}
>>>>>>>>  !2 = !{}
>>>>>>>>  !3 = !{!4}         // !1 from a.bc (no RAUW support)
>>>>>>>>  !4 = temporary !{} // !2 from a.bc
>>>>>>> 
>>>>>>> Note that the lazy metadata reading support delays reading of all
>>>>>>> module-level metadata, which it later reads in a single pass. So the
>>>>>>> above situation would only happen if !1 from a.bc is in a
>>>>>>> function-level metadata block, and !2 is in its module-level metadata.
>>>>> 
>>>>> Interesting.  I hadn't thought through this specific use case.  Can you
>>>>> give a reduced example?
>>>> 
>>>> I looked at my test case (test/Linker/thinlto_funcimport_debug.ll) and
>>>> I see a couple of different cases where we end up with temp metadata
>>>> after the function importing. One case involves llvm.dbg.value
>>>> parameters (the DILocalVariable and a DIExpression), which are defined
>>>> in module-level metadata. So in this case it is the top-level metadata
>>>> node that is a temporary. The second case are the scope value on
>>>> DILocation metadata attachments. The scope is defined in module-level
>>>> metadata, but the DILocation itself is apparently a function-level
>>>> metadata attachment. Here is an example that shows both cases:
>>>> 
>>>> After importing a function, before metadata linking:
>>>> 
>>>>        tail call void @llvm.dbg.value(metadata i32 %n, i64 0,
>>>> metadata !15, metadata !16), !dbg !17
>>>> ...
>>>> !15 = <temporary!> !{}
>>>> !16 = <temporary!> !{}
>>>> !17 = !DILocation(line: 1, column: 15, scope: !14)
>>>> 
>>>> After (module-level) metadata linking:
>>>> 
>>>> tail call void @llvm.dbg.value(metadata i32 %n, i64 0, metadata !15,
>>>> metadata !23), !dbg !24
>>>> ...
>>>> !11 = distinct !DISubprogram(name: "func1", scope: !9, file: !9, line:
>>>> 1, type: !12, isLocal: false, isDefinition: true, scopeLine: 1, flags:
>>>> DIFlagPrototyped, isOptimized: true, variables: !14)
>>>> ...
>>>> !15 = !DILocalVariable(name: "n", arg: 1, scope: !11, file: !9, line:
>>>> 1, type: !7)
>>>> ...
>>>> !23 = !DIExpression()
>>>> !24 = !DILocation(line: 1, column: 15, scope: !11)
>>>> 
>>>>> 
>>>>> Could/should the function-level metadata be delayed too?
>>>> 
>>>> I don't think so, unless there is a big issue with mapping in that
>>>> part when the function is imported. It is more efficient to do along
>>>> with the function. And unlike module-level where we don't know what
>>>> all is needed until all functions are imported, we know the function's
>>>> metadata will be needed.
>>>> 
>>>>> 
>>>>> Could/should we move more function-level metadata to the module-level?
>>>>> (Probably not...)
>>>> 
>>>> Not sure I know the answer to this one.
>>>> 
>>>>> 
>>>>>>>> Eventually, we read in !4.  !3 hits a uniquing collision, but
>>>>>>>> doesn't have RAUW support (since `resolveCycles()` dropped it).
>>>>>>>> End result:
>>>>>>>>  !1 = !{!2}
>>>>>>>>  !2 = !{}
>>>>>>>>  !3 = distinct !{!2} // !1 from a.bc
>>>>>>>> --
>>>>>>>> If `resolveCycles()` hadn't been called, we'd end up with:
>>>>>>>>  !1 = !{!2}
>>>>>>>>  !2 = !{}
>>>>>>>> since !3 could resolve the uniquing collision using RAUW.
>>>>>>> 
>>>>>>> I wonder if it would be possible to delay calling resolveCycles until
>>>>>>> the module-level metadata is loaded? That might result in other
>>>>>>> issues/differences though?
>>>>>> 
>>>>>> I tried doing this - in MDNode::resolveCycles I invoke a new helper
>>>>>> that checks if the MDNode or any operand recursively nested under it
>>>>>> contains a temporary, and if so, resolveCycles returns without doing
>>>>>> anything:
>>>>>> 
>>>>>> -------------------
>>>>>> static bool containsTemporaryImpl(MDNode *N,
>>>>>>                                SmallPtrSetImpl<MDNode *> &Visited) {
>>>>>> if (!Visited.insert(N).second)
>>>>>>  return false;
>>>>>> if (N->isTemporary())
>>>>>>  return true;
>>>>>> for (auto &Op : N->operands()) {
>>>>>>  auto *OpN = dyn_cast_or_null<MDNode>(Op);
>>>>>>  if (OpN && containsTemporaryImpl(OpN, Visited))
>>>>> 
>>>>> I think you wouldn't want to descend if this node `.isResolved()`.
>>>> 
>>>> Ok
>>>> 
>>>>> 
>>>>>>    return true;
>>>>>> }
>>>>>> return false;
>>>>>> }
>>>>>> 
>>>>>> static bool containsTemporary(MDNode *N) {
>>>>>> SmallPtrSet<MDNode *, 8> Visited;
>>>>>> return containsTemporaryImpl(N, Visited);
>>>>>> }
>>>>>> 
>>>>>> void MDNode::resolveCycles() {
>>>>>> if (isResolved())
>>>>>>  return;
>>>>>> 
>>>>>> if (containsTemporary(this))
>>>>>>  return;
>>>>>> ...
>>>>>> ----------------------
>>>>>> 
>>>>>> So the resolveCycles would later be done when the module-level
>>>>>> metadata is read and mapped.
>>>>> 
>>>>> I'd prefer to keep the assertion in `resolveCycles()` to catch problems
>>>>> where the caller *thinks* there are no temporaries left.  However, for
>>>>> this use case you could (in theory) add helper function that returns
>>>>> `bool` and guard the call to `resolveCycles()` with it.
>>>> 
>>>> That makes sense. Do in the caller to guard the call or do it in a
>>>> separate interface like resolveNonTemporaries.
>>>> 
>>>>> 
>>>>>> This works in my test case (which does invoke resolveCycles with a
>>>>>> temp MD node), but admittedly I don't have a lot of ThinLTO debug test
>>>>>> cases. Does this seem like a reasonable approach to test further?
>>>>>> Otherwise, I can just change to having the two interfaces as you
>>>>>> suggested.
>>>>> 
>>>>> I think the two interfaces might be cleaner, since in other use cases
>>>>> the assertion is important.
>>>> 
>>>> Ok, agreed.
>>>> 
>>>>> Also, even in this use case, it's important
>>>>> to drop the RAUW support to keep memory usage in check.
>>>> 
>>>> I was hoping with the above helper we could avoid dropping the RAUW
>>>> support, so that in cases like your example we would end up with less
>>>> metadata at the end. If we still want to drop RAUW here, then no point
>>>> in skipping the resolving (just do your original suggestion of using a
>>>> separate well-documented interface). Do you think it will be important
>>>> to keep the memory down this way between function importing and the
>>>> metadata linking (which immediately follows after function importing
>>>> is complete) though?
>>> 
>>> Ping on this question.
>> 
>> Best way to evaluate would be to run an -flto -g build of something
>> somewhat large (clang (or even llc) should be big enough) and look at
>> the memory graph.
> 
> This doesn't apply to a regular -flto build, only ThinLTO, although I
> could evaluate with -flto=thin -g on something large. Can you clarify
> what you mean by the memory graph? Are you referring to the
> -track-memory llvm output?

No, I just meant looking at memory vs. time with your favourite memory
(allocations) profiling tool.

>> 
>>> I tried comparing the current implementation to the above change to
>>> skip resolveCycles on anything that reaches a temporary metadata
>>> (which as noted above affects the metadata mapping during function
>>> importing but is ultimately resolved during the bulk metadata linking
>>> that occurs after all functions are imported when the temp metadata is
>>> replaced).
>>> 
>>> I dumped out and compared the dest module just before and after the
>>> metadata postpass linking. I then compared that and the compile
>>> times/memory usage between the current implementation and the above
>>> change for a few cpu2006 benchmarks (mcf, bzip2, perlbench and astar).
>>> They all built fine with both versions of the code, and there was no
>>> noticeable consistent change in compilation time or memory. There was
>>> also no difference in the metadata between the two versions. I had
>>> expected that delaying the resolving of nodes might have some effects
>>> on the metadata (including avoiding the situation you described
>>> earlier that we can end up in with the current approach, by delaying
>>> dropping RAUW support).
>>> 
>>> So I am not sure what to conclude. Does the current code's semantic
>>> difference not really matter at all in practice? I.e. the potential
>>> situation you described with preventing RAUW of some metadata is
>>> really rare (although possible so agree that we want to change the
>>> interface to document the possibility). In that case delaying the
>>> resolving of cycles doesn't really matter/help? Theoretically it seems
>>> like it would help in that situation, but I haven't (yet) been able to
>>> provoke it.
>>> 
>>> Perhaps I should just go with your original suggestion of adding the
>>> new interface for the current behavior and documenting it, and not
>>> worrying about it unless we find issues. WDYT?
>> 
>> This seems fine to me.
> 
> Ok, will do this for now.
> 
> Thanks,
> Teresa
> 
>> 
>>> 
>>> Thanks,
>>> Teresa
>>>> 
>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> Tereas
>>>>>> 
>>>>>>> 
>>>>>>>> --
>>>>>>>> 
>>>>>>>> The new operation completely violates the spirit of
>>>>>>>> `resolveCycles()`, which (before your patch) was saying:
>>>>>>>>  "I promise there are no more temporaries, so remaining
>>>>>>>>  unresolved nodes must be part of a cycle."
>>>>>>>> The `assert()` confirms that everything is in a cycle.
>>>>>>>> 
>>>>>>>> With your new flag set, this has changed to:
>>>>>>>>  "I don't care about uniquing the remaining nodes, so drop
>>>>>>>>   RAUW support for them."
>>>>>>>> 
>>>>>>>> Maybe something like this would make sense:
>>>>>>>> --
>>>>>>>> void resolveCycles()
>>>>>>>> { resolveRecursivelyImpl(/* AllowTemps */ false); }
>>>>>>>> void resolveNonTemporaries()
>>>>>>>> { resolveRecursivelyImpl(/* AllowTemps */ true); }
>>>>>>>> 
>>>>>>>> private:
>>>>>>>> // Old resolveCycles() implementation.
>>>>>>>> void resolveRecursivelyImpl(bool AllowTemps);
>>>>>>>> --
>>>>>>>> (Do you have a better name for `resolveNonTemporaries()`?)
>>>>>>> 
>>>>>>> This sounds like a good idea. Don't have a better name suggestion,
>>>>>>> will use that.
>>>>>>> 
>>>>>>>> 
>>>>>>>> With the API split it might be easier to document the semantic
>>>>>>>> differences (just by describing the use case for each).
>>>>>>>> 
>>>>>>>> For `resolveCycles()`, please make clear (if it isn't already?)
>>>>>>>> that we're promising there aren't any temporaries, and thus
>>>>>>>> unresolved nodes are part of cycles and no longer need RAUW
>>>>>>>> support.
>>>>>>>> 
>>>>>>>> For `resolveNonTemporaries()`, please make clear that this is a
>>>>>>>> destructive operation that blocks uniquing of metadata nodes.
>>>>>>>> 
>>>>>>> 
>>>>>>> Will do.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Teresa
>>>>>>> 
>>>>>>> --
>>>>>>> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>>> 
>>> 
>>> 
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>> 
> 
> 
> 
> -- 
> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413