[llvm] r255909 - [ThinLTO] Metadata linking for imported functions

Teresa Johnson via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 5 13:49:48 PST 2016


On Tue, Jan 5, 2016 at 12:13 PM, Duncan P. N. Exon Smith
<dexonsmith at apple.com> wrote:
>
>> On 2016-Jan-05, at 11:38, Teresa Johnson <tejohnson at google.com> wrote:
>>
>> On Tue, Jan 5, 2016 at 9:07 AM, Teresa Johnson <tejohnson at google.com> wrote:
>>> On Mon, Jan 4, 2016 at 1:19 PM, Duncan P. N. Exon Smith
>>> <dexonsmith at apple.com> wrote:
>>>>
>>>>> On 2015-Dec-29, at 18:00, Teresa Johnson <tejohnson at google.com> wrote:
>>>>>
>>>>> On Wed, Dec 23, 2015 at 8:34 AM, Duncan P. N. Exon Smith
>>>>> <dexonsmith at apple.com> wrote:
>>>>>>
>>>>>>> On 2015-Dec-17, at 12:14, Teresa Johnson via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>>>>>>
>>>>>>> Author: tejohnson
>>>>>>> Date: Thu Dec 17 11:14:09 2015
>>>>>>> New Revision: 255909
>>>>>>>
>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=255909&view=rev
>>>>>>> Log:
>>>>>>> [ThinLTO] Metadata linking for imported functions
>>>>>>>
>>>>>>> Summary:
>>>>>>> Second patch split out from http://reviews.llvm.org/D14752.
>>>>>>>
>>>>>>> Maps metadata as a post-pass from each module when importing complete,
>>>>>>> suturing up final metadata to the temporary metadata left on the
>>>>>>> imported instructions.
>>>>>>>
>>>>>>> This entails saving the mapping from bitcode value id to temporary
>>>>>>> metadata in the importing pass, and from bitcode value id to final
>>>>>>> metadata during the metadata linking postpass.
>>>>>>>
>>>>>>> Depends on D14825.
>>>>>>>
>>>>>>> Reviewers: dexonsmith, joker.eph
>>>>>>>
>>>>>>> Subscribers: davidxl, llvm-commits, joker.eph
>>>>>>>
>>>>>>> Differential Revision: http://reviews.llvm.org/D14838
>>>>>>>
>>>>>>> Added:
>>>>>>>  llvm/trunk/test/Linker/Inputs/thinlto_funcimport_debug.ll
>>>>>>>  llvm/trunk/test/Linker/thinlto_funcimport_debug.ll
>>>>>>>  llvm/trunk/test/Transforms/FunctionImport/Inputs/funcimport_debug.ll
>>>>>>>  llvm/trunk/test/Transforms/FunctionImport/funcimport_debug.ll
>>>>>>> Modified:
>>>>>>>  llvm/trunk/include/llvm/IR/GVMaterializer.h
>>>>>>>  llvm/trunk/include/llvm/IR/Metadata.h
>>>>>>>  llvm/trunk/include/llvm/IRReader/IRReader.h
>>>>>>>  llvm/trunk/include/llvm/Linker/IRMover.h
>>>>>>>  llvm/trunk/include/llvm/Linker/Linker.h
>>>>>>>  llvm/trunk/include/llvm/Transforms/Utils/ValueMapper.h
>>>>>>>  llvm/trunk/lib/Bitcode/Reader/BitcodeReader.cpp
>>>>>>>  llvm/trunk/lib/IR/Metadata.cpp
>>>>>>>  llvm/trunk/lib/IRReader/IRReader.cpp
>>>>>>>  llvm/trunk/lib/Linker/IRMover.cpp
>>>>>>>  llvm/trunk/lib/Linker/LinkModules.cpp
>>>>>>>  llvm/trunk/lib/Transforms/IPO/FunctionImport.cpp
>>>>>>>  llvm/trunk/lib/Transforms/Utils/ValueMapper.cpp
>>>>>>>  llvm/trunk/tools/llvm-link/llvm-link.cpp
>>>>>>>
>>>>>>>
>>>>>>> Modified: llvm/trunk/include/llvm/IR/Metadata.h
>>>>>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/IR/Metadata.h?rev=255909&r1=255908&r2=255909&view=diff
>>>>>>> ==============================================================================
>>>>>>> --- llvm/trunk/include/llvm/IR/Metadata.h (original)
>>>>>>> +++ llvm/trunk/include/llvm/IR/Metadata.h Thu Dec 17 11:14:09 2015
>>>>>>> @@ -832,10 +832,11 @@ public:
>>>>>>> /// \brief Resolve cycles.
>>>>>>> ///
>>>>>>> /// Once all forward declarations have been resolved, force cycles to be
>>>>>>> -  /// resolved.
>>>>>>> +  /// resolved. If \p MDMaterialized is true, then any temporary metadata
>>>>>>> +  /// is ignored, otherwise it asserts when encountering temporary metadata.
>>>>>>
>>>>>> Please expand a little on the semantic difference between `true` and
>>>>>> `false` here.  In the `false` case, you're blocking identical `MDNode`s
>>>>>> from getting uniqued together.
>>>>>
>>>>> Looks like I got the comment backwards. We ignore temp metadata when
>>>>> MDMaterialized==false, otherwise we assert when seeing temp metadata
>>>>> here (the default behavior). I don't think there is any other semantic
>>>>> difference - the flag only affects temporary MD, which we shouldn't
>>>>> normally see (which is why the default is the old behavior of
>>>>> asserting on temp metadata).
>>>>
>>>> It's subtle, but I think it's important.  Consider the following.
>>>
>>> Thanks for walking me through this, I see the issue. Comments below.
>>>
>>>> --
>>>> The LLVMContext has:
>>>>    !1 = !{!2}
>>>>    !2 = !{}
>>>>
>>>> a.bc contains:
>>>>    !1 = !{!2}
>>>>    !2 = !{}
>>>> --
>>>> The full-loader would read this in, and merge !1 and !2 together.
>>>> --
>>>> Let's say the lazy-loader chooses *only* to read !1 from a.bc,
>>>> leaving !2 as a temporary.  Now LLVMContext has:
>>>>    !1 = !{!2}
>>>>    !2 = !{}
>>>>    !3 = unresolved !{!4} // !1 from a.bc (with RAUW support)
>>>>    !4 = temporary !{}    // !2 from a.bc
>>>>
>>>> Lazy-loader calls !3->resolveCycles(), giving:
>>>>    !1 = !{!2}
>>>>    !2 = !{}
>>>>    !3 = !{!4}         // !1 from a.bc (no RAUW support)
>>>>    !4 = temporary !{} // !2 from a.bc
>>>
>>> Note that the lazy metadata reading support delays reading of all
>>> module-level metadata, which it later reads in a single pass. So the
>>> above situation would only happen if !1 from a.bc is in a
>>> function-level metadata block, and !2 is in its module-level metadata.
>
> Interesting.  I hadn't thought through this specific use case.  Can you
> give a reduced example?

I looked at my test case (test/Linker/thinlto_funcimport_debug.ll) and
I see a couple of different cases where we end up with temp metadata
after the function importing. One case involves llvm.dbg.value
parameters (the DILocalVariable and a DIExpression), which are defined
in module-level metadata. So in this case it is the top-level metadata
node that is a temporary. The second case are the scope value on
DILocation metadata attachments. The scope is defined in module-level
metadata, but the DILocation itself is apparently a function-level
metadata attachment. Here is an example that shows both cases:

After importing a function, before metadata linking:

         tail call void @llvm.dbg.value(metadata i32 %n, i64 0,
metadata !15, metadata !16), !dbg !17
...
!15 = <temporary!> !{}
!16 = <temporary!> !{}
!17 = !DILocation(line: 1, column: 15, scope: !14)

After (module-level) metadata linking:

  tail call void @llvm.dbg.value(metadata i32 %n, i64 0, metadata !15,
metadata !23), !dbg !24
...
!11 = distinct !DISubprogram(name: "func1", scope: !9, file: !9, line:
1, type: !12, isLocal: false, isDefinition: true, scopeLine: 1, flags:
DIFlagPrototyped, isOptimized: true, variables: !14)
...
!15 = !DILocalVariable(name: "n", arg: 1, scope: !11, file: !9, line:
1, type: !7)
...
!23 = !DIExpression()
!24 = !DILocation(line: 1, column: 15, scope: !11)

>
> Could/should the function-level metadata be delayed too?

I don't think so, unless there is a big issue with mapping in that
part when the function is imported. It is more efficient to do along
with the function. And unlike module-level where we don't know what
all is needed until all functions are imported, we know the function's
metadata will be needed.

>
> Could/should we move more function-level metadata to the module-level?
> (Probably not...)

Not sure I know the answer to this one.

>
>>>> Eventually, we read in !4.  !3 hits a uniquing collision, but
>>>> doesn't have RAUW support (since `resolveCycles()` dropped it).
>>>> End result:
>>>>    !1 = !{!2}
>>>>    !2 = !{}
>>>>    !3 = distinct !{!2} // !1 from a.bc
>>>> --
>>>> If `resolveCycles()` hadn't been called, we'd end up with:
>>>>    !1 = !{!2}
>>>>    !2 = !{}
>>>> since !3 could resolve the uniquing collision using RAUW.
>>>
>>> I wonder if it would be possible to delay calling resolveCycles until
>>> the module-level metadata is loaded? That might result in other
>>> issues/differences though?
>>
>> I tried doing this - in MDNode::resolveCycles I invoke a new helper
>> that checks if the MDNode or any operand recursively nested under it
>> contains a temporary, and if so, resolveCycles returns without doing
>> anything:
>>
>> -------------------
>> static bool containsTemporaryImpl(MDNode *N,
>>                                  SmallPtrSetImpl<MDNode *> &Visited) {
>>  if (!Visited.insert(N).second)
>>    return false;
>>  if (N->isTemporary())
>>    return true;
>>  for (auto &Op : N->operands()) {
>>    auto *OpN = dyn_cast_or_null<MDNode>(Op);
>>    if (OpN && containsTemporaryImpl(OpN, Visited))
>
> I think you wouldn't want to descend if this node `.isResolved()`.

Ok

>
>>      return true;
>>  }
>>  return false;
>> }
>>
>> static bool containsTemporary(MDNode *N) {
>>  SmallPtrSet<MDNode *, 8> Visited;
>>  return containsTemporaryImpl(N, Visited);
>> }
>>
>> void MDNode::resolveCycles() {
>>  if (isResolved())
>>    return;
>>
>>  if (containsTemporary(this))
>>    return;
>> ...
>> ----------------------
>>
>> So the resolveCycles would later be done when the module-level
>> metadata is read and mapped.
>
> I'd prefer to keep the assertion in `resolveCycles()` to catch problems
> where the caller *thinks* there are no temporaries left.  However, for
> this use case you could (in theory) add helper function that returns
> `bool` and guard the call to `resolveCycles()` with it.

That makes sense. Do in the caller to guard the call or do it in a
separate interface like resolveNonTemporaries.

>
>> This works in my test case (which does invoke resolveCycles with a
>> temp MD node), but admittedly I don't have a lot of ThinLTO debug test
>> cases. Does this seem like a reasonable approach to test further?
>> Otherwise, I can just change to having the two interfaces as you
>> suggested.
>
> I think the two interfaces might be cleaner, since in other use cases
> the assertion is important.

Ok, agreed.

> Also, even in this use case, it's important
> to drop the RAUW support to keep memory usage in check.

I was hoping with the above helper we could avoid dropping the RAUW
support, so that in cases like your example we would end up with less
metadata at the end. If we still want to drop RAUW here, then no point
in skipping the resolving (just do your original suggestion of using a
separate well-documented interface). Do you think it will be important
to keep the memory down this way between function importing and the
metadata linking (which immediately follows after function importing
is complete) though?

>
>>
>> Thanks,
>> Tereas
>>
>>>
>>>> --
>>>>
>>>> The new operation completely violates the spirit of
>>>> `resolveCycles()`, which (before your patch) was saying:
>>>>    "I promise there are no more temporaries, so remaining
>>>>    unresolved nodes must be part of a cycle."
>>>> The `assert()` confirms that everything is in a cycle.
>>>>
>>>> With your new flag set, this has changed to:
>>>>    "I don't care about uniquing the remaining nodes, so drop
>>>>     RAUW support for them."
>>>>
>>>> Maybe something like this would make sense:
>>>> --
>>>> void resolveCycles()
>>>> { resolveRecursivelyImpl(/* AllowTemps */ false); }
>>>> void resolveNonTemporaries()
>>>> { resolveRecursivelyImpl(/* AllowTemps */ true); }
>>>>
>>>> private:
>>>> // Old resolveCycles() implementation.
>>>> void resolveRecursivelyImpl(bool AllowTemps);
>>>> --
>>>> (Do you have a better name for `resolveNonTemporaries()`?)
>>>
>>> This sounds like a good idea. Don't have a better name suggestion,
>>> will use that.
>>>
>>>>
>>>> With the API split it might be easier to document the semantic
>>>> differences (just by describing the use case for each).
>>>>
>>>> For `resolveCycles()`, please make clear (if it isn't already?)
>>>> that we're promising there aren't any temporaries, and thus
>>>> unresolved nodes are part of cycles and no longer need RAUW
>>>> support.
>>>>
>>>> For `resolveNonTemporaries()`, please make clear that this is a
>>>> destructive operation that blocks uniquing of metadata nodes.
>>>>
>>>
>>> Will do.
>>>
>>> Thanks,
>>> Teresa
>>>
>>> --
>>> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>



-- 
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413


More information about the llvm-commits mailing list