[PATCH] D105251: [IR] Allow multiple instruction metadata attachments with same type

Thu Jul 1 08:48:19 PDT 2021

tejohnson added a comment.

Hi @dexonsmith, thanks for your comments! Responses below.

In D105251#2851935 <https://reviews.llvm.org/D105251#2851935>, @dexonsmith wrote:

> In D105251#2851926 <https://reviews.llvm.org/D105251#2851926>, @davidxl wrote:
>
>> I think Teresa referred to the inconsistency between Global Object and Instruction. Is that intentional?
>
> Thanks for clarifying!
>
> I did some archival to confirm: yes, this was intentional. The change for globals landed in: https://reviews.llvm.org/D20414

Right, my patch description points to that patch. =)

>> IR: Allow multiple global metadata attachments with the same type.
>
> (although it missed documenting the difference in LangRef, certainly it was specifically changing globals without changing instructions)

It was intentional with that change, but AFAICT that was based on need at the time and at least in the patch there is no pointer to a broader discussion on llvm-dev about changing the langref or as to why this fundamentally should only be allowed for global objects and not instructions. I am not sure why there should be a difference in functionality?

Note also that while this was initially added for debug info messages on variables, the support is now used for other types of global variable metadata attachments. For example !type metadata - vtables have one per type.

> I disagree with the assertion in the commit message:
>
>> This will be necessary to allow the global merge pass to attach
>> multiple debug info metadata nodes to global variables once we reverse
>> the edge from DIGlobalVariable to GlobalVariable.
>
> (since it wasn't necessary... multiple attachments could have been handled by adding indirection, as is typical for instruction metadata)
> but the context/impact was different since the commit landed around the same time that global metadata attachments were introduced (https://reviews.llvm.org/D20074). This wasn't a subtle behaviour change that could affect existing code handling metadata attachments for global variables, since there was no such code (yet).

I'm not sure why this is more of a subtle behavior change for instructions than it was for global objects. The main difference for instructions would be the DbgLoc attachments, since those are handled differently, but I think that could be addressed through some assertions as you note in your comments below, and possibly improved over time if needed.

> (The inconsistency between instructions / globals is certainly unfortunate... just want to make sure that if we add this feature for instructions the impact is carefully considered...)

In D105251#2851896 <https://reviews.llvm.org/D105251#2851896>, @dexonsmith wrote:

> I'm a bit concerned about the subtle difference in behaviour... in the past, metadata schemes that needed multiple attachments added a level of indirection (point at a tuple of metadata). Will that work for the heap profiler?

As noted above, global variables use multiple attachments for !type metadata, in addition to the original need for !dbg attachments.

For profile data, since currently all profile data uses a !prof attachment, but different MDString "tags", it seems cleaner IMO to use a different !prof attachment for different types of profile data. For instrumentation based PGO and PGHO (profile guided heap optimization), for now I think we would have the PGO "branch_weights" metadata and the PGHO TBD metadata on different types of instructions (branches vs calls), but if we combine SamplePGO with PGHO we will have both types on the same type of instructions (SamplePGO puts "branch_weight" metadata on calls). While we technically could support by changing all !prof to have a level of indirection, it seems cleaner IMO to allow different !prof attachments for the different types of profile data.

>> because it is an inconsistency in IR support
>> (the LLVM langref does not specify anything with regards to this
>> behavior).
>
> While the current behaviour (one attachment per kind) is not documented in LangRef, it's documented in the header docs, and I do think it was intentional; and in any case, it has been around for a long time and there could be code patterns relying on it.
>
> Since this changes IR semantics, I suggest adding a change to LangRef and a discussion on llvm-dev (and/or the new proposals system).

As noted above, this actually fixes a discrepancy between GlobalObject and Instruction handling. I am not sure why that discrepancy is desirable, and the LangRef doesn't mention either way and doesn't seem to distinguish between attachments to global objects vs instructions. If it is documented in the header docs beyond the places I changed please let me know and I can update that in this patch too.

> A few design points that I'm not sure about:
>
> - What should `getMetadata` do if there are multiple attachments for the requested kind? (assert? return nullptr? return the first, last, or arbitrary?)

The getMetadata here relies on the underlying Value::getMetadata support. I copied the comments from there for the most part (at least for the header interfaces). Looking back, the original implementation in D20414 <https://reviews.llvm.org/D20414> added an assert if there was more than one attachment of the requested type (for the getMetadata returing a single attachment). Sometime between that patch and when this was refactored into Value::getMetadata it appears that support disappeared. It looks like now the first is blindly returned. This should probably be fixed in the underlying Value::getMetadata call. I'll need to go back and look at the changes in between to see where/why this changed.

> - Should there be a way for metadata kinds to explicitly opt-in (or out) of allowing multiple attachments? (Maybe could be stored on the LLVMContext somehow?)

Not sure - should that be done for GlobalObjects as well? It isn't the case right now.

> - If not, what do we do about the inconsistency with `!dbg`? Support multiple attachments somehow?

That is a good question. I am not sure of the history of why !dbg on instructions is handled differently. Should that be changed to just use the same underlying Value metadata support?

> - (And: why is this better than using a level of indirection when multiple attachments are needed?)

I described some rationale above as to why I think it is cleaner, especially for something generic like !prof that specifies the profile data in the MDString tag first operand. And also for consistency with global objects, where as noted the support for multiple attachments of the same type has now been used for additional metadata attachment kinds.

> I also have a bunch of comments inline assuming this goes forward.

================
Comment at: llvm/include/llvm/IR/Instruction.h:287-293
+  /// Get the metadata of given kind attached to this Instruction. Requires
+  /// that the instruction has at most a single attachment of the given kind.
   /// If the metadata is not found then return null.
   MDNode *getMetadata(StringRef Kind) const {
     if (!hasMetadata()) return nullptr;
     return getMetadataImpl(Kind);
   }
----------------
dexonsmith wrote:
> What happens if there are multiple? Assert, return first, return arbitrary, or return `nullptr`?
I addressed this in my comments above too, but to reiterate, this relies on the behavior of the underlying Value::getMetadata support, and that support just seems to return the first matching one, which is a regression in the behavior from the original support added in D20414, which asserted if there was more than one. I need to go back and see where/why that assert was removed before it was refactored into the Value class in D67626. Since Instruction will use the underlying Value support, it should be fixed there.

================
Comment at: llvm/include/llvm/IR/Instruction.h:329

+  /// Add a metadata attachment of the specified kind.
+  void addMetadata(unsigned KindID, MDNode &MD);
----------------
dexonsmith wrote:
> Might be good to mention that if `KindID` already exists, this breaks future calls to `getMetadata()`.
Will do.

================
Comment at: llvm/lib/IR/Metadata.cpp:1358-1361
+  if (KindID == LLVMContext::MD_dbg) {
+    DbgLoc = DebugLoc(&MD);
+    return;
+  }
----------------
dexonsmith wrote:
> Looks like `!dbg` doesn't support multiple instances. Should that be documented? Should this assert instead of silently overriding the existing one? Should the LLParser catch this as an error, or potentially use `setMetadata()` still for `!dbg`?
Yeah I responded to this above too - do you know the history or rationale for having separate support for !dbg attachments on instructions? It should be documented somewhere, or fixed. The assert is a good idea, will add. And yes a hard error in LLParser is a good idea as well.

================
Comment at: llvm/lib/Transforms/Utils/ValueMapper.cpp:924-926
+  I->clearMetadata();
+  for (const auto &MI : MDs)
+    I->addMetadata(MI.first, *cast<MDNode>(mapMetadata(MI.second)));
----------------
dexonsmith wrote:
> - I think this drops handling of `mapMetadata()` returning nullptr (which can happen if the ValueMapper client has mapped some metadata node explicitly). Is that intentional?
> 
> - I wonder if there should be a `setAllMetadata()` API matching `getAllMetadata()` that combines the clear and the add loop (but without all the separate densemap lookups). This callsite would still need a loop in the middle to call `mapMetadata()`.
Ah the change to cast from cast_or_null was unintentional. I copied this from the code elsewhere in this file that does the exact same thing for GlobalObject md attachments.

The setAllMetadata interface sounds good, I would need to add in Value as well and can use in this file for GlobalObjects.

================
Comment at: llvm/test/Bitcode/branch-weight.ll:19

 ; RUN: llvm-dis %S/Inputs/branch-weight.bc -o - | FileCheck %s
+; CHECK: !prof !0, !prof !1, !prof !2
----------------
dexonsmith wrote:
> Seems a shame to modify a binary file. Could you instead add a separate test that round-trips to bitcode?
Sure. Yeah, not sure why this was done as a binary file in the first place. I can llvm-as it fine with --disable-verify, which is how I generated the new binary file.

================
Comment at: llvm/test/Bitcode/branch-weight.ll:20-23
+; CHECK: !prof !0, !prof !1, !prof !2
+; We should no longer have the second branch_weights metadata
+; CHECK: !prof !3, !prof !4{{$}}
+; CHECK: !0 = !{!"foo1"}
----------------
dexonsmith wrote:
> I think it'd be good for this to use string substitution blocks rather than relying on stable numbering: https://www.llvm.org/docs/CommandGuide/FileCheck.html#filecheck-string-substitution-blocks
Will do.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105251/new/

https://reviews.llvm.org/D105251