[PATCH] D105251: [IR] Allow multiple instruction metadata attachments with same type

Fri Jul 2 12:36:32 PDT 2021

tejohnson added a comment.

In D105251#2855813 <https://reviews.llvm.org/D105251#2855813>, @dexonsmith wrote:

> In D105251#2854679 <https://reviews.llvm.org/D105251#2854679>, @tejohnson wrote:
>
>> In D105251#2854351 <https://reviews.llvm.org/D105251#2854351>, @dexonsmith wrote:
>>
>>> - We probably need an API for "setting" the `!"branch_weights"` entry (since blindly calling `addMetadata()` would make the above verifier fail)
>>
>> Not sure I follow - right now there shouldn't be multiple branch weights prof metadata and I don't see a need for that going forward (the heap profiler data would use a different/new tag for its !prof metadata).
>
> Agreed.
>
> Were you intending to substitute all existing calls to `setMetadata()` with calls to `addMetadata()`? I wrote the above bullet after noticing that doing so would be incorrect when updating metadata (e.g., fixing up branch weights after a transformation). Instead of replacing the tag you'll end up with two of them (rendering the IR invalid). My suggestion is to add a helper for the logic needed to update metadata safely.

No I initially didn't intend to update these to addMetadata, but that is a hole that will be exposed once we use heap profile in combination with sample PGO where we may end up with both the new heap prof metadata and a branch_weights metadata on a call, so that is a good point that we need to think through that now if we go with the approach in this patch.

> Walking through my logic in more detail:
>
> This patch (the one we're looking at) will give `Instruction` the capability of having multiple attachments of the same kind. It'll have a verifier check that disallows it in practice (since it'll initially be invalid IR for all metadata kinds), but then specific metadata kinds can opt-in as needed (by opting out of the verifier check, and adding new more specific checks as is useful).
>
> A future patch will leverage this feature for `!prof` attachments. The suggested new rules for the `!prof` attachment treat it as a set, where nodes are uniqued by their first operand. But it's not a multiset; only one node is allowed for each identity tag. (Alternatively, you could see this as a map, where the data is the node and the key is its first operand.)

There are 2 motivations for this patch from my side:

1. My primary motivation: allow multiple new heap prof metadatas on a single allocation call without indirection, via multiple !prof attachments each with the same tag. E.g.

  call malloc !prof !0, !prof !1

  !0 = !{"heap", [stack trace 1 heap profile data]}
  !1 = !{"heap", [stack trace 2 heap profile data]} 

This could alternatively be addressed via indirection, possibly with an optimization to inline it if there is only one. I.e.

  call malloc !prof !0

  ; if a single stack trace
  !0 = !{"heap", [stack trace 1 heap profile data]} 

  ; if >1 stack traces
  !0 = !{"heap", !{1, !2}}
  !1 = !{[stack trace 1 heap profile data]}
  !2 = !{[stack trace 2 heap profile data]}

2. When we combine heap profile data with sample PGO we may end up with both heap prof and branch_weights prof metadata on a single call. This could alternatively be addressed by using a different md kind altogether, e.g. !heapprof.

What you describe below will not work if we want to attach multiple "heap" !prof attachments. But I'm leaning towards the alternate described above where we use indirection if there is >1 which would simplify things. Otherwise we need additional special casing which is not ideal.

> In some cases, code will have a priori knowledge that an instruction does not yet have a given tag attached. In those cases, it'll be safe to call `addMetadata()` since it won't invalidate the set.
>
> In other cases, code will need logic something like this for the replace operation:
>
>   auto replaceProfAttachment = [](const Instruction &I, MDNode &NewN) {
>     SmallVector<MDNode *> MDs;
>     I.getMetadata(MD_prof, MDs);
>     for (int N = 0, NE = MDs.size(); N != NE; ++N) {
>       if (MDs[N]->getOperand(0) != NewN.getOperand(0))
>         continue;
>   
>       // Found a node with the same tag to replace.
>       MDs[N] = &NewN;
>       I.setMetadata(MD_prof, MDs);
>       return;
>     }
>     // Tag not found. Safe to append.
>     I.addMetadata(MD_prof, &NewN);
>   };
>
> Maybe an API like this could be generalized?
>
>   I.addOrReplaceMetadataWithTaggedOperand(MD_prof, NewN, /*TaggedOperandNo=*/0);
>
> Scanning through the code, many existing calls to `setMetadata()` with `MD_prof` are on new instructions (the former case where addMetadata is safe). But there are also a few that look like updates to me (the latter case where it isn't).
>
> I also noticed calls to `setMetadata(MD_prof, nullptr)` -- not sure if you'd want those updated to delete just the branch weights (which would need a set-erase operation), or if it always would make sense to clear all `!prof` attachments together.

These are very good questions and thanks for the detailed suggestions above. I initially was ignoring these calls, since I was focused on motivation #1 and not thinking as much about the case where we would be combining this with sample PGO (since that is a longer term goal) and have !prof with different tags on the same instruction, but it is better to address these now if we go with allowing multiple !prof attachments with different tags to avoid surprises later. These existing calls to setMetadata to overwrite or clear the branch_weights md do need to be updated along the lines of what you suggest above (i.e. with a new interface to do it by tag) to address that future need.

>> Do you mean add something new like !heapprof? I thought about that, but it seemed cleaner to put all profile data under !prof. But it would solve part of the issue (needing multiple !prof due to other types of profile metadata on the inst), and avoid the need for this change if we also use a level of indirection when there is more than one heap profile per allocation.
>
> Yes, exactly.
>
> If you end up going that way (and abandoning this patch), probably one of us should update LangRef to better document the existing behaviour (maybe the commit message can reference this review for a few design points to consider if someone else needs the feature).

Agreed. Let me discuss this with my collaborators and think about it a little more. But yes, in either case let's plan to update LangRef. I was surprised that it didn't mention anything in this regard. Will circle back probably after the holiday.

> Also in that case, I'm curious if you think it'd make sense to split out `!valueprof` at some point to make it orthogonal from branch weights? (Or maybe it doesn't matter, since they'll never/rarely be on the same instructions?)

Currently they are on different instructions than branch_weight metadata (otherwise I guess we would have hit this issue previously). I'm not sure if that will ever change, but if it does then I guess we could migrate to the same solution we end up going to for heap profiling.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105251/new/

https://reviews.llvm.org/D105251