[PATCH] D105251: [IR] Allow multiple instruction metadata attachments with same type

Thu Jul 1 14:07:49 PDT 2021

dexonsmith added a subscriber: aprantl.
dexonsmith added a comment.

In D105251#2853122 <https://reviews.llvm.org/D105251#2853122>, @tejohnson wrote:

>> I disagree with the assertion in the commit message:
>>
>>> This will be necessary to allow the global merge pass to attach
>>> multiple debug info metadata nodes to global variables once we reverse
>>> the edge from DIGlobalVariable to GlobalVariable.
>>
>> (since it wasn't necessary... multiple attachments could have been handled by adding indirection, as is typical for instruction metadata)
>> but the context/impact was different since the commit landed around the same time that global metadata attachments were introduced (https://reviews.llvm.org/D20074). This wasn't a subtle behaviour change that could affect existing code handling metadata attachments for global variables, since there was no such code (yet).
>
> I'm not sure why this is more of a subtle behavior change for instructions than it was for global objects. The main difference for instructions would be the DbgLoc attachments, since those are handled differently, but I think that could be addressed through some assertions as you note in your comments below, and possibly improved over time if needed.

Here's why I think it's a subtle change for instructions. If `getMetadata()` previously couldn't fail (returned `nullptr` or value), but now can assert or return a non-deterministic value, I fear it would make existing working code (say in optimizations that look at loop metadata) in subtle ways... but the bug would be latent, waiting for other code to start adding multiple attachments of the same kind.

For global objects, it wasn't a behaviour change at all. Global objects have used a multimap all along (give or take a single day). There's no existing code that suddenly became buggy. There was no change in the API, so it cannot have been subtle.

> As noted above, this actually fixes a discrepancy between GlobalObject and Instruction handling. I am not sure why that discrepancy is desirable, and the LangRef doesn't mention either way and doesn't seem to distinguish between attachments to global objects vs instructions.

I remain convinced that this is a change to LLVM IR semantics. Because of that, I assert that it should be reviewed by a wider audience.

(IMO, unifying behaviour here is a good idea, but I strongly disagree with framing that suggests it's fixing an accidental quirk, and I think it needs to be done carefully. It seems to be a changing long-standing (and intended) behaviour.)

>> A few design points that I'm not sure about:
>>
>> - What should `getMetadata` do if there are multiple attachments for the requested kind? (assert? return nullptr? return the first, last, or arbitrary?)
>
> The getMetadata here relies on the underlying Value::getMetadata support.

The existing support is for global objects. They started with an assertion that was then relaxed to something else.

The header docs currently says "fails", which sounds to me like a crash, which sounds like a scary and subtle behaviour change.

IMO, the best choices would be "first" or "last"; I think either one would avoid creating the sorts of subtle behaviour changes in existing code that I'm concerned about. Between them:

- "first" seems more natural / intuitive to me, not sure why.
- "last" would mean `getMetadata()` gives the same answer for textual IR that already listed two attachments of the same kind (previously, the last one would overwrite the first). Maybe it'd be nice to keep textual IR slightly more stable like this, not sure.

WDYT? (Seems like something that could be discussed on LLVM dev though...)

>> - Should there be a way for metadata kinds to explicitly opt-in (or out) of allowing multiple attachments? (Maybe could be stored on the LLVMContext somehow?)
>
> Not sure - should that be done for GlobalObjects as well? It isn't the case right now.

I think `!dbg` has a verifier check that guarantees only one attachment per global object. Probably makes sense to add verifier checks for most existing kinds for instructions (at least `!dbg`, maybe also tbaa, loop, maybe most others?). One option is to add the verifier check for all kinds, and document in LangRef that instructions by default only support one attachment per kind. Then when `!prof` code is updated to correctly handle multiple kinds, it can opt out of the verifier check (and update LangRef to say it supports multiple). (@aprantl, any thoughts?)

>> - If not, what do we do about the inconsistency with `!dbg`? Support multiple attachments somehow?
>
> That is a good question. I am not sure of the history of why !dbg on instructions is handled differently. Should that be changed to just use the same underlying Value metadata support?

Yeah, maybe it'd be better to unify `!dbg` as a prep patch, before landing this. Note that it's an important compile-time optimization, whose history I've explained inline below (I also have a concrete idea for how to unify it without losing the optimization).

>> - (And: why is this better than using a level of indirection when multiple attachments are needed?)
>
> I described some rationale above as to why I think it is cleaner, especially for something generic like !prof that specifies the profile data in the MDString tag first operand. And also for consistency with global objects, where as noted the support for multiple attachments of the same type has now been used for additional metadata attachment kinds.

SGTM, I suggest including that in the post proposing the change to LLVM IR.

================
Comment at: llvm/lib/IR/Metadata.cpp:1358-1361
+  if (KindID == LLVMContext::MD_dbg) {
+    DbgLoc = DebugLoc(&MD);
+    return;
+  }
----------------
tejohnson wrote:
> dexonsmith wrote:
> > Looks like `!dbg` doesn't support multiple instances. Should that be documented? Should this assert instead of silently overriding the existing one? Should the LLParser catch this as an error, or potentially use `setMetadata()` still for `!dbg`?
> Yeah I responded to this above too - do you know the history or rationale for having separate support for !dbg attachments on instructions? It should be documented somewhere, or fixed. The assert is a good idea, will add. And yes a hard error in LLParser is a good idea as well.
I know a bit about the history.

When debug info is on, it puts metadata on every instruction. The booking for metadata makes this very expensive, creating an attachment map for each instruction. Storing it directly on the instruction is cheaper.

This design dates from when Metadata inherited from Value and needed a `TrackingVH<MDNode>` here (which is bigger than a pointer I think?). `DebugLoc` stored 64-bits of data that could be used to look up the actual MDNode in the LLVMContext.

I bet there are other good options now for getting this storage optimization.

For example, something like this might be simpler and a bit more general:
```
lang=c++

class Instruction  /* or whatever Value wants this optimization */ {
  // ...

  /// Metadata attachments for `Instruction`.
  ///
  /// If the instruction has a single attachment and its MDKind<4 (fits
  /// in 2 bits), this points directly at the attachment and the int
  /// stores the kind. Otherwise (more than 1 attachment, or MDKind>=4),
  /// this points at the metadata table. Similar to a SmallPtrSet.
  PointerIntPair<PointerUnion<MDNode *, MDAttachments *>, 2> Attachments;
};
```
(And the MDAttachments could be skipped from the LLVMContextImpl, where it currently is.)

The same optimization probably makes sense for GlobalObjects - when debug info is on, almost all of them have a `!dbg` attachment.

Anyway, the reason I suggest this alternative storage scheme is that it gives us the `!dbg` optimization without having to treat `!dbg` specially. As a bonus, `!prof` would benefit from the same optimization.

(One subtle complication is that during IR construction, metadata might be "unresolved". When a single MDNode assigned to `Attachments` is "unresolved", we'd need to somehow hook this into the TrackingMDRef side-table for Metadata's usually-off-but-sometimes-on RAUW support.)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105251/new/

https://reviews.llvm.org/D105251