[PATCH] D105251: [IR] Allow multiple instruction metadata attachments with same type

Thu Jul 1 15:40:17 PDT 2021

tejohnson added a comment.

In D105251#2854141 <https://reviews.llvm.org/D105251#2854141>, @dexonsmith wrote:

> In D105251#2853122 <https://reviews.llvm.org/D105251#2853122>, @tejohnson wrote:
>
>>> I disagree with the assertion in the commit message:
>>>
>>>> This will be necessary to allow the global merge pass to attach
>>>> multiple debug info metadata nodes to global variables once we reverse
>>>> the edge from DIGlobalVariable to GlobalVariable.
>>>
>>> (since it wasn't necessary... multiple attachments could have been handled by adding indirection, as is typical for instruction metadata)
>>> but the context/impact was different since the commit landed around the same time that global metadata attachments were introduced (https://reviews.llvm.org/D20074). This wasn't a subtle behaviour change that could affect existing code handling metadata attachments for global variables, since there was no such code (yet).
>>
>> I'm not sure why this is more of a subtle behavior change for instructions than it was for global objects. The main difference for instructions would be the DbgLoc attachments, since those are handled differently, but I think that could be addressed through some assertions as you note in your comments below, and possibly improved over time if needed.
>
> Here's why I think it's a subtle change for instructions. If `getMetadata()` previously couldn't fail (returned `nullptr` or value), but now can assert or return a non-deterministic value, I fear it would make existing working code (say in optimizations that look at loop metadata) in subtle ways... but the bug would be latent, waiting for other code to start adding multiple attachments of the same kind.

Can you clarify how this change would affect currently working code, since we don't currently support >1 attachment? The only thing I can think of is llvm assembly code that incorrectly has >1 metadata attachment on an instruction. In that case the compiler currently silently will keep the last one. As you note further below we could simulate that behavior for getMetadata instead of asserting by returning the last one. We could also add in the verification check you suggested to add an allowlist of attachment types that can have >1 attachment and clean up any tests that fire, since they presumably are working accidentally.

> For global objects, it wasn't a behaviour change at all. Global objects have used a multimap all along (give or take a single day). There's no existing code that suddenly became buggy. There was no change in the API, so it cannot have been subtle.

While they may have supported >1 attachment internally, due to the lack of a getMetadata with multiple MDs returned or addMetadata before D20414 <https://reviews.llvm.org/D20414> I'm not sure how it would have in practice supported >1 attachment of the same type?

>> As noted above, this actually fixes a discrepancy between GlobalObject and Instruction handling. I am not sure why that discrepancy is desirable, and the LangRef doesn't mention either way and doesn't seem to distinguish between attachments to global objects vs instructions.
>
> I remain convinced that this is a change to LLVM IR semantics. Because of that, I assert that it should be reviewed by a wider audience.

I can send out an RFC then.

> (IMO, unifying behaviour here is a good idea, but I strongly disagree with framing that suggests it's fixing an accidental quirk, and I think it needs to be done carefully. It seems to be a changing long-standing (and intended) behaviour.)
>
>>> A few design points that I'm not sure about:
>>>
>>> - What should `getMetadata` do if there are multiple attachments for the requested kind? (assert? return nullptr? return the first, last, or arbitrary?)
>>
>> The getMetadata here relies on the underlying Value::getMetadata support.
>
> The existing support is for global objects. They started with an assertion that was then relaxed to something else.
>
> The header docs currently says "fails", which sounds to me like a crash, which sounds like a scary and subtle behaviour change.
>
> IMO, the best choices would be "first" or "last"; I think either one would avoid creating the sorts of subtle behaviour changes in existing code that I'm concerned about. Between them:
>
> - "first" seems more natural / intuitive to me, not sure why.
> - "last" would mean `getMetadata()` gives the same answer for textual IR that already listed two attachments of the same kind (previously, the last one would overwrite the first). Maybe it'd be nice to keep textual IR slightly more stable like this, not sure.
>
> WDYT? (Seems like something that could be discussed on LLVM dev though...)

Ok. Right, the LLParser will silently drop all but the last if you try to use multiple on an instruction with the same type. But the Value implementation returns the first afaict.

>>> - Should there be a way for metadata kinds to explicitly opt-in (or out) of allowing multiple attachments? (Maybe could be stored on the LLVMContext somehow?)
>>
>> Not sure - should that be done for GlobalObjects as well? It isn't the case right now.
>
> I think `!dbg` has a verifier check that guarantees only one attachment per global object. Probably makes sense to add verifier checks for most existing kinds for instructions (at least `!dbg`, maybe also tbaa, loop, maybe most others?). One option is to add the verifier check for all kinds, and document in LangRef that instructions by default only support one attachment per kind. Then when `!prof` code is updated to correctly handle multiple kinds, it can opt out of the verifier check (and update LangRef to say it supports multiple). (@aprantl, any thoughts?)
>
>>> - If not, what do we do about the inconsistency with `!dbg`? Support multiple attachments somehow?
>>
>> That is a good question. I am not sure of the history of why !dbg on instructions is handled differently. Should that be changed to just use the same underlying Value metadata support?
>
> Yeah, maybe it'd be better to unify `!dbg` as a prep patch, before landing this. Note that it's an important compile-time optimization, whose history I've explained inline below (I also have a concrete idea for how to unify it without losing the optimization).

Thanks for the background and the suggestions below. I think changing that should be orthogonal to adding this general support to instructions though, especially if we go with the allowlist so !dbg only allows one. Or is there another reason we would need to change that first?

>>> - (And: why is this better than using a level of indirection when multiple attachments are needed?)
>>
>> I described some rationale above as to why I think it is cleaner, especially for something generic like !prof that specifies the profile data in the MDString tag first operand. And also for consistency with global objects, where as noted the support for multiple attachments of the same type has now been used for additional metadata attachment kinds.
>
> SGTM, I suggest including that in the post proposing the change to LLVM IR.

Ack.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105251/new/

https://reviews.llvm.org/D105251