[PATCH] D104827: [DebugInfo] Enforce implicit constraints on `distinct` MDNodes

Scott Linder via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Dec 14 15:43:27 PST 2021


scott.linder added a comment.

In D104827#3140973 <https://reviews.llvm.org/D104827#3140973>, @scott.linder wrote:

> In D104827#3138627 <https://reviews.llvm.org/D104827#3138627>, @dexonsmith wrote:
>
>> In D104827#3137807 <https://reviews.llvm.org/D104827#3137807>, @scott.linder wrote:
>>
>>>> Is this moving in the right direction? Does the restriction on DIArgList seem
>>>> reasonable, and if so does it need more explicit documentation or more thorough
>>>> enforcement?
>>>
>>> I pushed this before looking at the intervening comments, so the last bit is a bit redundant; it sounds like the restriction on DIArgList was intended and makes sense, so I think the remaining question is whether the Verifier check is sufficient.
>>
>> It seems like `DIArgList` is very similar to LocalAsMetadata. I feel like it'd be less error-prone for it to use/inherit-from `ReplaceableMetadataImpl`, just like ValueAsMetadata does. Then it has the machinery to RAUW/delete itself when necessary.
>>
>> (Moreover, given that it doesn't have any operands, I find it suspicious that it inherits (albeit indirectly) from `MDNode`. It should probably be a direct subclass of `Metadata`. This would mean it couldn't inherit from `DINode` which might complicate some debug info code; if you're going to clean that up I suggest doing it later as a separate patch.)
>
> I looked into deriving `DIArgList` from `ReplaceableMetadataImpl`, and it does get awkward quickly that it is an `MDNode`. It seems like there is a fair amount of code around tracking/RAUW which relies on `MDNode`s being eventually "resolveable". The result is that I don't think I can inherit from `ReplaceableMetadataImpl` without also changing the inheritence from `MDNode` to `Metadata` as you suggest.
>
> I'll look at what effect that has on any debug-info related code, but I do worry that no matter what there will be a fair amount of special-casing.



In D104827#3192965 <https://reviews.llvm.org/D104827#3192965>, @dexonsmith wrote:

> In D104827#3189622 <https://reviews.llvm.org/D104827#3189622>, @scott.linder wrote:
>
>> I'm pushing the (rough) patch which tries to make the switch from inheriting
>> DIArglist from MDNode to ReplaceableMetadataImpl.
>>
>> The biggest issue I see with a change along these lines is that the
>> `!DIArgList` syntax becomes misleading.
>
> I don't see what's misleading about it. `!` is the indicator that "metadata follows", and `DIArgList` is the name of this special kind of metadata.
>
> TBH, I find it odd that there'd be something inheriting from `MDNode` that is as special as this....

It seemed like a useful invariant that the syntax `!<id>` always implied an `MDNode`, in the same way that `!"..."` means `MDString`, although I guess this is just an implementation detail.

It may just be a naming thing, though; we say in the langref that there are only metadata "strings" and metadata "nodes", but we already have metadata "ValueAsMetadata"s too. If we were to hoist up `DIArgList` as well how would we want to describe it? By saying we have: strings, nodes, ValueAsMetadatas, and DIArgLists?

>> If we really want this, I think it
>> implies a bit more work, namely generalizing this to something closer to a
>> `ValuesAsMetadata` (plural), with a syntax closer to:
>>
>>   metadata {i32 1, i64 %x, float %f}
>>
>> I think this doesn't add any ambiguity to the syntax, but this is just my
>> initial guess at what might make sense.
>
> This looks awkwardly similar to generic MDNode syntax (just with the distinction that generic metadata is not allowed to reference local variables). I don't think generic syntax is better than naming the type here. If you still disagree, can you explain why?

That was my intention, because I don't see why this needs to be DI-centric. My guess was wrong either way, though, because `metadata {` already signifies the start of a structure type IIUC.

I was just trying to suggest that all `DIArgList` currently does is stand in for a function-local metadata tuple, so just making one of those and using that seems more straightforward.

>> I'm not sure I can commit to this work, but I would still like to propose making
>> the UNIQUED/DISTINCT change for DIExpression and DICompileUnit. Would it be
>> reasonable for me to just remove DIArgList from my original patch, as it has
>> proven to be special enough not to generalize how I had hoped?
>
> IIUC, `DIExpression` is data-driven, so it's reasonable for it always to be uniqued. Originally it was not going to inherit from MDNode, but IIRC it was useful as an incremental step for the transition from all generic metadata. At some point there was a FIXME in the code to reparent directly under Metadata but probably that was dropped eventually. I still think that'd be a reasonable thing to do.

That `FIXME` is still present, and if that is true then maybe my whole conception of the class hierarchy and what to call various parts is just not lining up with reality. As a smoke test for whether I'm on the same page, would you expect `DIExpression` to no longer be referred to as a "node" in documentation, as it would no longer derive from `MDNode`?

> If `DICompileUnit` already has this property, seems to me like that part could be separated into an NFC change to make it easier to review what's actually changing.

Sure; just to confirm, you mean split this into two patches, one NFC patch adding `HANDLE_MDNODE_LEAF_DISTINCT`, and another adding `HANDLE_MDNODE_LEAF_UNIQUED`?

> Seems reasonable to handle `DIArgList` separately from the other two. It'd be a shame to drop it entirely; it seems like there are some fundamental modelling problems with DIArgList that would be valuable to iron out.

Yes, I agree that at the very least the fact that `DIArgList` can transiently become `distinct`, but then not round-trip this property through IR, probably deserves attention. I am not particularly confident that the way I first though about remedying it really lines up with how metadata is currently designed, but I would be interested in learning enough to contribute a fix.

I'll wait to hear if I follow how you would prefer the patch be split up before I incorporate your other feedback.



================
Comment at: llvm/lib/Bitcode/Reader/MetadataLoader.cpp:2003
 
     IsDistinct = Record[0] & 1;
     uint64_t Version = Record[0] >> 1;
----------------
dexonsmith wrote:
> I think you should replace this line with a comment, maybe something like:
> ```
> lang=c++
> // DIExpression is never distinct. Old bitcode uses the 0-bit for that
> // but it's now ignored.
> ```
> You can drop the assignment to `IsDistinct` entirely; it's not interesting here anymore.
OK, fair enough; so the bit just becomes "don't care" going forward, which naturally means old bitcode is forward compatible


================
Comment at: llvm/lib/CodeGen/MIRParser/MIParser.cpp:1187
   } else if (Token.is(MIToken::md_diexpr)) {
+    // FIXME: This should be driven off of the UNIQUED property in Metadata.def
     if (parseDIExpression(Node))
----------------
dexonsmith wrote:
> I'm not sure what you mean by this comment. What is "this", and why should a metadata kind being uniqued-by-content influence it?
"this" is trying to get at "this code which supports DIExpression appearing inline in MIR"; I can try to fix up the wording some to be more clear

The property I was aiming for with the whole patch was that "inlined" metadata is never distinct. This seemed like a logical "rule" that would explain the lack of syntax for `distinct` nodes inline.


================
Comment at: llvm/lib/CodeGen/MIRParser/MIParser.cpp:2331
   } else if (Token.is(MIToken::md_diexpr)) {
+    // FIXME: This should be driven off of the UNIQUED property in Metadata.def
     if (parseDIExpression(Node))
----------------
dexonsmith wrote:
> I'm not sure what you mean by this comment. What is "this", and why should a metadata kind being uniqued-by-content influence it?
ditto


================
Comment at: llvm/lib/IR/DebugInfoMetadata.cpp:1075
                                     StorageType Storage, bool ShouldCreate) {
+  assert(Storage != Distinct && "DIExpression cannot be distinct");
   DEFINE_GETIMPL_LOOKUP(DIExpression, (Elements));
----------------
dexonsmith wrote:
> Can this be `Storage == Uniqued`?
Yes, I'll make this change.


================
Comment at: llvm/lib/IR/Verifier.cpp:936
   Metadata *MD = MDV.getMetadata();
+
   if (auto *N = dyn_cast<MDNode>(MD)) {
----------------
dexonsmith wrote:
> Nit: if we add a newline here, should be in a separate NFC commit since there's no other change around.
Woops, I just didn't notice this before pushing the diff. I'll just remove this change from the patch.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104827/new/

https://reviews.llvm.org/D104827



More information about the llvm-commits mailing list