[PATCH] D104827: [DebugInfo] Enforce implicit constraints on `distinct` MDNodes
Duncan P. N. Exon Smith via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 14 18:55:56 PST 2021
dexonsmith added a comment.
In D104827#3193518 <https://reviews.llvm.org/D104827#3193518>, @scott.linder wrote:
> I was just trying to suggest that all `DIArgList` currently does is stand in for a function-local metadata tuple, so just making one of those and using that seems more straightforward.
Ah, that's an interesting point; thanks for clarifying!
Stepping back a bit...
Before the big metadata / debug info refactor, there was support for metadata nodes owned by a function. These nodes were allowed to point at function-local values. The only uses of these metadata nodes were for direct reference from debug info intrinsics. I removed them entirely to simplify the model, and left behind `LocalAsMetadata`.
I've since realized that much of this "global" metadata is really tied to a specific function. Three examples:
- Inlined `DILocation`s were made `distinct` in order to defeat uniquing across different function definitions.
- `DISubprogram` have become function attachments; and it's illegal to have the same one attached in multiple places. It's effectively owned by the function; and it needs to be cloned when a function is cloned.
- Outside of debug info, there are optimization-related metadata that describe function-local constructs, and that defeat uniquing by having a "root" node that self-references and/or is marked `distinct`. (IIRC, loop metadata is an example?)
I think we should bring back local metadata; moreover, there really seem to be three scopes for metadata:
- Context: pure / immutable constants; cannot change / cannot be `distinct`. Only reference other "context" constants (any referenced LLVM values are also pure constants (`i32 0` okay, but `i32* @global` is not)). Should be shared across all users of an LLVMContext and owned by the LLVMContext. Since they are immutable, never "cloned" by cloneFunctionInto / cloneModule / related APIs.
- Module: module-specific. May have identity and be mutable (`distinct`). If it is uniqued / "constant", then it is at "module" scope because it (transitively) references something else with "module" scope (e.g., a global value or `distinct` metadata). Can reference LLVM global values but not local values. These are effectively owned by a specific Module. Cloning a module would require making new copies of these, but cloning a function would not.
- Local: function-specific. May have identity and be mutable (`distinct`). If it is uniqued / "constant", then it is at "local" scope because it (transitively) references something else with "local" scope. Cloning a function would require making new copies of these. Effectively owned by the function/global value that it's associated with.
In this model, it seems like DIArgList falls under "local"; and you're suggesting it's just a stand-in for not having a local "MDNode", and not really semantically tied to debug info or argument lists. If that's the case, then it seems they're only required to be inline right now because there's no syntax to describe them otherwise. Syntax could look like:
define void @f(i32 %a, i32 %b) {
call @llvm.some.intrinsic(!%1)
; Some local metadata. Can reference LocalAsMetadata.
; Cloned when the function is cloned.
!%1 = metadata !{i32 %a, i32 %b, !%2}
!%2 = distinct metadata !{!1}
}
; Global metadata. Cloned when a module is cloned.
!1 = distinct metadata !{!2, ptr @f}
!2 = metadata !{!1, !3}
; Context metadata. Pure constants.
!3 = metadata !{!4, !5}
!4 = metadata !{i32 1, ptr null}
!5 = metadata !{}
Stepping back again, I think there are three kinds of identity for metadata:
- Distinct: not in any uniquing table (e.g., `MDNode` marked `distinct`)
- Uniqued (always): in a uniquing table; either immutable (e.g., "Context" scope / `DIExpression`) or has permanent RAUW support to handle changes (e.g., `ValueAsMetadata`)
- Uniqued (best-effort): findable via uniquing tables, but operands might change and lacks permanent RAUW support; degrades to "distinct" if an operand changes (e.g., `MDNode` without `distinct`)
The current design for metadata optimizes for having most metadata either "distinct" or "best-effort-uniqued". Permanent RAUW support is expensive (in this design).
Our recent discussions about DIArgList point toward making it always-uniqued; since the operands can change, this requires permanent RAUW support. I wonder if, instead, best-effort-uniqued would be sufficient... if only there were syntax for it. WDYT?
> Sure; just to confirm, you mean split this into two patches, one NFC patch adding `HANDLE_MDNODE_LEAF_DISTINCT`, and another adding `HANDLE_MDNODE_LEAF_UNIQUED`?
Yes.
For the former, maybe "ALWAYS_DISTINCT" would be more clear? Either way, this seems like a well-defined concept.
For the latter, I'm not quite convinced yet. The name doesn't seem right, since most nodes are stored/de-duped in uniquing tables (maybe "ALWAYS_UNIQUED" would work?). But:
- I don't think "always-uniqued" should imply things like "printed inline in textual IR"
- I'm not sure should have MDNodes that have permanent RAUW support (since it's expensive)
- Can we get away with a "local" MDNode that is best-effort-uniqued?
================
Comment at: llvm/lib/CodeGen/MIRParser/MIParser.cpp:1187
} else if (Token.is(MIToken::md_diexpr)) {
+ // FIXME: This should be driven off of the UNIQUED property in Metadata.def
if (parseDIExpression(Node))
----------------
scott.linder wrote:
> dexonsmith wrote:
> > I'm not sure what you mean by this comment. What is "this", and why should a metadata kind being uniqued-by-content influence it?
> "this" is trying to get at "this code which supports DIExpression appearing inline in MIR"; I can try to fix up the wording some to be more clear
>
> The property I was aiming for with the whole patch was that "inlined" metadata is never distinct. This seemed like a logical "rule" that would explain the lack of syntax for `distinct` nodes inline.
We could support all nodes here for parsing, if we think that's useful... I'm not sure if there's a good reason to only have DIExpression here...
> that would explain the lack of syntax for `distinct` nodes inline.
IIRC, the historical lack of syntax (in `.ll` at least, not sure about `.mir`) is because it didn't seem useful/important. When MDNodes are distinct, it is because they either need identity *a priori* (in which case, probably there are multiple references to them and they MUST be out-of-line) or there was a uniquing table collision for best-effort-uniqued metadata (expected to be rare). Anything that was always-uniqued had first class syntax.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D104827/new/
https://reviews.llvm.org/D104827
More information about the llvm-commits
mailing list