[LLVMdev] [RFC] Separating Metadata from the Value hierarchy

Duncan P. N. Exon Smith dexonsmith at apple.com
Mon Nov 10 10:17:43 PST 2014


> On 2014-Nov-10, at 08:30, Chris Lattner <clattner at apple.com> wrote:
> 
>> On Nov 9, 2014, at 5:02 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>> In response to my recent commits (e.g., [1]) that changed API from
>> `MDNode` to `Value`, Eric had a really interesting idea [2] -- split
>> metadata entirely from the `Value` hierarchy, and drop general support
>> for metadata RAUW.
> 
> Wow, this never occurred to me, but in hindsight seems like obviously the right direction.

Yup, good on Eric here.

>> Here is what we'd lose:
>> 
>> 1. No more arbitrary RAUW of metadata.
>> 
>>   While we'd keep support for RAUW of temporary MDNodes for use as
>>   forward declarations (when parsing assembly or constructing cyclic
>>   graphs), drop RAUW support for all other metadata.
> 
> This seems fine to me, a corollary of this should be that MDNodes never “move around” in memory due to late uniquing etc, which means that TrackingVH shouldn’t be necessary, right?  This should make all frontends a lot more memory efficient because they can just use raw pointers to MDNodes everywhere.

Almost.

Two caveats:

 1. Handles should generally be `MDRef` instead of `Metadata*`, since I
    threw in reference-counting semantics so that no-longer-referenced
    metadata cleans itself up.

 2. If a handle might point to a forward reference -- i.e., a
    `TempMDNode` in this patch -- it should use `TrackingMDRef`.  When
    the forward reference gets resolved, it updates all of its tracking
    references.

Nevertheless, `sizeof(MDRef) == sizeof(TrackingMDRef) == sizeof(void*)`.
Frontends *only* pay memory overhead for the currently-unresolved
forward references.

(Maybe `MDNodeFwdRef` is a better name than `TempMDNode`?)

> 
>> 2. No more function-local metadata.
> 
> This was a great idea that never mattered, I’m happy to drop it for the greater good :-)
> 

Awesome.

> +class Metadata {
> +private:
> +  LLVMContext &Context;
> 
> Out of curiosity, why do Metadata nodes all need to carry around a context?  This seems like memory bloat that would be great to avoid if possible.

There are two uses for the context:

  - RAUW.  Metadata that can RAUW (`TempMDNode` and `MetadataAsValue`)
    need a context.  Other metadata need access to a context when their
    operands get RAUW'ed (so they can re-unique themselves), but this
    could be passed in as an argument, so they don't need their own.
    
  - Reference counting.  My sketch uses reference counting for all the
    nodes, so they all need a context to delete themselves when their
    last reference gets dropped.

Customized ownership of debug info metadata will allow us to get the
context from parent pointers, so we might be able to remove this down
the road (depending on whether the `Metadata` subclass is uniqued).

> +template <class T> class TypedMDRef : public MDRef {
> 
> Is this a safe way to model this, should it use private inheritance instead?  Covariance seems like it would break this: specifically something could pass off a typed MDRef as an MDRef&, then the client could reassign something of the wrong type into it.

Good point.

> +  MDString(LLVMContext *Context) : Metadata(*Context, MDStringKind) {
> +    assert(Context && "Expected non-null context");
> ...
> +  static MDStringRef get(LLVMContext &Context, StringRef String);
> 
> You have reference/pointer disagreement, I’d recommend taking LLVMContext& to the ctor for uniformity (and remove the assert).

This is a workaround for `StringMapEntry`'s imperfect forwarding --
it passes the `InitVal` constructor argument by value.

I'll fix the problem at its source and clean this up.

> +class ReplaceableMetadataImpl {
> 
> I’m not following your algorithm intentions well yet, but this seems like a really heavy-weight implementation, given that this is a common base class for a number of other important things.

Unfortunately I think this weight will be hard to optimize out (although
I'm sure it could be a little smaller).

In the `Value` hierarchy, you pay memory for RAUW at the site of each
`Use`.  This makes sense, since most values can be RAUW'ed.

Since very little metadata needs to be RAUW'ed, we don't want to pay
memory at every use-site.  Instead, we pay the cost inside the
(hopefully) few instances that support RAUW.

The core assumption is that there are relatively few replaceable
metadata instances.  The only subclasses are `ValueAsMetadata` and
`TempMDNode`.  How many of these are live?

  - There will be at most one `ValueAsMetadata` instance for each
    metadata operand that points at a `Value`.  My data from a couple of
    months ago showed that there are very few of these.

  - `TempMDNodes` are used as forward references.  You only pay their
    cost until the forward reference gets resolved (i.e., deleted).

The main case I'm worried about here are references to `ConstantInt`s,
which are used pretty heavily outside of debug info (e.g., in `!prof`
attachments).  However, if that shows up in a profile, we can bypass
`Value`s entirely by creating a new `MDIntArray` subclass (or
something).

> +class ValueAsMetadata : public Metadata, ReplaceableMetadataImpl {
> 
> I think that ValueAsMetadata is a great thing to have - it is essential to refer to global variables etc.  That said, I think that it will eventually be worthwhile to have metadata versions of integers and other common things to reduce memory.  This should come in a later pass of course.

Yup, agree entirely.

> 
> I don’t follow the point of
> +class UniquableMDNode : public MDNode {
> 
> What would subclass it?  What is the use-case?  Won’t MDStrings continue to be uniqued?

I think you've just been misled by my terrible name.

This sketch splits the "temporary" concept for `MDNode` into a separate
subclass, so that non-forward-reference `MDNode`s don't have to pay for
RAUW overhead.

The class hierarchy I envision looks something like this:

    Metadata
      MDNode
        TempMDNode      // MDNodeFwdRef?
        UniquableMDNode // GenericMDNode?
        DINode          // Is this layer useful?
          DILocation
          DIScope
            DIType
              DIBasicType
              DICompositeType
              ...
            DISubprogram
            ...
          DICompileUnit
          ...
      MDString
      ValueAsMetadata     

`UniquableMDNode` is a leaf-class that behaves like the current `MDNode`
(when it's not a temporary).  I called it "uniquable" because, unlike
`TempMDNode`, it is uniqued by default (although you can opt-out of it,
and uniquing might be dropped).

Maybe a better name is `GenericMDNode`?

Also, off-topic, but after sketching out the imagined hierarchy above,
I'm not sure `DINode` is particularly useful.



More information about the llvm-dev mailing list