[LLVMdev] RFC: Metadata attachments to function definitions

Wed Apr 15 10:06:24 PDT 2015

On Tue, Apr 14, 2015 at 10:59 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

>
> > On 2015 Apr 14, at 21:46, David Blaikie <dblaikie at gmail.com> wrote:
> >
> > On Tue, Apr 14, 2015 at 9:33 PM, Duncan P. N. Exon Smith
> > <dexonsmith at apple.com> wrote:
> >>
> >> `Function` definitions should support `MDNode` attachments, with a
> >> similar syntax to instructions:
> >>
> >>    define void @foo() nounwind !attach !0 {
> >>      unreachable
> >>    }
> >>    !0 = !{}
> >>
> >> Attachments wouldn't be allowed on declarations, just definitions.
> >>
> >> There are two open problems this can help with:
> >>
> >> 1. For PGO, we need somewhere to attach the function entry count.
> >>    Attaching to the function definition is a simple solution.
> >
> > No comment - can't say I know anything about that.
> >
> >>
> >>        define void @foo() !prof !0 {
> >>          unreachable
> >>        }
> >>        !0 = !{i32 987}
> >>
> >> 2. In debug info, we repeatedly build up a map from `Function` to the
> >>    canonical `MDSubrogram` for it.
> >
> > Sounds great - I'd imagine this working somewhat like the way I've
> > made implicit special members & other non-standard members of class
> > types work in the debug info metadata, which is to say that the
> > children reference the parent, but the parent doesn't reference the
> > children (in this case, that would mean things like comdat folding in
> > LTO would 'just work' - whichever function we picked would keep its
> > debug info and attachment to its CU - the other CU would just appear
> > not to have that function anymore - we might need a special case for
> > comdat folding where one copy has debug info and another copy doesn't,
> > make sure we move the metadata over if we're picking one without debug
> > info).
> >
> > Though that will mean that whenever we want to walk all the
> > subprograms of a CU, we'd have to build it by walking all the
> > Functions in a module - it might be worth checking if/when/where we
> > "need" to do that - I suspect it's rare and maybe can be made entirely
> > unnecessary.
>
> I *think* we only do this once, and it's (ironically) to create the
> "Subprogram -> CompileUnit" map.
>

Right, I thought that might be the case.

> That highlights a problem that my proposal doesn't solve on its own.
> While this proposal creates a "Function -> Subprogram" map, there still
> isn't a "Subprogram -> CompileUnit" map -- that would still (for now)
> be stored implicitly via `MDCompileUnit::getSubprograms()`.
>

I guess this is (also?) what I was thinking about.

> Why isn't this map encoded in the `scope:` chain?  Because many of the
> `scope:` chains currently terminate at `MDFile`s or `null` instead of
> `MDCompileUnit`s.  Why?  Because LTO type uniquing needs scope chains
> to be identical.

Ah, right.

(side note: sometimes need to end in MDFile or we might need an equivalent
of DILexicalBlockFile for the CU - to switch files within the same CU
(things defined in headers, etc))

>   (I have a vague plan for fixing this, too: (1) move
> ownership of Metadata to the Module (so metadata isn't leaked by
> `lto_module_dispose()`), (2) add a "StringRef -> MDType" map to the
> Module (only for types with an ODR-style UUID), (3) delete the concept
> of `MDString`-based type refs and update lib/Linker to rely on the
> "StringRef -> MDType" map in the destination Module, (4) make all
> `scope:` chains terminate at an `MDCompileUnit` and drop "scope"-ness
> of `MDFile`, and (5) finally drop the `subprograms:` field from
> `MDCompileUnit`.  But I'm not confident about step 4 yet.)
>

Sounds plausible.

(side note: any plans to do the scope-based textual IR that was thrown
around during the prototype stages? It'd be another readability (&
writability) improvement to not have to manually walk all the scope chains.

Anyway, maybe after most/all the functionality improvements are made we
could do a maintainability pass & see whether we could get to a practically
writable/readable format... I'm still not sure how likely that is, given
the fact that debug info necessarily /describes/ the source the user wrote,
so it's always going to be more verbose, but maybe it's achievable)

>
> >
> >> Keeping this mapping accurate takes
> >>    subtle logic in `lib/Linker` (see PR21910/PR22792) and it's
> >>    expensive to compute and maintain.  Attaching it directly to the
> >>    `Function` designs away the problem.
> >>
> >>        define void @foo() !dbg !0 {
> >>          unreachable
> >>        }
> >>        !0 = !MDSubprogram(name: "foo", function: void ()* @foo)
> >>
> >> Thoughts?
> >>
> >> Moving onto implementation, I'd provide the same generic API that
> >> `Instruction` has, and wouldn't bother with the "fast path" API for
> >> `!dbg`.  Moreover, the generic path wouldn't be slow.  Since there are
> >> fewer functions than instructions, we can afford to store the
> >> attachments directly on the `Function` instead of off in the
> >> `LLVMContext`.
> >>
> >> It's not clear to me just how precious memory is in `Function`; IIRC
> >> it's sitting at 168B right now for x86-64.  IMO, a `SmallVector<..., 1>`
> >> -- cost of 64B -- seems fine.  I'll start with this if I don't hear any
> >> objections; we can optimize later if necessary.
> >>
> >> Otherwise, I could hack together a custom vector-like object with the
> >> "small string" optimization.
> >
> > Not important, but I'm missing something: what're you picturing that
> > would be different from/better than SmallVector?
> >
>
> Data storage would be:
>
>     struct Data {
>       struct value_type {
>         unsigned Tag;
>         TrackingMDNodeRef MD;
>       };
>
>       unsigned Capacity;
>       union {
>         unsigned LargeSize;
>         unsigned SmallTag;
>       } Unsigned;
>
>       AlignedCharArrayUnion<
>           value_type *,     // LargeArray
>           TrackingMDNodeRef // SmallMD
>         > Pointer;
>     };
>     static_assert(sizeof(Data) == sizeof(void *) + sizeof(unsigned) * 2,
>                   "Wrong size");
>
> Two advantages over `SmallVector<value_type, 1>`:
>
>   - 32-bits each for size and capacity (instead of 64-bits).
>   - Domain knowledge of `value_type` allows aggressive unions.
>
> Can't provide as much API as `std::vector<>`, but the API for metadata
> attachments is small and opaque:
>
>     bool hasMetadata(unsigned Tag) const;
>     MDNode *getMetadata(unsigned Tag) const;
>     void setMetadata(unsigned Tag, const MDNode *MD);
>     void getAllMetadata(
>         SmallVectorImpl<std::pair<unsigned, const MDNode *>> &MDs) const;
>
> >>  Cost would be 16B per `Function`, with the
> >> same malloc/heap costs as `SmallVector<..., 1>`.  But I haven't seen a
> >> profile that suggests this would be worth the complexity...
> >>
> >> Any opinions?
> >>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150415/ab9af7c7/attachment.html>