[LLVMdev] RFC: Metadata attachments to function definitions

Mon Apr 20 10:48:02 PDT 2015

On Mon, Apr 20, 2015 at 10:44 AM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

>
> > On 2015-Apr-17, at 19:59, David Blaikie <dblaikie at gmail.com> wrote:
> >
> > On Fri, Apr 17, 2015 at 7:15 PM, Duncan P. N. Exon Smith
> > <dexonsmith at apple.com> wrote:
> >>
> >>> On 2015 Apr 15, at 10:06, David Blaikie <dblaikie at gmail.com> wrote:
> >>>
> >>>
> >>>
> >>> On Tue, Apr 14, 2015 at 10:59 PM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
> >>>
> >>>> On 2015 Apr 14, at 21:46, David Blaikie <dblaikie at gmail.com> wrote:
> >>>>
> >>>> On Tue, Apr 14, 2015 at 9:33 PM, Duncan P. N. Exon Smith
> >>>> <dexonsmith at apple.com> wrote:
> >>>>>
> >>>>> `Function` definitions should support `MDNode` attachments, with a
> >>>>> similar syntax to instructions:
> >>>>>
> >>>>>   define void @foo() nounwind !attach !0 {
> >>>>>     unreachable
> >>>>>   }
> >>>>>   !0 = !{}
> >>>>>
> >>>>> Attachments wouldn't be allowed on declarations, just definitions.
> >>>>>
> >>>>> There are two open problems this can help with:
> >>>>>
> >>>>> 1. For PGO, we need somewhere to attach the function entry count.
> >>>>>   Attaching to the function definition is a simple solution.
> >>>>
> >>>> No comment - can't say I know anything about that.
> >>>>
> >>>>>
> >>>>>       define void @foo() !prof !0 {
> >>>>>         unreachable
> >>>>>       }
> >>>>>       !0 = !{i32 987}
> >>>>>
> >>>>> 2. In debug info, we repeatedly build up a map from `Function` to the
> >>>>>   canonical `MDSubrogram` for it.
> >>>>
> >>>> Sounds great - I'd imagine this working somewhat like the way I've
> >>>> made implicit special members & other non-standard members of class
> >>>> types work in the debug info metadata, which is to say that the
> >>>> children reference the parent, but the parent doesn't reference the
> >>>> children (in this case, that would mean things like comdat folding in
> >>>> LTO would 'just work' - whichever function we picked would keep its
> >>>> debug info and attachment to its CU - the other CU would just appear
> >>>> not to have that function anymore - we might need a special case for
> >>>> comdat folding where one copy has debug info and another copy doesn't,
> >>>> make sure we move the metadata over if we're picking one without debug
> >>>> info).
> >>>>
> >>>> Though that will mean that whenever we want to walk all the
> >>>> subprograms of a CU, we'd have to build it by walking all the
> >>>> Functions in a module - it might be worth checking if/when/where we
> >>>> "need" to do that - I suspect it's rare and maybe can be made entirely
> >>>> unnecessary.
> >>>
> >>> I *think* we only do this once, and it's (ironically) to create the
> >>> "Subprogram -> CompileUnit" map.
> >>>
> >>> Right, I thought that might be the case.
> >>>
> >>> That highlights a problem that my proposal doesn't solve on its own.
> >>> While this proposal creates a "Function -> Subprogram" map, there still
> >>> isn't a "Subprogram -> CompileUnit" map -- that would still (for now)
> >>> be stored implicitly via `MDCompileUnit::getSubprograms()`.
> >>>
> >>> I guess this is (also?) what I was thinking about.
> >>>
> >>> Why isn't this map encoded in the `scope:` chain?  Because many of the
> >>> `scope:` chains currently terminate at `MDFile`s or `null` instead of
> >>> `MDCompileUnit`s.  Why?  Because LTO type uniquing needs scope chains
> >>> to be identical.
> >>>
> >>> Ah, right.
> >>>
> >>> (side note: sometimes need to end in MDFile or we might need an
> equivalent of DILexicalBlockFile for the CU - to switch files within the
> same CU (things defined in headers, etc))
> >>
> >> Ah, okay.  I thought we could just replace them with pointers to the
> >> compile unit.  Something like `DIFileScope` with `scope:` and
> >> `file:` fields would probably work?  (Which means I shouldn't have
> >> merged the two types of "file" nodes when I introduced the new
> >> hierarchy.  Boo.)
> >
> > I'd have to double-check - maybe this is a non-issue & I've
> > misremembered/misunderstood things here (DILexicalBlockFile is needed
> > because we don't put file info on the DebugLocs to keep them small -
> > maybe the MDSubprograms (& MDNamespace, etc) should/could/(already
> > does?) just have a file attribute directly?)
>
> Yes, they do already have a file attribute.  Is that sufficient?
>

Should be, I'd imagine.

>
> >
> >>
> >>>
> >>>  (I have a vague plan for fixing this, too: (1) move
> >>> ownership of Metadata to the Module (so metadata isn't leaked by
> >>> `lto_module_dispose()`), (2) add a "StringRef -> MDType" map to the
> >>> Module (only for types with an ODR-style UUID), (3) delete the concept
> >>> of `MDString`-based type refs and update lib/Linker to rely on the
> >>> "StringRef -> MDType" map in the destination Module, (4) make all
> >>> `scope:` chains terminate at an `MDCompileUnit` and drop "scope"-ness
> >>> of `MDFile`, and (5) finally drop the `subprograms:` field from
> >>> `MDCompileUnit`.  But I'm not confident about step 4 yet.)
> >>>
> >>> Sounds plausible.
> >>>
> >>> (side note: any plans to do the scope-based textual IR that was thrown
> around during the prototype stages? It'd be another readability (&
> writability) improvement to not have to manually walk all the scope chains.
> >>
> >> Vague plans, but yes.  Doing it the way I want is blocked on
> >> removing type refs (maybe among other things?).
> >>
> >>> Anyway, maybe after most/all the functionality improvements are made
> we could do a maintainability pass & see whether we could get to a
> practically writable/readable format... I'm still not sure how likely that
> is, given the fact that debug info necessarily /describes/ the source the
> user wrote, so it's always going to be more verbose, but maybe it's
> achievable)
> >>
> >> Yeah, I think this is a great idea.
> >>
> >>>
> >>>
> >>>>
> >>>>> Keeping this mapping accurate takes
> >>>>>   subtle logic in `lib/Linker` (see PR21910/PR22792) and it's
> >>>>>   expensive to compute and maintain.  Attaching it directly to the
> >>>>>   `Function` designs away the problem.
> >>>>>
> >>>>>       define void @foo() !dbg !0 {
> >>>>>         unreachable
> >>>>>       }
> >>>>>       !0 = !MDSubprogram(name: "foo", function: void ()* @foo)
> >>>>>
> >>>>> Thoughts?
> >>>>>
> >>>>> Moving onto implementation, I'd provide the same generic API that
> >>>>> `Instruction` has, and wouldn't bother with the "fast path" API for
> >>>>> `!dbg`.  Moreover, the generic path wouldn't be slow.  Since there
> are
> >>>>> fewer functions than instructions, we can afford to store the
> >>>>> attachments directly on the `Function` instead of off in the
> >>>>> `LLVMContext`.
> >>>>>
> >>>>> It's not clear to me just how precious memory is in `Function`; IIRC
> >>>>> it's sitting at 168B right now for x86-64.  IMO, a `SmallVector<...,
> 1>`
> >>>>> -- cost of 64B -- seems fine.  I'll start with this if I don't hear
> any
> >>>>> objections; we can optimize later if necessary.
> >>>>>
> >>>>> Otherwise, I could hack together a custom vector-like object with the
> >>>>> "small string" optimization.
> >>>>
> >>>> Not important, but I'm missing something: what're you picturing that
> >>>> would be different from/better than SmallVector?
> >>>>
> >>>
> >>> Data storage would be:
> >>>
> >>>    struct Data {
> >>>      struct value_type {
> >>>        unsigned Tag;
> >>>        TrackingMDNodeRef MD;
> >>>      };
> >>>
> >>>      unsigned Capacity;
> >>>      union {
> >>>        unsigned LargeSize;
> >>>        unsigned SmallTag;
> >>>      } Unsigned;
> >>>
> >>>      AlignedCharArrayUnion<
> >>>          value_type *,     // LargeArray
> >>>          TrackingMDNodeRef // SmallMD
> >>>> Pointer;
> >>>    };
> >>>    static_assert(sizeof(Data) == sizeof(void *) + sizeof(unsigned) * 2,
> >>>                  "Wrong size");
> >>>
> >>> Two advantages over `SmallVector<value_type, 1>`:
> >>>
> >>>  - 32-bits each for size and capacity (instead of 64-bits).
> >>>  - Domain knowledge of `value_type` allows aggressive unions.
> >>>
> >>> Can't provide as much API as `std::vector<>`, but the API for metadata
> >>> attachments is small and opaque:
> >>>
> >>>    bool hasMetadata(unsigned Tag) const;
> >>>    MDNode *getMetadata(unsigned Tag) const;
> >>>    void setMetadata(unsigned Tag, const MDNode *MD);
> >>>    void getAllMetadata(
> >>>        SmallVectorImpl<std::pair<unsigned, const MDNode *>> &MDs)
> const;
> >>>
> >>>>> Cost would be 16B per `Function`, with the
> >>>>> same malloc/heap costs as `SmallVector<..., 1>`.  But I haven't seen
> a
> >>>>> profile that suggests this would be worth the complexity...
> >>>>>
> >>>>> Any opinions?
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150420/8115d1bf/attachment.html>