[LLVMdev] RFC: Reduce the memory footprint of DIEs (and DIEValues)

Thu Jun 25 13:13:31 PDT 2015

On Thu, Jun 25, 2015 at 1:01 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

>
> > On 2015-Jun-25, at 12:44, David Blaikie <dblaikie at gmail.com> wrote:
> >
> >
> >
> > On Thu, Jun 25, 2015 at 12:04 PM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
> >
> > > On 2015-Jun-25, at 12:00, David Blaikie <dblaikie at gmail.com> wrote:
> > >
> > >
> > >
> > > On Thu, Jun 25, 2015 at 11:49 AM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
> > >
> > > > On 2015-Jun-25, at 10:48, David Blaikie <dblaikie at gmail.com> wrote:
> > > >
> > > > I'm not sure of the merits of handing around unique_ptrs with a null
> deleter (I could see it in some generic template machinery - but this is
> pretty explicitly never-owning). I get the desire to tag the conceptual
> ownership, and I appreciate that - but without any checking I suspect it'll
> just become incorrect over time and we won't notice.
> > >
> > > IMO, where tagging the ownership is valuable is before the DIE is
> > > inserted in the tree (and in this patch that's the only place it
> > > remains).  But maybe `unique_ptr` is kind of a funny name for what's
> > > going on.
> > >
> > > Maybe a way to keep it from bitrotting would be to make some of the
> > > API private, something like:
> > >
> > >     class OwnedDIE {
> > >       friend class DIE;
> > >       DIE *Die = nullptr;
> > >
> > >       OwnedDIE(const OwnedDIE &) = delete;
> > >       OwnedDIE &operator=(const OwnedDIE &) = delete;
> > >
> > >       OwnedDIE(DIE *Die) : Die(Die) {}
> > >       DIE *release() {
> > >         auto *D = Die;
> > >         Die = nullptr;
> > >         return D;
> > >       }
> > >
> > >     public:
> > >       OwnedDIE() = default;
> > >       ~OwnedDIE() = default;
> > >       OwnedDIE(OwnedDIE &X) : Die(X.Die) { X.Die = nullptr; }
> > >       OwnedDIE &operator=(OwnedDIE &X) {
> > >         Die = X.Die;
> > >         X.Die = nullptr;
> > >         return *this;
> > >       }
> > >       DIE *get() const { return Die; }
> > >
> > > As soon as you've got 'get()' (yeah, op* could be abused, but that's a
> bit more expliict work to take the address of the reference and go stash
> that somewhere) I think it becomes pretty easy to accidentally duplicate or
> otherwise play fast-and-loose with ownership.
> >
> > FWIW, it's impossible to play fast and loose between `DIE::get()` and
> > `DIE::addChild()` with this API.  You can only mess around with
> > top-level DIEs.  I'm not sure that's worth much though?
> >
> > I guess the only thing preventing you from adding the same DIE as a
> child to multiple nodes is the parental check? So the parent pointer acts
> as a sort of intrusive ownership tracking mechanism... :s doesn't exactly
> fill me with confidence.
>
> Right.  (Unless we keep something like an `OwnedDIE`.)
>

*nod* but without checking, still easy to abuse an OwnedDIE.

Anyway, rather than going around in circles here: Without any checks
(except the incidental one of the parent pointer) it feels misleading to
talk about ownership. I would suggest/prefer abandoning the concept and
thinking about it as the BumpPtrAllocator owning these allocations.

But I leave that up to you - keep it the way it is if you prefer.

> > That said, since they're bump pointer allocated anyway, there's no
> reason we couldn't have a multi-parented DIE anyway. (I mean it'd be silly
> because it'd print lots of redundant DWARF, but... *shrug*)
>
> Well, the `DIE::Offset` is stored intrusively in a `DIE`, so this
> isn't quite legal.
>

Ah, right right.

> Also, `DIEHash::computeTypeSignature()` requires walking the parent
> list -- otherwise, we could almost get rid of it (there are a
> couple of other users, but IIRC they're not as hard to refactor).
>

Hmm - I thought one of them was the stuff used to do cross-DIE referencing
which was pretty hard or ugly to do in other ways, if I remember the rather
long review that went on for that change...

But if it's for types in particular - we only produce type signatures when
producing the CU signature (we don't use type signatures for type units)
and so we can probably keep track of the parent chain during the hash walk
and re-use it rather than walking up it.

>
> >
> > >       DIE &operator*() const { return *Die; }
> > >       DIE *operator->() const { return Die; }
> > >       explicit operator bool() const { return Die; }
> > >     };
> > >
> > >     class DIE {
> > >     public:
> > >       // Calls OwnedDIE::OwnedDIE(DIE *).
> > >       static OwnedDIE get(BumpPtrAllocator &Alloc, dwarf::Tag Tag);
> > >       // Calls OwnedDIE::release().
> > >       DIE &addChild(OwnedDIE Child);
> > >     };
> > >
> > > I guess I'm not sure how much better this is, although it would
> > > prevent llvm-dsymutil from compiling without some more
> > > reorganization (currently it calls `release()` and recreates
> > > `unique_ptr<>` between functions).
> > >
> > > > If we're going to keep it - could we add some checking in debug
> builds somehow? (skip the bump ptr allocation entirely in debug build? use
> a typed bump ptr allocator (it calls dtors) in a debug build? Just a side
> table hash set to check construction/destruction pairs and an assert that
> it's empty on exit?)
> > >
> > > The `OwnedDIE` idea above adds a bit more safety, but it wouldn't
> > > catch "leaks", and a consumer (like dsymutil) and a consumer could
> > > choose to store top-level DIEs outside an `OwnedDIE` container.
> > >
> > > Thoughts?
> > >
> > > Here's a claim I'll try on for size:
> > >
> > > Either we should have a mechanism that enforces ownership-like
> semantics (which I think boils down to: checked pseudo-destruction in
> asserts builds) or we probably should just forget about it and treat these
> things as raw pointers with no specific sense of ownership. Just accept
> that they were all created with the BumpPtrAllocator (perhaps make the API
> restrictions stronger in this regard? delete op new/delete? I forget how
> all that works) and move on.
> >
> > Hmm, I guess we already check the `Parent` pointer when adding
> > children, so maybe you're right.
> >
> > >
> > > > Was there a reason to squash the DIEValue and DIE allocator changes
> together? (probably related to the DIEBlock? (both a value and a DIE) - I
> thought I remembered reading something from you about changing/splitting
> that? ah, that's your other patch - if that came first then we could split
> the allocation changes for DIEValue and DIE into separate patches? But
> that's too much work to rejig it all?)
> > >
> > > I think I can split the DIEValue/DIE changes.  New patches early
> > > afternoon I hope.
> > >
> > > >
> > > > in DwarfLinker::patchLineTableForUnit why the change back to a
> range-for from std::find_if?
> > >
> > > I don't remember =/.  I'll see if I can get the original flow working,
> > > and if I can't I'll know why I changed it.
> > >
> > > >
> > > > (I do get tired of seeing allocators passed in/around everywhere -
> would it make more sense to wrap the allocator in a type with utility
> functions for generating things? I suppose it's no great win, but feels a
> bit better, maybe? eg: "DIE::get(DIEValueAllocator,
> dwarf::DW_TAG_inlined_subroutine);" ->
> "DIEBuilder.get(dwarf::DW_TAG_inlined_subroutine)" or similar - for values
> it's trickier, the DIE and the allocator are both needed every time, but
> there's no easy access to the allocator from values or dies... *shrug* )
> > >
> > > Maybe I'll play with some builder ideas, see if it reduces the churn
> > > at all?  We could have:
> > >
> > >     DIEBuilder DieB(Alloc);
> > >     auto Die = DieB.get(Tag);
> > >     DIEValueBuilder DieValueB = DieB.addValues(Die);
> > >     DieValueB.addValue(SomeDieValue);
> > >
> > > I'm not sure this will actually be better in practice though...  I'll
> > > play around after posting new patches.
> > >
> > > Yeah - don't worry too much if it doesn't look useful. And it can be
> orthogonal cleanup if we one day discover a great API. But I appreciate
> giving it a bit of thought.
> > >
> > > - David
> > >
> > >
> > > > On Wed, Jun 24, 2015 at 1:11 PM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
> > > >
> > > > > On 2015 Jun 17, at 11:36, David Blaikie <dblaikie at gmail.com>
> wrote:
> > > > >
> > > > >> On Wed, Jun 17, 2015 at 11:26 AM, Duncan P. N. Exon Smith <
> dexonsmith at apple.com> wrote:
> > > > >>
> > > > >> > On 2015 Jun 16, at 21:40, Yaron Keren <yaron.keren at gmail.com>
> wrote:
> > > > >> >
> > > > >> > Hi Duncan,
> > > > >> >
> > > > >> > Singly linked lists are used throught LLVM: ArrayRecycler,
> Registry, LiveInterval, SCEVUnknown, CGBlockInfo, MacroArgCache,
> ...DIEValueList.
> > > > >> >
> > > > >> > What do you think about implementing a singly-linked ADT?
> > > > >> >
> > > > >>
> > > > >> Interesting idea.
> > > > >>
> > > > >> I've taken a look at the other singly-linked lists you
> mentioned.  Most
> > > > >> of these are some version of a free/recycling list (ArrayRecycler,
> > > > >> Registry, MacroArgCache, etc.), which need super simple push/pop
> API and
> > > > >> no memory management.
> > > > >>
> > > > >> LiveInterval and CGBlock also need iteration and a way to
> conditionally
> > > > >> remove elements (something like `remove_if()`), and in a few
> cases it
> > > > >> would be convenient to have API to destruct and/or destroy nodes.
> > > > >>
> > > > >> I think these would all be straightforward to formalize into an
> ADT.  We
> > > > >> could even use it to implement a value-based
> `llvm::forward_list<>` that
> > > > >> avoided the sorting bug in libstdc++ (which gets "stable" exactly
> > > > >> backwards) that David hit when refactoring tablegen.
> > > > >>
> > > > >> Unfortunately, I think it'd be hard to reuse for DIE::Children and
> > > > >> DIEValueList.  The latter wants to preserve insertion order and
> needs
> > > > >> `push_back()` API (the former can technically get away with a
> timely
> > > > >> call to `reverse()`, but it's awkward).  Supporting `push_back()`
> > > > >> efficiently requires pointing at the last element somehow, either
> via an
> > > > >> extra pointer or circular logic (what my latest patch does for
> > > > >> DIE::Children).
> > > > >
> > > > > One of the other problems I hit when trying to value-ify the DIE
> children is that the point of DIE creation is, in some particular cases,
> divorced from the knowledge of which container it's going to go in to. Yet
> pointer identity is needed before then.
> > > > >
> > > > > At least that's how I recall it - may be worth going down that
> path again just so we can write it up clearly for posterity, I dunno.
> > > >
> > > > Nope, I saw this too.
> > > >
> > > > In theory, you could get std::list to work here, by inserting into a
> > > > singleton linked list when you first need the reference, and splicing
> > > > it into its parent's linked list once it's created.  However, I
> think a
> > > > value-based list is too expensive anyway (see below)...
> > > >
> > > > > If it is possible, then I would love/prefer to use an existing
> non-intrusive container (or writing one if we have to). at least could
> prototype with std::list, then, as you say, implement a
> std::forward_list-like device with the bonus that we could fix sort and do
> the circular trick to allow insertion at the end, etc.
> > > >
> > > > Unfortunately the BumpPtrAllocator/skip-unlinking-the-list part of
> these
> > > > patches is really necessary, making std::list pretty awkward.  My
> payload
> > > > usually takes ~2 minutes when profiling, and when I profiled linked
> lists
> > > > without the BumpPtrAllocator it was talking >15 minutes to teardown
> (I
> > > > killed it, so I don't know how much longer it would have taken).
> > > >
> > > > Moreover, using a BumpPtrAllocator this way fixes the slow DIE
> teardown
> > > > that Fred was battling in llvm-dsymutil.
> > > >
> > > > > (& sorry, will get to proper review of this patch soon... )
> > > >
> > > > (Sadly, I completely missed this reply of yours until this morning
> (when
> > > > I was already preparing the patches below).)
> > > >
> > > > I've rebased the patches on top of r240566, and squashed them
> together
> > > > with shared implementation for the two linked lists in
> die-diet.patch.
> > > >
> > > > Yesterday (on top of 240244), I reran llc to check memory usage on
> the
> > > > verify-uselistorder.lto.opt.bc payload.  This patch drops peak memory
> > > > usage from 800MB down to 720MB, 10%.  I also timed before/after the
> patch
> > > > (no heap profile, CPU speed locked down, best of 5 runs) and
> measured a
> > > > 1% speedup.
> > > >
> > > >
> > > >
> > > >
> > > > (I've also attached die-block-is-not-a-die.patch, which fixes the
> > > > DIELoc/DIEBlock/DIE relationship.  I've been holding off on
> committing
> > > > it since it's built on top of the die-diet.patch.
> > > >
> > > >
> > > >
> > > > )
> > > >
> > > > >
> > > > >> We could configure this stuff via templates -- I'd be open to the
> idea
> > > > >> -- but I think the intersection between the implementations would
> be
> > > > >> practically nil.  Even the iterator implementations need to be
> > > > >> completely different.  If these are the only `push_back()`-enabled
> > > > >> slists in tree, is it premature to abstract it?  Would we even
> want the
> > > > >> same name as "normal" slists?
> > > > >>
> > > > >> (Regardless, I think it would be a great idea for someone to
> ADT-ify the
> > > > >> other linked lists!)
> > > > >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150625/b1262277/attachment.html>