[LLVMdev] RFC: Reduce the memory footprint of DIEs (and DIEValues)

Duncan P. N. Exon Smith dexonsmith at apple.com
Thu Jun 25 11:49:57 PDT 2015


> On 2015-Jun-25, at 10:48, David Blaikie <dblaikie at gmail.com> wrote:
> 
> I'm not sure of the merits of handing around unique_ptrs with a null deleter (I could see it in some generic template machinery - but this is pretty explicitly never-owning). I get the desire to tag the conceptual ownership, and I appreciate that - but without any checking I suspect it'll just become incorrect over time and we won't notice.

IMO, where tagging the ownership is valuable is before the DIE is
inserted in the tree (and in this patch that's the only place it
remains).  But maybe `unique_ptr` is kind of a funny name for what's
going on.

Maybe a way to keep it from bitrotting would be to make some of the
API private, something like:

    class OwnedDIE {
      friend class DIE;
      DIE *Die = nullptr;

      OwnedDIE(const OwnedDIE &) = delete;
      OwnedDIE &operator=(const OwnedDIE &) = delete;

      OwnedDIE(DIE *Die) : Die(Die) {}
      DIE *release() {
        auto *D = Die;
        Die = nullptr;
        return D;
      }

    public:
      OwnedDIE() = default;
      ~OwnedDIE() = default;
      OwnedDIE(OwnedDIE &X) : Die(X.Die) { X.Die = nullptr; }
      OwnedDIE &operator=(OwnedDIE &X) {
        Die = X.Die;
        X.Die = nullptr;
        return *this;
      }
      DIE *get() const { return Die; }
      DIE &operator*() const { return *Die; }
      DIE *operator->() const { return Die; }
      explicit operator bool() const { return Die; }
    };

    class DIE {
    public:
      // Calls OwnedDIE::OwnedDIE(DIE *).
      static OwnedDIE get(BumpPtrAllocator &Alloc, dwarf::Tag Tag);
      // Calls OwnedDIE::release().
      DIE &addChild(OwnedDIE Child);
    };

I guess I'm not sure how much better this is, although it would
prevent llvm-dsymutil from compiling without some more
reorganization (currently it calls `release()` and recreates
`unique_ptr<>` between functions).

> If we're going to keep it - could we add some checking in debug builds somehow? (skip the bump ptr allocation entirely in debug build? use a typed bump ptr allocator (it calls dtors) in a debug build? Just a side table hash set to check construction/destruction pairs and an assert that it's empty on exit?)

The `OwnedDIE` idea above adds a bit more safety, but it wouldn't
catch "leaks", and a consumer (like dsymutil) and a consumer could
choose to store top-level DIEs outside an `OwnedDIE` container.

Thoughts?

> Was there a reason to squash the DIEValue and DIE allocator changes together? (probably related to the DIEBlock? (both a value and a DIE) - I thought I remembered reading something from you about changing/splitting that? ah, that's your other patch - if that came first then we could split the allocation changes for DIEValue and DIE into separate patches? But that's too much work to rejig it all?)

I think I can split the DIEValue/DIE changes.  New patches early
afternoon I hope.

> 
> in DwarfLinker::patchLineTableForUnit why the change back to a range-for from std::find_if?

I don't remember =/.  I'll see if I can get the original flow working,
and if I can't I'll know why I changed it.

> 
> (I do get tired of seeing allocators passed in/around everywhere - would it make more sense to wrap the allocator in a type with utility functions for generating things? I suppose it's no great win, but feels a bit better, maybe? eg: "DIE::get(DIEValueAllocator, dwarf::DW_TAG_inlined_subroutine);" -> "DIEBuilder.get(dwarf::DW_TAG_inlined_subroutine)" or similar - for values it's trickier, the DIE and the allocator are both needed every time, but there's no easy access to the allocator from values or dies... *shrug* )

Maybe I'll play with some builder ideas, see if it reduces the churn
at all?  We could have:

    DIEBuilder DieB(Alloc);
    auto Die = DieB.get(Tag);
    DIEValueBuilder DieValueB = DieB.addValues(Die);
    DieValueB.addValue(SomeDieValue);

I'm not sure this will actually be better in practice though...  I'll
play around after posting new patches.

> On Wed, Jun 24, 2015 at 1:11 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> 
> > On 2015 Jun 17, at 11:36, David Blaikie <dblaikie at gmail.com> wrote:
> >
> >> On Wed, Jun 17, 2015 at 11:26 AM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> >>
> >> > On 2015 Jun 16, at 21:40, Yaron Keren <yaron.keren at gmail.com> wrote:
> >> >
> >> > Hi Duncan,
> >> >
> >> > Singly linked lists are used throught LLVM: ArrayRecycler, Registry, LiveInterval, SCEVUnknown, CGBlockInfo, MacroArgCache, ...DIEValueList.
> >> >
> >> > What do you think about implementing a singly-linked ADT?
> >> >
> >>
> >> Interesting idea.
> >>
> >> I've taken a look at the other singly-linked lists you mentioned.  Most
> >> of these are some version of a free/recycling list (ArrayRecycler,
> >> Registry, MacroArgCache, etc.), which need super simple push/pop API and
> >> no memory management.
> >>
> >> LiveInterval and CGBlock also need iteration and a way to conditionally
> >> remove elements (something like `remove_if()`), and in a few cases it
> >> would be convenient to have API to destruct and/or destroy nodes.
> >>
> >> I think these would all be straightforward to formalize into an ADT.  We
> >> could even use it to implement a value-based `llvm::forward_list<>` that
> >> avoided the sorting bug in libstdc++ (which gets "stable" exactly
> >> backwards) that David hit when refactoring tablegen.
> >>
> >> Unfortunately, I think it'd be hard to reuse for DIE::Children and
> >> DIEValueList.  The latter wants to preserve insertion order and needs
> >> `push_back()` API (the former can technically get away with a timely
> >> call to `reverse()`, but it's awkward).  Supporting `push_back()`
> >> efficiently requires pointing at the last element somehow, either via an
> >> extra pointer or circular logic (what my latest patch does for
> >> DIE::Children).
> >
> > One of the other problems I hit when trying to value-ify the DIE children is that the point of DIE creation is, in some particular cases, divorced from the knowledge of which container it's going to go in to. Yet pointer identity is needed before then.
> >
> > At least that's how I recall it - may be worth going down that path again just so we can write it up clearly for posterity, I dunno.
> 
> Nope, I saw this too.
> 
> In theory, you could get std::list to work here, by inserting into a
> singleton linked list when you first need the reference, and splicing
> it into its parent's linked list once it's created.  However, I think a
> value-based list is too expensive anyway (see below)...
> 
> > If it is possible, then I would love/prefer to use an existing non-intrusive container (or writing one if we have to). at least could prototype with std::list, then, as you say, implement a std::forward_list-like device with the bonus that we could fix sort and do the circular trick to allow insertion at the end, etc.
> 
> Unfortunately the BumpPtrAllocator/skip-unlinking-the-list part of these
> patches is really necessary, making std::list pretty awkward.  My payload
> usually takes ~2 minutes when profiling, and when I profiled linked lists
> without the BumpPtrAllocator it was talking >15 minutes to teardown (I
> killed it, so I don't know how much longer it would have taken).
> 
> Moreover, using a BumpPtrAllocator this way fixes the slow DIE teardown
> that Fred was battling in llvm-dsymutil.
> 
> > (& sorry, will get to proper review of this patch soon... )
> 
> (Sadly, I completely missed this reply of yours until this morning (when
> I was already preparing the patches below).)
> 
> I've rebased the patches on top of r240566, and squashed them together
> with shared implementation for the two linked lists in die-diet.patch.
> 
> Yesterday (on top of 240244), I reran llc to check memory usage on the
> verify-uselistorder.lto.opt.bc payload.  This patch drops peak memory
> usage from 800MB down to 720MB, 10%.  I also timed before/after the patch
> (no heap profile, CPU speed locked down, best of 5 runs) and measured a
> 1% speedup.
> 
> 
> 
> 
> (I've also attached die-block-is-not-a-die.patch, which fixes the
> DIELoc/DIEBlock/DIE relationship.  I've been holding off on committing
> it since it's built on top of the die-diet.patch.
> 
> 
> 
> )
> 
> >
> >> We could configure this stuff via templates -- I'd be open to the idea
> >> -- but I think the intersection between the implementations would be
> >> practically nil.  Even the iterator implementations need to be
> >> completely different.  If these are the only `push_back()`-enabled
> >> slists in tree, is it premature to abstract it?  Would we even want the
> >> same name as "normal" slists?
> >>
> >> (Regardless, I think it would be a great idea for someone to ADT-ify the
> >> other linked lists!)
> >





More information about the llvm-commits mailing list