[PATCH] [PATCH/RFC] Use a BumpAllocator for DIEs

Tue Jan 20 16:44:33 PST 2015

> On Jan 20, 2015, at 3:33 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
> 
>> 
>> On 2015 Jan 20, at 12:59, Frédéric Riss <friss at apple.com> wrote:
>> 
>> 
>>> On Jan 20, 2015, at 12:19 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>>> 
>>>> 
>>>> On 2015 Jan 20, at 11:28, Frederic Riss <friss at apple.com> wrote:
>>>> 
>>>> Hi dblaikie, echristo,
>>>> 
>>>> First of all, although this patch passes the testsuite, I don't
>>>> think it is ready to go in, at least not without a thorough
>>>> discussion of its downsides.
>>>> 
>>>> What the patch does is move away from using unique_ptr<>s to
>>>> track DIE object memory and allocate most of them using a
>>>> BumpAllocator. Making releasing the memory for the DIE tree
>>>> basically free. (DIEBlocks and DIELocs are still allocated using
>>>> the standard allocator, but they were already using separate
>>>> bookkeeping, so this doesn't complicate the patch).
>>>> 
>>>> I'm going down this road because the teardown of the DIE tree
>>>> uses a lot of CPU time, especially in dsymutil's usecase. In
>>>> my prototype, the DIE destructors take ~40 seconds to run for
>>>> DIE trees worth of 700MB of debug infomration. That's quite a
>>>> lot especially considering that dsymutil's runtime on this kind
>>>> of data should be around 2 minutes. I really need to find an
>>>> acceptale solution for dsymutil's usecase.
>>>> 
>>>> This patch achieves the performance gains that I expect, but there
>>>> is one big downside: it limits the number of attributes a DIE can
>>>> accept to a fixed maximum (well technically, if you remove the
>>>> assert in the code, it will work, but it will leak the attribute
>>>> vectors going over the limit).
>>> 
>>> Is there a way to explicitly walk through and clear these vectors?
>>> Or is that already too expensive (in most cases it'll be a no-op,
>>> but you'll still need to do the walk)?
>>> 
>>> (IIRC, `SmallVector<>::clear()` deallocates.)
>> 
>> I’d need to try it to be sure, but I have the feeling it would be expensive.
>> I seem to remember something about the average size of (encoded) DIEs
>> being 17 bytes (I haven’t checked that fact, but the order of magnitude 
>> feels right). So for 700MB of encoded debug information that would 
>> represent 43 million DIEs. That’s a huge tree to walk.
>> 
>> Maybe it would however be possible to track these separately. Instead of
>> the assert, introduce some bookkeeping code that would remember the
>> DIEs that need their attributes freed.  This is ugly in the sense that it needs
>> access to the bookkeeping structures from the addAttribute method.
>> Maybe the ugliness can be alleviated by having all the DIE operations go
>> through some new DIEAllocator object. Instead of
>> DIE->addValue(dwarf::DW_AT_something, Val);
>> you’d need to write
>> DIEAlloc.addValue(DIE, dwarf::DW_AT_something, Val);
>> 
>> Thoughts?
> 
> This seems fine to me.
> 
> Might be worth trying the walking approach (just in case your intuition
> is off) since it seems simpler, but tracking it directly seems fine too.
> 
> I have another idea, but first: there's no `removeValue()` after
> `addValue()`, right?  (It's a one-way operation?)

AFAIK this is true.

> If I'm right, and assuming you don't need random access or anything (you
> just need to walk through the values), you could eschew SmallVector
> entirely, and instead have a linked list of 8-element (or whatever)
> allocations, all tracked with the same BumpPtrAllocator.

Or maybe just use BumpPtrAllocated ilist nodes like I do for the DIE
themselves.

Thanks,
Fred

> Something like this:
> 
>    class DIEValueSequence {
>      constexpr unsigned NumPerAlloc = 7;
>      struct List {
>        List *Next = nullptr;
>        DIEValue *Vals[NumPerAlloc] = {nullptr};
>      };
>      List Head;
>      unsigned Size = 0;
>      unsigned TailSize = 0;
>      List *Tail = Head;
> 
>    public:
>      void push(BumpPtrAllocator &Alloc, DIEValue *Val) {
>        if (TailSize == NumPerAlloc) {
>          Tail->Next = new (Alloc.allocate(List)) List;
>          Tail = Tail->Next;
>          TailSize = 0;
>        }
>        Tail->Vals[TailSize++] = Val;
>      }
> 
>      struct iterator {
>        List *L;
>        unsigned I;
> 
>      public:
>        void operator++() {
>          if (++I < NumPerAlloc)
>            return;
>          L = L->Next;
>          I = 0;
>        }
>        DIEValue *operator*() const { return L->Vals[I]; }
>        bool operator==(const iterator &X) const {
>          return L == X.L && I == X.I;
>        }
>      };
> 
>      iterator begin() { return iterator{&Head, 0}; }
>      iterator end() {
>        return TailSize == NumPerAlloc ? iterator{nullptr, 0}
>                                       : iterator{Tail, TailSize};
>      }
>    };
> 
> ^ would get you what you need without too much code.  You could
> probably merge the attributes into the same sequence as the
> values so that there's no need to template the data structure or
> use zip-iterators or whatever.
> But I don't know, maybe you need random access?
> 
> David, do you have any better ideas?
> 
>> 
>> Fred
>> 
>> 
>>>> 
>>>> Imposing this limit on users seems too strong as a constraint. 
>>>> Especially considering not all frontends are in-tree. I can imagine
>>>> more flexible solutions involving complicated allocation strategies,
>>>> but I wanted to ask for opinions before I work on something that I
>>>> fear would end up much more complicated than this patch. Maybe
>>>> there's an easier solution that I haven't seen, happy to hear
>>>> suggestions.
>>>> 
>>>> BTW, the runtime difference is definitely measurable on a clang
>>>> debug build also, although the gained seconds do not represent
>>>> 40% of the runtime like in dsymutil's case. On my machine, a clang
>>>> debug build goes from:
>>>> 
>>>> real    14m38.308s
>>>> user    45m26.856s
>>>> sys     3m53.519s
>>>> 
>>>> without the patch, to:
>>>> 
>>>> real    14m19.762s
>>>> user    44m14.438s
>>>> sys     3m45.520s
>>>> 
>>>> Shaving more than one minute of user time and 15 to 20 seconds from
>>>> wall-clock time.
>>>> 
>>>> http://reviews.llvm.org/D7072
>>>> 
>>>> Files:
>>>> include/llvm/CodeGen/DIE.h
>>>> lib/CodeGen/AsmPrinter/DIE.cpp
>>>> lib/CodeGen/AsmPrinter/DIEHash.cpp
>>>> lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
>>>> lib/CodeGen/AsmPrinter/DwarfCompileUnit.h
>>>> lib/CodeGen/AsmPrinter/DwarfDebug.cpp
>>>> lib/CodeGen/AsmPrinter/DwarfFile.cpp
>>>> lib/CodeGen/AsmPrinter/DwarfUnit.cpp
>>>> lib/CodeGen/AsmPrinter/DwarfUnit.h
>>>> unittests/CodeGen/DIEHashTest.cpp
>>>> 
>>>> EMAIL PREFERENCES
>>>> http://reviews.llvm.org/settings/panel/emailpreferences/
>>>> <D7072.18447.patch>_______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150120/a64a0151/attachment.html>