[PATCH] [PATCH/RFC] Use a BumpAllocator for DIEs
Frédéric Riss
friss at apple.com
Tue Jan 20 16:52:42 PST 2015
> On Jan 20, 2015, at 4:49 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>
>>
>> On 2015 Jan 20, at 16:44, Frédéric Riss <friss at apple.com> wrote:
>>
>>>
>>> On Jan 20, 2015, at 3:33 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>>>
>>>>
>>>> On 2015 Jan 20, at 12:59, Frédéric Riss <friss at apple.com> wrote:
>>>>
>>>>
>>>>> On Jan 20, 2015, at 12:19 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>>>>>
>>>>>>
>>>>>> On 2015 Jan 20, at 11:28, Frederic Riss <friss at apple.com> wrote:
>>>>>>
>>>>>> Hi dblaikie, echristo,
>>>>>>
>>>>>> First of all, although this patch passes the testsuite, I don't
>>>>>> think it is ready to go in, at least not without a thorough
>>>>>> discussion of its downsides.
>>>>>>
>>>>>> What the patch does is move away from using unique_ptr<>s to
>>>>>> track DIE object memory and allocate most of them using a
>>>>>> BumpAllocator, which makes releasing the memory for the DIE tree
>>>>>> basically free. (DIEBlocks and DIELocs are still allocated using
>>>>>> the standard allocator, but they were already using separate
>>>>>> bookkeeping, so this doesn't complicate the patch.)
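>>>>>>
>>>>>> Roughly, the nodes get placement-new'd into the allocator instead of
>>>>>> being owned through unique_ptr (just a sketch with a hypothetical
>>>>>> createDIE helper, not the exact code from the patch):
>>>>>>
>>>>>>   // Freeing the allocator releases the whole DIE tree at once,
>>>>>>   // without running per-DIE destructors.
>>>>>>   llvm::BumpPtrAllocator DIEAlloc;
>>>>>>
>>>>>>   DIE *createDIE(dwarf::Tag Tag) {
>>>>>>     // Placement-new into bump-allocated memory; nothing to delete later.
>>>>>>     return new (DIEAlloc.Allocate<DIE>()) DIE(Tag);
>>>>>>   }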
>>>>>>
>>>>>> I'm going down this road because the teardown of the DIE tree
>>>>>> uses a lot of CPU time, especially in dsymutil's use case. In
>>>>>> my prototype, the DIE destructors take ~40 seconds to run for
>>>>>> DIE trees representing 700MB of debug information. That's quite
>>>>>> a lot, especially considering that dsymutil's runtime on this
>>>>>> kind of data should be around 2 minutes. I really need to find
>>>>>> an acceptable solution for dsymutil's use case.
>>>>>>
>>>>>> This patch achieves the performance gains that I expect, but there
>>>>>> is one big downside: it limits the number of attributes a DIE can
>>>>>> accept to a fixed maximum (well, technically, if you remove the
>>>>>> assert in the code it will work, but it will leak the attribute
>>>>>> vectors that go over the limit).
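>>>>>>
>>>>>> To make the constraint concrete (sketch only, with made-up member
>>>>>> names, not the literal patch code): because the DIE destructors never
>>>>>> run under bump allocation, the attribute storage has to stay inline;
>>>>>> anything that spills to the heap the way a SmallVector does would leak.
>>>>>>
>>>>>>   struct DIEStorage {
>>>>>>     static const unsigned MaxAttributes = 16; // hypothetical limit
>>>>>>     DIEValue *Values[MaxAttributes];
>>>>>>     unsigned NumValues = 0;
>>>>>>
>>>>>>     void addValue(DIEValue *V) {
>>>>>>       // Without this assert the code still works, but a dynamically
>>>>>>       // grown vector would never be freed.
>>>>>>       assert(NumValues < MaxAttributes && "too many attributes on one DIE");
>>>>>>       Values[NumValues++] = V;
>>>>>>     }
>>>>>>   };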
>>>>>
>>>>> Is there a way to explicitly walk through and clear these vectors?
>>>>> Or is that already too expensive (in most cases it'll be a no-op,
>>>>> but you'll still need to do the walk)?
>>>>>
>>>>> (IIRC, `SmallVector<>::clear()` deallocates.)
>>>>
>>>> I’d need to try it to be sure, but I have the feeling it would be expensive.
>>>> I seem to remember something about the average size of (encoded) DIEs
>>>> being 17 bytes (I haven’t checked that fact, but the order of magnitude
>>>> feels right). So for 700MB of encoded debug information that would
>>>> represent 43 million DIEs. That’s a huge tree to walk.
>>>>
>>>> Maybe it would, however, be possible to track these separately. Instead of
>>>> the assert, introduce some bookkeeping code that would remember the
>>>> DIEs that need their attributes freed. This is ugly in the sense that it needs
>>>> access to the bookkeeping structures from the addAttribute method.
>>>> Maybe the ugliness can be alleviated by having all the DIE operations go
>>>> through some new DIEAllocator object. Instead of
>>>> DIE->addValue(dwarf::DW_AT_something, Val);
>>>> you’d need to write
>>>> DIEAlloc.addValue(DIE, dwarf::DW_AT_something, Val);
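>>>>
>>>> Something along these lines, maybe (very rough sketch; wouldSpill()
>>>> and freeAttributes() are made-up names for whatever the bookkeeping
>>>> ends up looking like):
>>>>
>>>>   class DIEAllocator {
>>>>     BumpPtrAllocator Alloc;
>>>>     // DIEs whose attribute vectors went past the fixed limit and thus
>>>>     // need an explicit cleanup pass before the allocator goes away.
>>>>     SmallVector<DIE *, 8> DIEsNeedingCleanup;
>>>>
>>>>   public:
>>>>     DIE &createDIE(dwarf::Tag Tag) {
>>>>       return *new (Alloc.Allocate<DIE>()) DIE(Tag);
>>>>     }
>>>>
>>>>     void addValue(DIE &D, dwarf::Attribute Attribute, dwarf::Form Form,
>>>>                   DIEValue *Value) {
>>>>       if (D.wouldSpill())           // hypothetical predicate
>>>>         DIEsNeedingCleanup.push_back(&D);
>>>>       D.addValue(Attribute, Form, Value);
>>>>     }
>>>>
>>>>     ~DIEAllocator() {
>>>>       // Only the (hopefully rare) spilled DIEs need their vectors
>>>>       // freed; everything else goes away with the BumpPtrAllocator.
>>>>       for (DIE *D : DIEsNeedingCleanup)
>>>>         D->freeAttributes();        // hypothetical cleanup hook
>>>>     }
>>>>   };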
>>>>
>>>> Thoughts?
>>>
>>> This seems fine to me.
>>>
>>> Might be worth trying the walking approach (just in case your intuition
>>> is off) since it seems simpler, but tracking it directly seems fine too.
>>>
>>> I have another idea, but first: there's no `removeValue()` after
>>> `addValue()`, right? (It's a one-way operation?)
>>
>> AFAIK this is true.
>>
>>> If I'm right, and assuming you don't need random access or anything (you
>>> just need to walk through the values), you could eschew SmallVector
>>> entirely, and instead have a linked list of 8-element (or whatever)
>>> allocations, all tracked with the same BumpPtrAllocator.
>>
>> Or maybe just use BumpPtrAllocator-allocated ilist nodes like I do for
>> the DIEs themselves.
>>
>
> I was trying to figure out if an intrusive list was valid here. It
> looked like some of the `DIEValue`s are reused across different `DIE`s,
> but maybe I read the code wrong?
You’re right: there is a special DIEValue for the value 1, IIRC, so the
intrusive list wouldn’t cut it.
Fred
>> Thanks,
>> Fred
>>
>>> Something like this:
>>>
>>> class DIEValueSequence {
>>>   static constexpr unsigned NumPerAlloc = 7;
>>>   struct List {
>>>     List *Next = nullptr;
>>>     DIEValue *Vals[NumPerAlloc] = {nullptr};
>>>   };
>>>   List Head;
>>>   unsigned TailSize = 0;
>>>   List *Tail = &Head;
>>>
>>> public:
>>>   void push(BumpPtrAllocator &Alloc, DIEValue *Val) {
>>>     // Start a new chunk once the current tail is full.
>>>     if (TailSize == NumPerAlloc) {
>>>       Tail->Next = new (Alloc.Allocate<List>()) List();
>>>       Tail = Tail->Next;
>>>       TailSize = 0;
>>>     }
>>>     Tail->Vals[TailSize++] = Val;
>>>   }
>>>
>>>   struct iterator {
>>>     List *L;
>>>     unsigned I;
>>>
>>>     void operator++() {
>>>       // Step within the chunk, or hop to the next one.
>>>       if (++I < NumPerAlloc)
>>>         return;
>>>       L = L->Next;
>>>       I = 0;
>>>     }
>>>     DIEValue *operator*() const { return L->Vals[I]; }
>>>     bool operator==(const iterator &X) const {
>>>       return L == X.L && I == X.I;
>>>     }
>>>     bool operator!=(const iterator &X) const { return !(*this == X); }
>>>   };
>>>
>>>   iterator begin() { return iterator{&Head, 0}; }
>>>   iterator end() {
>>>     // When the tail chunk is exactly full, ++ past the last element
>>>     // lands on {nullptr, 0}; otherwise it stops at {Tail, TailSize}.
>>>     return TailSize == NumPerAlloc ? iterator{nullptr, 0}
>>>                                    : iterator{Tail, TailSize};
>>>   }
>>> };
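>>>
>>> Used roughly like this (EmitValue just stands in for whatever ends up
>>> consuming the values):
>>>
>>>   DIEValueSequence Values;
>>>   Values.push(Alloc, SomeDIEValue);
>>>   // ...
>>>   for (DIEValue *V : Values)
>>>     EmitValue(V);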
>>>
>>> ^ would get you what you need without too much code. You could
>>> probably merge the attributes into the same sequence as the
>>> values so that there's no need to template the data structure or
>>> use zip-iterators or whatever.
>>> But I don't know, maybe you need random access?
>>>
>>> David, do you have any better ideas?
>>>
>>>>
>>>> Fred
>>>>
>>>>
>>>>>>
>>>>>> Imposing this limit on users seems too strong a constraint,
>>>>>> especially considering that not all frontends are in-tree. I can
>>>>>> imagine more flexible solutions involving complicated allocation
>>>>>> strategies, but I wanted to ask for opinions before I work on
>>>>>> something that I fear would end up much more complicated than this
>>>>>> patch. Maybe there's an easier solution that I haven't seen; happy
>>>>>> to hear suggestions.
>>>>>>
>>>>>> BTW, the runtime difference is definitely measurable on a clang
>>>>>> debug build too, although the seconds gained do not represent
>>>>>> 40% of the runtime as they do in dsymutil's case. On my machine, a clang
>>>>>> debug build goes from:
>>>>>>
>>>>>> real 14m38.308s
>>>>>> user 45m26.856s
>>>>>> sys 3m53.519s
>>>>>>
>>>>>> without the patch, to:
>>>>>>
>>>>>> real 14m19.762s
>>>>>> user 44m14.438s
>>>>>> sys 3m45.520s
>>>>>>
>>>>>> That shaves more than a minute of user time and 15 to 20 seconds off
>>>>>> the wall-clock time.
>>>>>>
>>>>>> http://reviews.llvm.org/D7072
>>>>>>
>>>>>> Files:
>>>>>> include/llvm/CodeGen/DIE.h
>>>>>> lib/CodeGen/AsmPrinter/DIE.cpp
>>>>>> lib/CodeGen/AsmPrinter/DIEHash.cpp
>>>>>> lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
>>>>>> lib/CodeGen/AsmPrinter/DwarfCompileUnit.h
>>>>>> lib/CodeGen/AsmPrinter/DwarfDebug.cpp
>>>>>> lib/CodeGen/AsmPrinter/DwarfFile.cpp
>>>>>> lib/CodeGen/AsmPrinter/DwarfUnit.cpp
>>>>>> lib/CodeGen/AsmPrinter/DwarfUnit.h
>>>>>> unittests/CodeGen/DIEHashTest.cpp
>>>>>>