[PATCH] [PATCH/RFC] Use a BumpAllocator for DIEs

Tue Jan 20 16:57:30 PST 2015

> On 2015 Jan 20, at 16:52, Frédéric Riss <friss at apple.com> wrote:
> 
>> 
>> On Jan 20, 2015, at 4:49 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>> 
>>> 
>>> On 2015 Jan 20, at 16:44, Frédéric Riss <friss at apple.com> wrote:
>>> 
>>>> 
>>>> On Jan 20, 2015, at 3:33 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>>>> 
>>>>> 
>>>>> On 2015 Jan 20, at 12:59, Frédéric Riss <friss at apple.com> wrote:
>>>>> 
>>>>> 
>>>>>> On Jan 20, 2015, at 12:19 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> On 2015 Jan 20, at 11:28, Frederic Riss <friss at apple.com> wrote:
>>>>>>> 
>>>>>>> Hi dblaikie, echristo,
>>>>>>> 
>>>>>>> First of all, although this patch passes the testsuite, I don't
>>>>>>> think it is ready to go in, at least not without a thorough
>>>>>>> discussion of its downsides.
>>>>>>> 
>>>>>>> What the patch does is move away from using unique_ptr<>s to
>>>>>>> track DIE object memory and allocate most of them using a
>>>>>>> BumpAllocator. Making releasing the memory for the DIE tree
>>>>>>> basically free. (DIEBlocks and DIELocs are still allocated using
>>>>>>> the standard allocator, but they were already using separate
>>>>>>> bookkeeping, so this doesn't complicate the patch).
>>>>>>> 
>>>>>>> I'm going down this road because the teardown of the DIE tree
>>>>>>> uses a lot of CPU time, especially in dsymutil's usecase. In
>>>>>>> my prototype, the DIE destructors take ~40 seconds to run for
>>>>>>> DIE trees worth of 700MB of debug infomration. That's quite a
>>>>>>> lot especially considering that dsymutil's runtime on this kind
>>>>>>> of data should be around 2 minutes. I really need to find an
>>>>>>> acceptale solution for dsymutil's usecase.
>>>>>>> 
>>>>>>> This patch achieves the performance gains that I expect, but there
>>>>>>> is one big downside: it limits the number of attributes a DIE can
>>>>>>> accept to a fixed maximum (well technically, if you remove the
>>>>>>> assert in the code, it will work, but it will leak the attribute
>>>>>>> vectors going over the limit).
>>>>>> 
>>>>>> Is there a way to explicitly walk through and clear these vectors?
>>>>>> Or is that already too expensive (in most cases it'll be a no-op,
>>>>>> but you'll still need to do the walk)?
>>>>>> 
>>>>>> (IIRC, `SmallVector<>::clear()` deallocates.)
>>>>> 
>>>>> I’d need to try it to be sure, but I have the feeling it would be expensive.
>>>>> I seem to remember something about the average size of (encoded) DIEs
>>>>> being 17 bytes (I haven’t checked that fact, but the order of magnitude 
>>>>> feels right). So for 700MB of encoded debug information that would 
>>>>> represent 43 million DIEs. That’s a huge tree to walk.
>>>>> 
>>>>> Maybe it would however be possible to track these separately. Instead of
>>>>> the assert, introduce some bookkeeping code that would remember the
>>>>> DIEs that need their attributes freed.  This is ugly in the sense that it needs
>>>>> access to the bookkeeping structures from the addAttribute method.
>>>>> Maybe the ugliness can be alleviated by having all the DIE operations go
>>>>> through some new DIEAllocator object. Instead of
>>>>> DIE->addValue(dwarf::DW_AT_something, Val);
>>>>> you’d need to write
>>>>> DIEAlloc.addValue(DIE, dwarf::DW_AT_something, Val);
>>>>> 
>>>>> Thoughts?
>>>> 
>>>> This seems fine to me.
>>>> 
>>>> Might be worth trying the walking approach (just in case your intuition
>>>> is off) since it seems simpler, but tracking it directly seems fine too.
>>>> 
>>>> I have another idea, but first: there's no `removeValue()` after
>>>> `addValue()`, right?  (It's a one-way operation?)
>>> 
>>> AFAIK this is true.
>>> 
>>>> If I'm right, and assuming you don't need random access or anything (you
>>>> just need to walk through the values), you could eschew SmallVector
>>>> entirely, and instead have a linked list of 8-element (or whatever)
>>>> allocations, all tracked with the same BumpPtrAllocator.
>>> 
>>> Or maybe just use BumpPtrAllocated ilist nodes like I do for the DIE
>>> themselves.
>>> 
>> 
>> I was trying to figure out if an intrusive list was valid here.  It
>> looked like some of the `DIEValue`s are reused across different `DIE`s,
>> but maybe I read the code wrong?
> 
> You’re right, there is a special DIEValue for the value 1 IIRC so the intrusive list
> wouldn’t cut it.

Too bad.  That would have been clean and simple.

(Although I imagine it's possible we'd want to more aggressively
reuse DIE values in the future, depending on where the memory is
being used (and depending on whether that's even a valid idea??).
I haven't looked at the backend LTO explosion yet, so I have no
idea really where it's happening outside of "when DIEs are being
created".)