[PATCH] [PATCH/RFC] Use a BumpAllocator for DIEs

Duncan P. N. Exon Smith dexonsmith at apple.com
Tue Jan 20 15:33:04 PST 2015


> On 2015 Jan 20, at 12:59, Frédéric Riss <friss at apple.com> wrote:
> 
> 
>> On Jan 20, 2015, at 12:19 PM, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>> 
>>> 
>>> On 2015 Jan 20, at 11:28, Frederic Riss <friss at apple.com> wrote:
>>> 
>>> Hi dblaikie, echristo,
>>> 
>>> First of all, although this patch passes the testsuite, I don't
>>> think it is ready to go in, at least not without a thorough
>>> discussion of its downsides.
>>> 
>>> What the patch does is move away from using unique_ptr<>s to
>>> track DIE object memory and allocate most of them using a
>>> BumpAllocator. Making releasing the memory for the DIE tree
>>> basically free. (DIEBlocks and DIELocs are still allocated using
>>> the standard allocator, but they were already using separate
>>> bookkeeping, so this doesn't complicate the patch).
>>> 
>>> I'm going down this road because the teardown of the DIE tree
>>> uses a lot of CPU time, especially in dsymutil's usecase. In
>>> my prototype, the DIE destructors take ~40 seconds to run for
>>> DIE trees worth of 700MB of debug infomration. That's quite a
>>> lot especially considering that dsymutil's runtime on this kind
>>> of data should be around 2 minutes. I really need to find an
>>> acceptale solution for dsymutil's usecase.
>>> 
>>> This patch achieves the performance gains that I expect, but there
>>> is one big downside: it limits the number of attributes a DIE can
>>> accept to a fixed maximum (well technically, if you remove the
>>> assert in the code, it will work, but it will leak the attribute
>>> vectors going over the limit).
>> 
>> Is there a way to explicitly walk through and clear these vectors?
>> Or is that already too expensive (in most cases it'll be a no-op,
>> but you'll still need to do the walk)?
>> 
>> (IIRC, `SmallVector<>::clear()` deallocates.)
> 
> I’d need to try it to be sure, but I have the feeling it would be expensive.
> I seem to remember something about the average size of (encoded) DIEs
> being 17 bytes (I haven’t checked that fact, but the order of magnitude 
> feels right). So for 700MB of encoded debug information that would 
> represent 43 million DIEs. That’s a huge tree to walk.
> 
> Maybe it would however be possible to track these separately. Instead of
> the assert, introduce some bookkeeping code that would remember the
> DIEs that need their attributes freed.  This is ugly in the sense that it needs
> access to the bookkeeping structures from the addAttribute method.
> Maybe the ugliness can be alleviated by having all the DIE operations go
> through some new DIEAllocator object. Instead of
> DIE->addValue(dwarf::DW_AT_something, Val);
> you’d need to write
> DIEAlloc.addValue(DIE, dwarf::DW_AT_something, Val);
> 
> Thoughts?

This seems fine to me.

Might be worth trying the walking approach (just in case your intuition
is off) since it seems simpler, but tracking it directly seems fine too.

I have another idea, but first: there's no `removeValue()` after
`addValue()`, right?  (It's a one-way operation?)

If I'm right, and assuming you don't need random access or anything (you
just need to walk through the values), you could eschew SmallVector
entirely, and instead have a linked list of 8-element (or whatever)
allocations, all tracked with the same BumpPtrAllocator.

Something like this:

    class DIEValueSequence {
      constexpr unsigned NumPerAlloc = 7;
      struct List {
        List *Next = nullptr;
        DIEValue *Vals[NumPerAlloc] = {nullptr};
      };
      List Head;
      unsigned Size = 0;
      unsigned TailSize = 0;
      List *Tail = Head;

    public:
      void push(BumpPtrAllocator &Alloc, DIEValue *Val) {
        if (TailSize == NumPerAlloc) {
          Tail->Next = new (Alloc.allocate(List)) List;
          Tail = Tail->Next;
          TailSize = 0;
        }
        Tail->Vals[TailSize++] = Val;
      }

      struct iterator {
        List *L;
        unsigned I;

      public:
        void operator++() {
          if (++I < NumPerAlloc)
            return;
          L = L->Next;
          I = 0;
        }
        DIEValue *operator*() const { return L->Vals[I]; }
        bool operator==(const iterator &X) const {
          return L == X.L && I == X.I;
        }
      };

      iterator begin() { return iterator{&Head, 0}; }
      iterator end() {
        return TailSize == NumPerAlloc ? iterator{nullptr, 0}
                                       : iterator{Tail, TailSize};
      }
    };

^ would get you what you need without too much code.  You could
probably merge the attributes into the same sequence as the
values so that there's no need to template the data structure or
use zip-iterators or whatever.

But I don't know, maybe you need random access?

David, do you have any better ideas?

> 
> Fred
> 
> 
>>> 
>>> Imposing this limit on users seems too strong as a constraint. 
>>> Especially considering not all frontends are in-tree. I can imagine
>>> more flexible solutions involving complicated allocation strategies,
>>> but I wanted to ask for opinions before I work on something that I
>>> fear would end up much more complicated than this patch. Maybe
>>> there's an easier solution that I haven't seen, happy to hear
>>> suggestions.
>>> 
>>> BTW, the runtime difference is definitely measurable on a clang
>>> debug build also, although the gained seconds do not represent
>>> 40% of the runtime like in dsymutil's case. On my machine, a clang
>>> debug build goes from:
>>> 
>>> real    14m38.308s
>>> user    45m26.856s
>>> sys     3m53.519s
>>> 
>>> without the patch, to:
>>> 
>>> real    14m19.762s
>>> user    44m14.438s
>>> sys     3m45.520s
>>> 
>>> Shaving more than one minute of user time and 15 to 20 seconds from
>>> wall-clock time.
>>> 
>>> http://reviews.llvm.org/D7072
>>> 
>>> Files:
>>> include/llvm/CodeGen/DIE.h
>>> lib/CodeGen/AsmPrinter/DIE.cpp
>>> lib/CodeGen/AsmPrinter/DIEHash.cpp
>>> lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
>>> lib/CodeGen/AsmPrinter/DwarfCompileUnit.h
>>> lib/CodeGen/AsmPrinter/DwarfDebug.cpp
>>> lib/CodeGen/AsmPrinter/DwarfFile.cpp
>>> lib/CodeGen/AsmPrinter/DwarfUnit.cpp
>>> lib/CodeGen/AsmPrinter/DwarfUnit.h
>>> unittests/CodeGen/DIEHashTest.cpp
>>> 
>>> EMAIL PREFERENCES
>>> http://reviews.llvm.org/settings/panel/emailpreferences/
>>> <D7072.18447.patch>_______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 





More information about the llvm-commits mailing list