[llvm-dev] Reducing DWARF emitter memory consumption

Fri Feb 5 17:25:30 PST 2016

On Fri, Feb 05, 2016 at 04:28:53PM -0800, Duncan P. N. Exon Smith wrote:
> 
> > On 2016-Feb-05, at 15:17, Peter Collingbourne <peter at pcc.me.uk> wrote:
> > 
> > Hi all,
> > 
> > We have profiled [1] the memory usage in LLVM when LTO'ing Chromium, and
> > we've found that one of the top consumers of memory is the DWARF emitter in
> > lib/CodeGen/AsmPrinter/Dwarf*. I've been reading the DWARF emitter code and
> > I have a few ideas in mind for how to reduce its memory consumption. One
> > idea I've had is to restructure the emitter so that (for the most part) it
> > directly produces the bytes and relocations that need to go into the DWARF
> > sections without going through other data structures such as DIE and DIEValue.
> > 
> > I understand that the DWARF emitter needs to accommodate incomplete entities
> > that may be completed elsewhere during tree construction (e.g. abstract origins
> > for inlined functions, special members for types), so here's a quick high-level
> > sketch of the data structures that I believe could support this design:
> > 
> > struct DIEBlock {
> >  SmallVector<char, 1> Data;
> >  std::vector<InternalReloc> IntRelocs;
> >  std::vector<ExternalReloc> ExtRelocs;
> >  DIEBlock *Next;
> > };
> > 
> > // This would be used to represent things like DW_AT_type references to types
> > struct InternalReloc {
> >  size_t Offset; // offset within DIEBlock::Data
> >  DIEBlock *Target; // the offset within Target is at Data[Offset...Offset+Size]
> > };
> > 
> > // This would be used to represent things like pointers to .debug_loc/.debug_str or to functions/globals
> > struct ExternalReloc {
> >  size_t Offset; // offset within DIEBlock::Data
> >  MCSymbol *Target; // the offset within Target is at Data[Offset...Offset+Size]
> > };
> > 
> > struct DwarfBuilder {
> >  DIEBlock *First;
> >  DIEBlock *Cur;
> >  DenseMap<DISubprogram *, DIEBlock *> Subprograms;
> >  DenseMap<DIType *, DIEBlock *> Types;
> >  DwarfBuilder() : First(new DIEBlock), Cur(First) {}
> >  // builder implementation goes here...
> > };
> > 
> > Normally, the DwarfBuilder will just emit bytes to Cur->Data (possibly adding
> > internal or external relocations to IntRelocs/ExtRelocs), but if it ever
> > needs to create a "gap" for an incomplete data structure (e.g. at the end of a
> > subprogram or a struct type), it will create a new DIEBlock New, store it to
> > Cur->Next, store Cur in a DenseMap associated with the subprogram/type/etc
> > and store New to Cur. To fill a gap later, the DwarfBuilder can pull the
> > DIEBlock out of the DenseMap and start appending there. Once the IR is fully
> > visited, the debug info writer will walk the linked list starting at First,
> > calculate a byte offset for each DIEBlock, apply any internal relocations
> > and write Data using the AsmPrinter (e.g. using EmitBytes, or maybe some
> > other new interface that also supports relocations and avoids copying).
> > 
> > Does that sound reasonable? Is there anything I haven't accounted for?
> 
> Does this design work well with the way llvm-dsymutil uses DIEs and DIEValues?

I haven't looked too closely at what llvm-dsymutil does, so I can't say
for sure. If it only uses DIE/DIEValue to produce DIEs, then it most likely
should work.

> I'm also interested in whether this will be faster than the current one. I spent some time optimizing the teardown of the DIE tree in the summer and it would be nice not to lose that. (Sorry, maybe it's obvious from above, but I've only had a moment to skim your proposal. I'll try to look in more detail over the weekend.)

I think it should be possible to tweak the design to use a bump pointer
allocator like we do now for DIE/DIEValue instead of allocating vectors on
the heap, but I haven't fully thought it through.
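
To make the gap mechanism from my original message a bit more concrete,
here is a rough, self-contained sketch. It is an illustration only, not the
proposed implementation: it uses std::vector and std::unordered_map in place
of SmallVector and DenseMap, keys gaps by a string instead of
DISubprogram*/DIType*, omits external relocations, and treats an internal
relocation as a host-endian 4-byte slot patched with the offset of the start
of the target block. The names Block, Builder, openGap, fillGap and finish
are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct Block;

// Simplified internal relocation: a 4-byte slot in Block::Data that is
// patched with the final offset of the start of Target.
struct InternalReloc {
  std::size_t Offset;  // position of the slot within Block::Data
  const Block *Target; // block whose final offset is written into the slot
};

struct Block {
  std::vector<char> Data;
  std::vector<InternalReloc> IntRelocs;
  Block *Next = nullptr;
  std::uint32_t FinalOffset = 0; // assigned by Builder::finish()
};

class Builder {
  std::vector<std::unique_ptr<Block>> Storage; // owns every block
  Block *First;
  Block *Cur;
  // Hypothetical key type; the proposal keys on DISubprogram*/DIType*.
  std::unordered_map<std::string, Block *> Gaps;

  Block *newBlock() {
    Storage.push_back(std::make_unique<Block>());
    return Storage.back().get();
  }

public:
  Builder() : First(newBlock()), Cur(First) {}

  Block *current() const { return Cur; }

  // Append raw bytes to the current block.
  void emit(const std::string &Bytes) {
    Cur->Data.insert(Cur->Data.end(), Bytes.begin(), Bytes.end());
  }

  // Reserve a 4-byte slot in the current block that refers to Target.
  void emitRef(const Block *Target) {
    Cur->IntRelocs.push_back({Cur->Data.size(), Target});
    Cur->Data.insert(Cur->Data.end(), 4, '\0');
  }

  // Leave a gap at the end of the current block: remember the block under
  // Key and continue emitting into a fresh block linked after it.
  void openGap(const std::string &Key) {
    Block *New = newBlock();
    New->Next = Cur->Next;
    Cur->Next = New;
    Gaps[Key] = Cur;
    Cur = New;
  }

  // Fill a previously opened gap by appending to the remembered block.
  void fillGap(const std::string &Key, const std::string &Bytes) {
    auto It = Gaps.find(Key);
    assert(It != Gaps.end() && "no gap recorded for this key");
    Block *B = It->second;
    B->Data.insert(B->Data.end(), Bytes.begin(), Bytes.end());
  }

  // Walk the list from First, assign a byte offset to each block, patch the
  // internal relocations, and return the concatenated bytes.
  std::string finish() {
    std::uint32_t Offset = 0;
    for (Block *B = First; B; B = B->Next) {
      B->FinalOffset = Offset;
      Offset += static_cast<std::uint32_t>(B->Data.size());
    }
    std::string Out;
    for (Block *B = First; B; B = B->Next) {
      for (const InternalReloc &R : B->IntRelocs)
        std::memcpy(&B->Data[R.Offset], &R.Target->FinalOffset, 4);
      Out.append(B->Data.begin(), B->Data.end());
    }
    return Out;
  }
};
```

For example, emitting "AB", opening a gap keyed "f", emitting "EF" and then
filling the gap with "CD" makes finish() return "ABCDEF": the late bytes
land before the block that was started after the gap.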

Thanks,
-- 
Peter