[llvm-dev] Reducing DWARF emitter memory consumption

Fri Feb 5 16:58:45 PST 2016

> On Feb 5, 2016, at 3:17 PM, Peter Collingbourne via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hi all,
> 
> We have profiled [1] the memory usage in LLVM when LTO'ing Chromium, and
> we've found that one of the top consumers of memory is the DWARF emitter in
> lib/CodeGen/AsmPrinter/Dwarf*.

I'm looking at the profile attached to post #15 on the link you posted. Can you confirm that the DWARF emitter accounts for 6.7% + 15.6% = 22.3% of the total allocated memory?
If I understand the numbers correctly, this says nothing about how much the DWARF emitter contributes to *peak memory* usage (it could be more, it could be nothing...).
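
To illustrate the distinction, here is a toy sketch. The AllocStats tracker and the byte counts are hypothetical; they are not taken from the profile or from LLVM. A component that makes many short-lived allocations can dominate the total allocated bytes while adding very little to the peak:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical allocation tracker, for illustration only.
struct AllocStats {
  std::size_t Total = 0; // sum of all bytes ever allocated
  std::size_t Live = 0;  // bytes currently allocated
  std::size_t Peak = 0;  // highest value Live ever reached

  void alloc(std::size_t N) {
    Total += N;
    Live += N;
    Peak = std::max(Peak, Live);
  }
  void release(std::size_t N) { Live -= N; }
};

// Component B makes one 50-byte allocation that stays live.
// Component A makes 100 allocations of 10 bytes, each freed immediately.
AllocStats simulate() {
  AllocStats S;
  S.alloc(50); // component B
  for (int I = 0; I < 100; ++I) {
    S.alloc(10); // component A
    S.release(10);
  }
  return S;
}
```

In this made-up scenario component A is responsible for 1000 of the 1050 bytes allocated in total, but for only 10 of the 60 bytes live at the peak.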

Limiting the number of calls to the memory system is always welcome, so whatever the answer to my question is, it does not take anything away from the improvements you could make here :)

Thanks,

-- 
Mehdi

> I've been reading the DWARF emitter code and
> I have a few ideas in mind for how to reduce its memory consumption. One
> idea I've had is to restructure the emitter so that (for the most part) it
> directly produces the bytes and relocations that need to go into the DWARF
> sections without going through other data structures such as DIE and DIEValue.
> 
> I understand that the DWARF emitter needs to accommodate incomplete entities
> that may be completed elsewhere during tree construction (e.g. abstract origins
> for inlined functions, special members for types), so here's a quick high-level
> sketch of the data structures that I believe could support this design:
> 
> struct DIEBlock {
>  SmallVector<char, 1> Data;
>  std::vector<InternalReloc> IntRelocs;
>  std::vector<ExternalReloc> ExtRelocs;
>  DIEBlock *Next;
> };
> 
> // This would be used to represent things like DW_AT_type references to types
> struct InternalReloc {
>  size_t Offset; // offset within DIEBlock::Data
>  DIEBlock *Target; // the offset within Target is at Data[Offset...Offset+Size]
> };
> 
> // This would be used to represent things like pointers to .debug_loc/.debug_str or to functions/globals
> struct ExternalReloc {
>  size_t Offset; // offset within DIEBlock::Data
>  MCSymbol *Target; // the offset within Target is at Data[Offset...Offset+Size]
> };
> 
> struct DwarfBuilder {
>  DIEBlock *First;
>  DIEBlock *Cur;
>  DenseMap<DISubprogram *, DIEBlock *> Subprograms;
>  DenseMap<DIType *, DIEBlock *> Types;
>  DwarfBuilder() : First(new DIEBlock), Cur(First) {}
>  // builder implementation goes here...
> };
> 
> Normally, the DwarfBuilder will just emit bytes to Cur->Data (with possibly
> internal or external relocations to IntRelocs/ExtRelocs), but if it ever
> needs to create a "gap" for an incomplete data structure (e.g. at the end of a
> subprogram or a struct type), it will create a new DIEBlock New, store it to
> Cur->Next, store Cur in a DenseMap associated with the subprogram/type/etc
> and store New to Cur. To fill a gap later, the DwarfBuilder can pull the
> DIEBlock out of the DenseMap and start appending there. Once the IR is fully
> visited, the debug info writer will walk the linked list starting at First,
> calculate a byte offset for each DIEBlock, apply any internal relocations
> and write Data using the AsmPrinter (e.g. using EmitBytes, or maybe some
> other new interface that also supports relocations and avoids copying).
> 
> Does that sound reasonable? Is there anything I haven't accounted for?
> 
> Thanks,
> -- 
> Peter
> 
> [1] https://code.google.com/p/chromium/issues/detail?id=583551#c15
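
For reference, the gap mechanism described in the quoted proposal could be sketched roughly as below. This is a simplified illustration under several assumptions, not the proposed implementation: std containers stand in for SmallVector and DenseMap, a string key stands in for DISubprogram*/DIType*, relocations are left out entirely, and the names openGap, fillGap and finalize are made up for this sketch.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct DIEBlock {
  std::vector<char> Data;      // raw bytes emitted so far
  DIEBlock *Next = nullptr;    // next block in emission order
  std::size_t FinalOffset = 0; // assigned during the final walk
};

class DwarfBuilder {
  std::vector<std::unique_ptr<DIEBlock>> Storage; // owns every block
  DIEBlock *First;
  DIEBlock *Cur;
  // Stand-in for the per-subprogram / per-type DenseMaps.
  std::unordered_map<std::string, DIEBlock *> Gaps;

  DIEBlock *newBlock() {
    Storage.push_back(std::make_unique<DIEBlock>());
    return Storage.back().get();
  }

public:
  DwarfBuilder() : First(newBlock()), Cur(First) {}

  // Normal case: append bytes to the current block.
  void emit(const std::string &Bytes) {
    Cur->Data.insert(Cur->Data.end(), Bytes.begin(), Bytes.end());
  }

  // Leave a gap at the current position: remember the old block under Key
  // and continue emitting into a fresh block linked after it.
  void openGap(const std::string &Key) {
    DIEBlock *New = newBlock();
    Cur->Next = New;
    Gaps[Key] = Cur;
    Cur = New;
  }

  // Fill a gap later by appending to the block remembered under Key.
  void fillGap(const std::string &Key, const std::string &Bytes) {
    DIEBlock *B = Gaps.at(Key);
    B->Data.insert(B->Data.end(), Bytes.begin(), Bytes.end());
  }

  // Final walk: assign each block its byte offset and concatenate the data.
  std::string finalize() {
    std::string Out;
    for (DIEBlock *B = First; B; B = B->Next) {
      B->FinalOffset = Out.size();
      Out.append(B->Data.begin(), B->Data.end());
    }
    return Out;
  }
};
```

With this sketch, emitting "sub1{", opening a gap keyed "sub1", emitting "}type{}" and then filling the "sub1" gap with "inlined" yields "sub1{inlined}type{}" from finalize(): the late data lands before the bytes that were emitted after the gap was opened. A real version would also have to patch internal relocations once FinalOffset is known for every block.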