[LLVMdev] RFC: Reduce the memory footprint of DIEs (and DIEValues)
Duncan P. N. Exon Smith
dexonsmith at apple.com
Wed May 20 17:39:23 PDT 2015
With just those four patches, memory usage went *up* slightly. Add in
the 5th patch (which does #2 below), and we get an overall memory drop
of 4%.
The intermediate result of a memory increase makes sense. While the
first four patches reduce the number of (and size of) `DIEValue`
allocations, they increase the cost of the `SmallVector` overhead.
0005 (attached) squeezes the abbreviation data into `DIEValue` for
free, next to the discriminator for the union. The 5 patches together
are strictly an improvement to memory usage.
It's nice to see the 4% memory drop, but this is all prep work for #3,
where I expect the biggest memory usage improvements.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0005-WIP-Store-abbreviation-data-directly-in-DIEValue.patch
Type: application/octet-stream
Size: 25110 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/3f1b2889/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: all-2.patch
Type: application/octet-stream
Size: 84163 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150520/3f1b2889/attachment-0001.obj>
-------------- next part --------------
> On 2015 May 20, at 15:56, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:
>
> To make this a little more concrete, I just hacked up a couple of
> patches that achieve step #1. (0004 is the key patch, and probably
> should be split up somehow before commit.) I'll collect some
> results and report back.
>
>
> <all.patch><0001-CodeGen-Remove-redundant-DIETypeSignature-dump.patch><0002-CodeGen-Remove-the-vtable-entry-from-DIEValue.patch><0003-CodeGen-Make-DIEValue-Ty-private-NFC.patch><0004-WIP-Change-DIEValue-to-be-stored-by-value.patch>
>
>> On 2015 May 20, at 11:28, Duncan P. N. Exon Smith <duncan at exonsmith.com> wrote:
>>
>> Pete Cooper and I have been looking at memory profiles of running llc on
>> verify-uselistorder.lto.opt.bc (ld -save-temps dump just before CodeGen
>> of building verify-uselistorder with -flto -g). I've attached
>> leak-backend.patch, which we're using to make Intrustruments more
>> accurate (instead of effectively leaking things onto BumpPtrAllocators,
>> really leak them with malloc()). (I've collected this data on top of a
>> few not-yet-committed patches to cheapen `MCSymbol` and
>> `EmitLabelDifference()` that chop around 8% of memory off the top, but
>> otherwise these numbers should be reproducible in ToT.)
>>
>> The `DIE` class is huge. Directly, it accounts for about 15% of backend
>> memory:
>>
>> Bytes Used Count Symbol Name
>> 77.87 MB 8.4% 318960 llvm::DwarfUnit::createAndAddDIE(unsigned int, llvm::DIE&, llvm::DINode const*)
>> 46.34 MB 5.0% 189810 llvm::DwarfCompileUnit::constructVariableDIEImpl(llvm::DbgVariable const&, bool)
>> 25.57 MB 2.7% 104752 llvm::DwarfCompileUnit::constructInlinedScopeDIE(llvm::LexicalScope*)
>> 8.19 MB 0.8% 33547 llvm::DwarfCompileUnit::constructImportedEntityDIE(llvm::DIImportedEntity const*)
>>
>> A lot of this is the pair of `SmallVector<, 12>` it has for its values
>> (look into `DIEAbbrev` for the second one). Here's a histogram of how
>> many DIEs have each value count:
>>
>> # of Values DIEs with # with # or fewer
>> 0 3128 3128
>> 1 109522 112650
>> 2 180382 293032
>> 3 90836 383868
>> 4 115552 499420
>> 5 90713 590133
>> 6 4125 594258
>> 7 17211 611469
>> 8 18144 629613
>> 9 22805 652418
>> 10 325 652743
>> 11 203 652946
>> 12 245 653191
>>
>> It's crazy that we're paying for 12 up front on every DIE. (This is
>> a reformatted version of num-values-with-totals.txt, which I've
>> attached along with a few other histograms Pete collected.)
>>
>> The `DIEValue`s themselves, which get leaked on the BumpPtrAllocator,
>> also take up a huge amount of memory (around 4%):
>>
>> Graph Category Persistent Bytes # Persistent # Transient Total Bytes # Total Transient/Total Bytes
>> 0 llvm::DIEInteger 19.91 MB 652389 0 19.91 MB 652389 <XRRatioObject: 0x608025658ea0> %0.00, %0.00
>> 0 llvm::DIEString 13.83 MB 302181 0 13.83 MB 302181 <XRRatioObject: 0x608025658ea0> %0.00, %0.00
>> 0 llvm::DIEEntry 10.91 MB 357506 0 10.91 MB 357506 <XRRatioObject: 0x608025658ea0> %0.00, %0.00
>> 0 llvm::DIEDelta 10.03 MB 328542 0 10.03 MB 328542 <XRRatioObject: 0x608025658ea0> %0.00, %0.00
>> 0 llvm::DIELabel 5.14 MB 168551 0 5.14 MB 168551 <XRRatioObject: 0x608025658ea0> %0.00, %0.00
>> 0 llvm::DIELoc 3.41 MB 13154 0 3.41 MB 13154 <XRRatioObject: 0x608025658ea0> %0.00, %0.00
>> 0 llvm::DIELocList 1.86 MB 61055 0 1.86 MB 61055 <XRRatioObject: 0x608025658ea0> %0.00, %0.00
>> 0 llvm::DIEBlock 11.69 KB 44 0 11.69 KB 44 <XRRatioObject: 0x608025658ea0> %0.00, %0.00
>> 0 llvm::DIEExpr 32 Bytes 1 0 32 Bytes 1 <XRRatioObject: 0x608025658ea0> %0.00, %0.00
>>
>> We can do better.
>>
>> 1. DIEValue should be a discriminated union that's passed by value
>> instead of pointer. Most types just have 1 pointer of data. There
>> are four "big" ones, which still need a side-allocation on the
>> BumpPtrAllocator: DIELoc, DIEBlock, DIEString, and DIEDelta.
>> Even for these, the side allocation just needs to store the data
>> itself (skipping the discriminator and the vtable entry).
>> 2. The contents of DIE's Abbrev field should be integrated with the
>> list of DIEValues. In particular, DIEValue should contain a
>> `dwarf::Form` and `dwarf::Attribute`. In total, `sizeof(DIEValue)`
>> will still be just two pointers (1st pointer: discriminator, Form,
>> and Attribute; 2nd pointer: data). DIE should stop storing a
>> `DIEAbbrev` itself, instead constructing one on demand, renaming
>> `DIE::getAbbrev()` to
>> `DIE::getOrCreateAbbrev(FoldingSet<DIEAbbrev>&)` or some such.
>> 3. DIE's list of DIEValues is currently a `SmallVector<, 12>`, but a
>> histogram Pete ran shows that half of DIEs have 2 or fewer values,
>> and 85% have 4 or fewer values. We're paying for 12 (!) upfront
>> right now for each DIE. Instead, we should optimize for 2-4
>> DIEValues. Not sure whether a std::forward_list would suffice, or if
>> we should do something fancy like:
>>
>> struct List {
>> DIEValue Values[2];
>> PointerIntPair<List *, 1> NextAndSize;
>> };
>>
>> Either way we should move the allocations to a BumpPtrAllocator
>> (trivial if it's a list instead of vector).
>> 4. `DIEBlock` and `DIELoc` inherit both from `DIEValue` and `DIE`, but
>> they're only ever used as the former. This is just a convenience
>> for building up and emitting their DIEValues. Now that we've trimmed
>> down and simplified that functionality in `DIE`, we can extract it
>> out and make it reusable -- `DIELoc` should "have-a" DIEValue list,
>> not "be-a" DIE.
>> 5. The children of DIE are stored in a `vector<unique_ptr<DIE>>`, which
>> requires side allocations. If we use an intrusively linked list,
>> it'll be easy to avoid side allocations without hitting the
>> pointer-validity problem highlighted in the header file.
>> 6. Now that DIE has no side allocations, we can move all the DIEs to a
>> BumpPtrAllocator and remove the malloc traffic.
>>
>> <leak-backend.patch><num-children-by-tag.txt><num-values-by-tag.txt><num-values-with-totals.txt><num-values.txt>
>
More information about the llvm-dev
mailing list