[llvm-dev] Reducing DWARF emitter memory consumption

Mehdi Amini via llvm-dev llvm-dev at lists.llvm.org
Fri Feb 5 18:17:19 PST 2016


> On Feb 5, 2016, at 6:14 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
>> 
>> On Feb 5, 2016, at 6:02 PM, David Blaikie <dblaikie at gmail.com <mailto:dblaikie at gmail.com>> wrote:
>> 
>> 
>> 
>> On Fri, Feb 5, 2016 at 5:56 PM, Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>> 
>>> On Feb 5, 2016, at 5:53 PM, Eric Christopher <echristo at gmail.com <mailto:echristo at gmail.com>> wrote:
>>> 
>>> 
>>> 
>>> On Fri, Feb 5, 2016 at 5:51 PM Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>> 
>>> > On Feb 5, 2016, at 5:40 PM, Peter Collingbourne <peter at pcc.me.uk <mailto:peter at pcc.me.uk>> wrote:
>>> >
>>> > On Fri, Feb 05, 2016 at 04:58:45PM -0800, Mehdi Amini wrote:
>>> >>
>>> >>> On Feb 5, 2016, at 3:17 PM, Peter Collingbourne via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>>> >>>
>>> >>> Hi all,
>>> >>>
>>> >>> We have profiled [1] the memory usage in LLVM when LTO'ing Chromium, and
>>> >>> we've found that one of the top consumers of memory is the DWARF emitter in
>>> >>> lib/CodeGen/AsmPrinter/Dwarf*.
>>> >>
>>> >> I'm staring at the profile attached to post #15 at the link you posted; can you confirm that the Dwarf emitter accounts for 6.7%+15.6%=22.3% of the total allocated memory?
>>> >> If I understand the numbers correctly, this does not tell us anything about how much the Dwarf emitter accounts for in *peak memory* usage (could be more, could be nothing...).
>>> >
>>> > I think these nodes represent allocations from the DWARF emitter:
>>> >
>>> > DwarfDebug::DwarfDebug 9.5%
>>> > DwarfDebug::endFunction 15.6%
>>> > DIEValueList::addValue 9.1%
>>> > total 34.2%
>>> >
>>> > I believe they are totals, but my reading of the code is that the DWARF
>>> > emitter does not deallocate its memory until the end of code generation,
>>> 
>>> That's sad :(
>>> 
>>> > so total ~= peak in this case.
>>> 
>>> Assuming the peak occurs during CodeGen (which is what I see on my profile), that sounds pretty reasonable!
>>> 
>>> Thanks for the information (and the work!).
>>> 
>>> Another question I have is: how much worse does split codegen make the situation? Naively there will be a lot of redundancy in the split modules; for ThinLTO, Teresa has to proceed with care to limit the amount of duplication.
>>> 
>>> 
>>> Hmm? Can you reword this slightly? I'm not sure what you're asking here.
>> 
>> The parallel split codegen will take the big LTO module with all the debug info and produce multiple modules.
>> When splitting in multiple modules, you may have functions from the same DICompileUnit ending up in multiple modules.  All the retained types would be pulled in.
>>  
>> (this is assuming you are already taking care of not pulling in the DICompileUnit when no function referencing it is in the split module).
>> Then each thread would do redundant work processing this type hierarchy (and other debug info).
>> 
>> For ThinLTO, Teresa is taking care (review waiting here: http://reviews.llvm.org/D16440 <http://reviews.llvm.org/D16440> ) to try to import as little as possible, and to turn type definitions into declarations when possible.
>> 
>> Right - I don't think we'd ever need to import a definition - just rely on the fact that we will produce a type definition somewhere in the output. (This may present problems for LLDB - it's certainly had issues with type declarations appearing where it would expect a definition, e.g. a type that inherits from a declaration instead of a definition; not sure if that problem extends to the case of by-value function parameters.)
>> 
>> So the impact of that cross-module importing should be pretty low for ThinLTO. But the benefit of any work Peter does should be equally beneficial to ThinLTO, since it still has to emit the types, build all the DIEs, etc.
> 
> I'm not sure if you really answered my question though; I may have misunderstood what you mean here.
> 
> I'm not concerned about ThinLTO; any improvement to the DwarfEmitter would be beneficial for any CodeGen. I'll try to make my question clearer:
> 
> There is a "parallel code generator" for LTO that was added by Peter especially to address Chrome LTO builds. I *assume* the memory consumption measure we are talking about is using this scheme (it not mentioned how many threads).
> 
> When using the multi-threaded codegen, my concern would be that your 24 threads (random number here...) may be emitting the same DWARF information again and again, which would make the 30% memory usage not surprising. Since we noticed this has a huge impact on ThinLTO, I was pointing out an *orthogonal* way of addressing the memory concern for Chrome LTO.

Update: Peter told me on IRC that he believes the measurement was made with single-threaded codegen. I wonder how much worse the numbers would be with threading enabled :)


-- 
Mehdi