[llvm-dev] Reducing DWARF emitter memory consumption

Fri Feb 5 18:16:30 PST 2016

Hi Mehdi,

On Fri, Feb 5, 2016 at 6:14 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:

>
> On Feb 5, 2016, at 6:02 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Fri, Feb 5, 2016 at 5:56 PM, Mehdi Amini via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>> On Feb 5, 2016, at 5:53 PM, Eric Christopher <echristo at gmail.com> wrote:
>>
>>
>>
>> On Fri, Feb 5, 2016 at 5:51 PM Mehdi Amini via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>>
>>> > On Feb 5, 2016, at 5:40 PM, Peter Collingbourne <peter at pcc.me.uk>
>>> wrote:
>>> >
>>> > On Fri, Feb 05, 2016 at 04:58:45PM -0800, Mehdi Amini wrote:
>>> >>
>>> >>> On Feb 5, 2016, at 3:17 PM, Peter Collingbourne via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>> >>>
>>> >>> Hi all,
>>> >>>
>>> >>> We have profiled [1] the memory usage in LLVM when LTO'ing Chromium, and
>>> >>> we've found that one of the top consumers of memory is the DWARF emitter in
>>> >>> lib/CodeGen/AsmPrinter/Dwarf*.
>>> >>
>>> >> I'm staring at the profile attached to post #15 on the link you
>>> >> posted. Can you confirm that the Dwarf emitter accounts for
>>> >> 6.7%+15.6%=22.3% of the total allocated memory?
>>> >> If I understand the numbers correctly, this does not tell us anything
>>> >> about how much the Dwarf emitter contributes to the *peak memory* usage
>>> >> (could be more, could be nothing...).
>>> >
>>> > I think these nodes represent allocations from the DWARF emitter:
>>> >
>>> > DwarfDebug::DwarfDebug 9.5%
>>> > DwarfDebug::endFunction 15.6%
>>> > DIEValueList::addValue 9.1%
>>> > total 34.2%
>>> >
>>> > I believe they are totals, but my reading of the code is that the DWARF
>>> > emitter does not deallocate its memory until the end of code generation,
>>>
>>> That's sad :(
>>>
>>> > so total ~= peak in this case.
>>>
>>> Assuming the peak occurs during CodeGen (which is what I see on my profile),
>>> that sounds pretty reasonable!
>>>
>>> Thanks for the information (and the work!).
>>>
>>> Another question I have: how much worse does the split codegen make the
>>> situation? Naively, there will be a lot of redundancy in the split modules;
>>> for ThinLTO, Teresa has to proceed with care to limit the amount of
>>> duplication.
>>>
>>>
>> Hmm? Can you reword this slightly? I'm not sure what you're asking here.
>>
>>
>> The parallel split codegen will take the big LTO module with all the
>> debug info and produce multiple modules.
>> When splitting into multiple modules, you may have functions from the same
>> DICompileUnit ending up in multiple modules.  All the retained types would
>> be pulled in.
>>
>
>
>> (this is assuming you are already taking care of not pulling in the
>> DICompileUnit when no function referencing it is in the split module).
>> Then each thread would do redundant work processing this type hierarchy
>> (and other debug info).
>>
>> For ThinLTO, Teresa is taking care (review waiting here:
>> http://reviews.llvm.org/D16440 ) to try to import as little as possible,
>> and to turn type definitions into declarations when possible.
>>
>
> Right - I don't think we'd ever need to import a definition - just rely on
> the fact that we will produce a type definition somewhere in the output.
> (This may present problems for LLDB - it's certainly had issues with type
> declarations appearing where it would expect a definition, e.g. a type that
> inherits from a declaration instead of a definition. I'm not sure if that
> problem extends to the case of by-value function parameters.)
>
> So the impact of that cross-module importing should be pretty low for
> ThinLTO. But any work Peter does should be equally
> beneficial to ThinLTO, since it still has to emit the types, build all the
> DIEs, etc.
>
>
> I'm not sure if you really answered my question though, I may
> misunderstand what you mean here.
>
> I'm not concerned about ThinLTO, any improvement on the DwarfEmitter would
> be beneficial for any CodeGen. I'll try to make my question more clear:
>
> There is a "parallel code generator" for LTO that was added by Peter
> especially to address Chrome LTO builds. I *assume* the memory consumption
> measurement we are talking about was taken using this scheme (the number of
> threads is not mentioned).
>
FYI: the measurements I made did not use multiple threads, so there is no
artificial multiplication of memory usage.

>
> When using the multi-threaded codegen, my concern would be that your 24
> threads (random number here...) may be emitting the same Dwarf information
> again and again, which would make the 30% memory usage unsurprising.
> Since we noticed this has a huge impact on ThinLTO, I was pointing out an
> *orthogonal* way of addressing the memory concern for Chrome LTO.
>
> --
> Mehdi
>
>
>
>
>