[llvm-dev] Memory utilization problems in profile reader

Fri Dec 11 19:19:06 PST 2015

On Fri, Dec 11, 2015 at 4:48 PM, Sean Silva <chisophugis at gmail.com> wrote:

>
>
> On Wed, Dec 9, 2015 at 12:14 PM, Xinliang David Li via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Can you extract the relevant part of the heap profile data? How large
>> is the sample profile data fed to the compiler?
>>
>> The indexed format profile size for clang is <100MB. The InstrProfRecord
>> for each function is read, used, and discarded one at a time, so there
>> should not be a problem as described.
>>
>
> If I'm reading the code right, are we also doing O(keys of the hash table)
> memory allocation in the indexed reader here?
> http://llvm.org/docs/doxygen/html/classllvm_1_1InstrProfReaderIndex.html#acc49fd2c0a8c8dfc3e29b01e09869af7
> That seems unnecessary. (It seems to be used for value profiling for
> some reason?)
>

It is for value profiling. The table converts each on-disk callee target
value (stored as an MD5 hash) to a unique string pointer when the function
record's VP data is read from memory. I will check its memory overhead at
some point. Strictly speaking, the translation is not needed, and I have
wanted to get rid of it but have not found the time yet. It is on my TODO
list.
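
A minimal sketch of the kind of translation table described above, not the
actual InstrProfReaderIndex code. It assumes the reader keeps one (hash, name
pointer) entry per key, sorted by hash, and resolves an on-disk hash with a
binary search. The names HashToNameTable and hashName are hypothetical, and
std::hash stands in for MD5 only to keep the example self-contained.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the real name hash (the thread mentions MD5). std::hash is an
// assumption made only so the sketch compiles without extra libraries.
static uint64_t hashName(const std::string &Name) {
  return static_cast<uint64_t>(std::hash<std::string>{}(Name));
}

// Hypothetical translation table: one (hash, name pointer) entry per key.
// The strings passed to the constructor must outlive the table.
class HashToNameTable {
  using Entry = std::pair<uint64_t, const char *>;
  std::vector<Entry> Entries;

public:
  // Allocates O(number of keys) memory up front.
  explicit HashToNameTable(const std::vector<std::string> &Names) {
    Entries.reserve(Names.size());
    for (const std::string &N : Names)
      Entries.emplace_back(hashName(N), N.c_str());
    std::sort(Entries.begin(), Entries.end(),
              [](const Entry &A, const Entry &B) { return A.first < B.first; });
  }

  // Returns the unique name pointer for Hash, or nullptr if it is unknown.
  const char *lookup(uint64_t Hash) const {
    auto It = std::lower_bound(
        Entries.begin(), Entries.end(), Hash,
        [](const Entry &E, uint64_t H) { return E.first < H; });
    if (It != Entries.end() && It->first == Hash)
      return It->second;
    return nullptr;
  }

  std::size_t size() const { return Entries.size(); }
};
```

Because the table holds one entry per key, its footprint grows with the
number of functions in the profile no matter how many records are actually
read, which is the O(keys) cost Sean points out.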

David

>
> -- Sean Silva
>
>
>>
>> David
>>
>>
>>
>> On Wed, Dec 9, 2015 at 7:52 AM, Diego Novillo via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>>
>>> I've been experimenting with profiled bootstraps using sample profiles.
>>> Initially, I made stage2 build stage3 while running under Perf.  This
>>> produced a 20GB profile whose conversion to LLVM took too long and used
>>> ~30GB of RAM.  So, I decided that this was not going to be very useful for
>>> general usage.
>>>
>>> I then changed the bootstrap to instead run each individual compile
>>> under Perf.  This produced ~2,200 profiles, each of which took up to 1
>>> minute to convert, and then they all have to be merged into a single
>>> profile.  I didn't like that either.
>>>
>>> Since all compiles are more or less the same in terms of what the
>>> compiler does, I decided to take the top 10 biggest profiles and merge
>>> those.  That seemed to work.  This resulted in a 21MB profile that I could
>>> use as input to -fprofile-sample-use.
>>>
>>> I started stage 3 of the bootstrap and left it to work.  I noticed it
>>> was slow, so I thought "we'll need to speed things up".  The build never
>>> finished.  Instead, ninja crashed my machine.
>>>
>>> It turns out that each clang invocation was growing to 4GB of RSS.  All
>>> that memory is being allocated by the profile reader (
>>> https://drive.google.com/file/d/0B9lq1VKvmXKFQVp1cGtZM2RSdWc/view?usp=sharing
>>> ).
>>>
>>> So, heads up, we need to trim it down.  Perhaps by loading only one
>>> function profile at a time, using it, and actively discarding it.  Or
>>> simply by being better at flushing the reader data structures as they're
>>> used during annotation.  I'll be sending patches about this in the coming
>>> days.
>>>
>>> It's likely that the sample reader is doing something silly here.
>>> Duncan, Justin, do you have memories of issues like this one with
>>> instrumentation?  I'll be trying a similar experiment with it after I'm
>>> done with the biggest issues in the sampler.
>>>
>>>
>>> Thanks.  Diego.
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>
>>
>>
>