[llvm-dev] RFC: Reducing Instr PGO size overhead

Fri Sep 4 12:27:21 PDT 2015

On Thu, Sep 3, 2015 at 10:26 PM, Xinliang David Li via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> LLVM's profile instrumentation incurs a very large size (memory,
> storage) overhead. For instance, the following compares the sizes of
> Clang binaries built (-O2 -DNDEBUG) in different configurations:
>
>
> 1) 60.9M (built with Clang itself)
> 2) 280.4M (same as 1, but added -fprofile-instr-generate)
> 3) 54.9M (built with GCC 4.8)
> 4) 156.5M (same as 3, but added -fprofile-generate)
>
> In other words, Clang's instrumentation increases binary size 4.6X,
> while GCC's instrumentation overhead is only 2.8X.
>
> The profile data produced by the instrumented Clang binary is also much
> larger. The following is a comparison:
>
> 1) 114.5M (raw profile data emitted by the Clang binary instrumented with Clang)
> 2) 63.7M (indexed profile data emitted by the Clang binary instrumented with Clang)
> 3) 33.1M (total size of GCDA files emitted by the Clang binary instrumented with GCC)
>
> A large size overhead can greatly limit the usability of PGO:
> a) Devices with small system partitions do not have much room left --
> an image greatly bloated by instrumentation may not fit and therefore
> cannot be installed.
> b) The linker can be highly stressed when linking very large C++
> applications -- the large size increase due to instrumentation can
> prevent those apps from being linked successfully.
> c) Large raw profile dumps can also cause problems, e.g. running out
> of disk space. They can also make the profile bootstrap build really slow.
>
>
> Looking at the various sources of size increase caused by
> instrumentation, it is clear that the biggest contributor is the
> __llvm_prf_names section. The current PGO implementation uses function
> assembly names as the lookup keys in the indexed format, so the names
> need to be emitted both in the rodata of the binary and in the
> raw/indexed profiles.
>
> On the other hand, LLVM's indexed format also supports 'key collision'
> -- it allows functions with the same key to share the same entry in the
> hash table. The functions' structural hash values are then used as a
> secondary key to find the real match.
>
> The above feature allows us to use a key other than the function
> assembly name, which can reduce binary/profile data size
> significantly. The MD5 hash of the function name is a good candidate,
> and I have a patch (3 parts: runtime, reader/writer, clang) that does
> just that. The results are very promising. With this change, the
> Clang-instrumented binary size is now 216M (down from 280M); the raw
> profile size is only 40.1M (a 2.85X size reduction); and the indexed
> profile size is only 29.5M (a 2.2X size reduction).
>
> With the change, the indexed profile is smaller than GCC's (though the
> latter also contains value profile data). The binary is still much
> larger than GCC's, but with late instrumentation we expect further
> size reduction.
>
> A couple more details of the patch:
>
> 1) When -fcoverage-mapping is specified, the __llvm_prf_names section
> is still emitted to the binary, but the names won't be dumped into the
> profile data. In other words, with -fcoverage-mapping, only the profile
> data shrinks. The reason is that the llvm-cov tool reads function names
> from that section (referenced from covmap) to support name-based
> filtering (including regular expressions) when dumping a line coverage
> report.
> 2) The change is backward compatible: old versions of both the raw and
> indexed formats can still be read by the new profile reader (and
> therefore by clients such as clang, llvm-profdata, and llvm-cov).
>
>
> I plan to submit the patch for review after some cleanups.
>
> Thoughts, concerns?
>

I think it is reasonable to simply replace the key we currently use with
MD5(key) to get a size reduction. In practice, for my use cases, I
have not observed any of the issues you mentioned under "Large size of
overhead can limit the usability of PGO greatly", but I can understand that
some of these issues could become problems in Google's use case. I would
personally prefer to keep the existing behavior as the default (see below),
and have MD5(key) as an option.
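A minimal Python sketch of the keying scheme described in the RFC, for illustration only: the table is keyed by a 64-bit value derived from MD5(function name), and each bucket holds (structural hash, counters) records so that functions whose keys collide can share one entry, with the structural hash acting as the secondary key. The class and function names are hypothetical, and taking the first 8 bytes of the digest as a little-endian integer is an assumption; this is not LLVM's actual reader/writer code.

```python
import hashlib
from collections import defaultdict


def name_key(func_name: str) -> int:
    """64-bit lookup key derived from MD5(func_name).

    Assumption: the first 8 bytes of the digest, read as a little-endian
    integer. The truncation used by a real profile writer may differ.
    """
    digest = hashlib.md5(func_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


class HashedProfileTable:
    """Toy indexed profile keyed by MD5(name) instead of the name itself.

    Each key maps to a bucket of (structural_hash, counters) records, so
    colliding keys share one entry and the structural hash picks the match.
    """

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, func_name: str, structural_hash: int, counters: list) -> None:
        # Only the hashed key is stored; the function name itself is dropped.
        self._buckets[name_key(func_name)].append((structural_hash, counters))

    def lookup(self, func_name: str, structural_hash: int):
        # .get() avoids creating an empty bucket on a miss.
        for shash, counters in self._buckets.get(name_key(func_name), []):
            if shash == structural_hash:
                return counters
        return None  # no record with a matching structural hash


table = HashedProfileTable()
table.add("_Z3foov", 0x1234, [10, 3])
table.add("_Z3barv", 0x5678, [7])
print(table.lookup("_Z3foov", 0x1234))  # [10, 3]
print(table.lookup("_Z3foov", 0x9999))  # None: structural hash mismatch
```

The name only has to be hashed at lookup time, which is why the names no longer need to be stored in the profile; the cost is that the stored keys are no longer human-readable.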

My primary concern is that if the function names are not kept at all stages,
then it becomes difficult to analyze the profile data in a standalone way.
Many times, I have used `llvm-profdata show -all-functions foo.profdata` on
the resulting profile data and then imported that data into Mathematica for
analysis. My understanding of your proposal is that `llvm-profdata show
-all-functions foo.profdata` will show MD5 hashes instead of the actual
function names, which will make this kind of analysis more difficult (it
would require running nm on the original binary, hashing every symbol,
etc.).
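The recovery step described above (run nm on the original binary, hash every symbol, match the hashes against the profile) could look roughly like the following sketch. It assumes the symbol names have already been extracted into a list and that profile records are (hashed key, count) pairs; `name_key`, `build_reverse_map`, `symbolize`, and that record shape are hypothetical and do not reflect llvm-profdata's real output format. The 64-bit key is again assumed to be the first 8 bytes of the MD5 digest read as a little-endian integer, which would have to match whatever the profile writer actually uses.

```python
import hashlib
from collections import defaultdict


def name_key(func_name: str) -> int:
    # Assumed key: first 8 bytes of MD5(name), little-endian.
    digest = hashlib.md5(func_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def build_reverse_map(symbol_names):
    """Map each hashed key back to every symbol name that produces it."""
    reverse = defaultdict(list)
    for name in symbol_names:
        reverse[name_key(name)].append(name)
    return dict(reverse)


def symbolize(records, reverse):
    """Replace hashed keys in (key, count) records with symbol names.

    Unknown keys are kept as 16-digit hex strings; colliding keys list
    all candidate names joined by '|'.
    """
    out = []
    for key, count in records:
        names = reverse.get(key)
        label = "|".join(names) if names else f"0x{key:016x}"
        out.append((label, count))
    return out


symbols = ["main", "_Z3foov"]  # e.g. names extracted from nm output
rev = build_reverse_map(symbols)
records = [(name_key("_Z3foov"), 42), (name_key("unknown_fn"), 1)]
# First record resolves to ('_Z3foov', 42); the second stays a hex key.
print(symbolize(records, rev))
```

Because distinct names can in principle hash to the same key, the sketch keeps every candidate name rather than guessing which one the profile record belongs to.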

btw, feel free to attach the patch even if it is in a rough state. It can
still help clarify the proposal and serve as a good talking point.
Fine-grained review of the rough parts will happen on llvm-commits, so
the rough parts will not distract from the discussion here on llvm-dev.

-- Sean Silva

>
> thanks,
>
> David