[llvm-dev] Add support for in-process profile merging in profile-runtime

Sat Feb 27 23:44:31 PST 2016

Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org> writes:
> One of the main missing features in the Clang/LLVM profile runtime is
> support for online/in-process profile merging. Profile data collected
> for different workloads of the same executable binary has to be dumped
> separately and merged later by the offline post-processing tool. This
> limitation makes it hard to handle cases where the instrumented binary
> needs to be run with a large number of small workloads, possibly in
> parallel. For instance, to do PGO for clang, we may choose to build a
> large project with the instrumented Clang binary. This is hard because
>
> 1) to keep profiles from different runs from overwriting each other, %p
>    substitution needs to be specified in either the command line or an
>    environment variable so that each process dumps its profile data
>    into its own file, named using its pid.

... or you can specify a more specific name that describes what's under
test, instead of %p.

>    This creates a huge disk storage requirement. For
>    instance, clang's raw profile size is typically 80M -- if the
>    instrumented clang is used to build a medium to large size project
>    (such as clang itself), profile data can easily use up hundreds of
>    gigabytes of local storage.

This argument is kind of confusing. It says that one profile is
typically 80M, then claims that this uses 100s of GB of data. I suppose
that's only true if you run 1000 profiling runs without merging the
data in between. Is that what you're talking about, or did I miss
something?
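
To make the arithmetic concrete, here is a small sketch in C. The
function names are made up for illustration, and it assumes decimal
units (1 GB = 1000 MB) and the 80M figure quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Total disk space, in MB, taken by `runs` unmerged raw profiles of
 * `mb_each` MB each. */
uint64_t unmerged_profile_mb(uint64_t mb_each, uint64_t runs) {
    return mb_each * runs;
}

/* Smallest number of unmerged runs whose profiles add up to at least
 * `target_mb` MB. */
uint64_t runs_to_reach_mb(uint64_t mb_each, uint64_t target_mb) {
    assert(mb_each > 0);
    return (target_mb + mb_each - 1) / mb_each; /* ceiling division */
}
```

With 80 MB per profile, 1000 unmerged runs come to 80000 MB (80 GB),
and it takes 1250 runs to reach 100 GB.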

> 2) pids can also be recycled. This means that some of the profile data
>    may be overwritten without being noticed.
>
> The way to solve this problem is to allow profile data to be merged in
> process.
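
The %p naming scheme from point 1 and the pid-reuse hazard from point 2
can be sketched as follows. This is only an illustration:
expand_pid_pattern is a hypothetical helper, not the profile runtime's
actual code, and it handles nothing but "%p".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy `pattern` into `out`, replacing each "%p"
 * with `pid`. Every other character is copied verbatim.
 * Returns 0 on success, -1 if `out` (capacity `cap`) is too small. */
int expand_pid_pattern(const char *pattern, long pid, char *out, size_t cap) {
    size_t used = 0;
    const char *s = pattern;
    while (*s != '\0') {
        if (s[0] == '%' && s[1] == 'p') {
            /* Write the pid in decimal; reject truncated output. */
            int n = snprintf(out + used, cap - used, "%ld", pid);
            if (n < 0 || (size_t)n >= cap - used)
                return -1;
            used += (size_t)n;
            s += 2;
        } else {
            if (used + 1 >= cap) /* keep room for the terminator */
                return -1;
            out[used++] = *s++;
        }
    }
    if (used >= cap)
        return -1;
    out[used] = '\0';
    return 0;
}
```

Two processes that happen to get the same pid expand the pattern to the
same file name, so the later one silently replaces the earlier profile
unless the data is merged.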

I'm not convinced. Can you provide some more concrete examples of where
the out-of-process merging model fails? This was a *very deliberate*
design decision in how clang's profiling works, and most of the
subsequent decisions have been based on this initial one. Changing it
has far-reaching effects.

> I have a prototype implementation and plan to send it out for review
> soon after some cleanup. By default, profile merging is off and it can
> be turned on with a user option or via an environment variable. The
> following summarizes the issues involved in adding this feature:
> 1. the target platform needs to have file locking support
> 2. there needs to be an efficient way to identify the profile data and
>    associate it with the binary using a binary/profdata signature;
> 3. Currently without merging, profile data from shared libraries
>    (including dlopen/dlclose ones) are concatenated into the primary
>    profile file. This can complicate matters, as the merger also needs to
>    find the matching shared libs, and the merger also needs to avoid
>    unnecessary data movement/copy;
> 4. value profile data is variable in length even for the same binary.
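
For reference, the file locking in issue 1 amounts to something like
the sketch below. This is not the prototype's code. It assumes a POSIX
platform, a hypothetical merge_counters helper, and a profile file that
is a bare array of uint64_t counters, which is far simpler than the
real raw format. It ignores issues 2 through 4 entirely.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch: add `n` in-memory counters into the counters
 * stored in `path`, holding an exclusive POSIX record lock for the whole
 * read-modify-write. The on-disk layout (a bare uint64_t array) is an
 * assumption for illustration only. Returns 0 on success, -1 on error. */
int merge_counters(const char *path, const uint64_t *counters, size_t n) {
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    struct flock lk = {0};
    lk.l_type = F_WRLCK;
    lk.l_whence = SEEK_SET; /* l_start = 0 and l_len = 0: whole file */
    if (fcntl(fd, F_SETLKW, &lk) < 0) { /* blocks until the lock is free */
        close(fd);
        return -1;
    }

    int rc = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t on_disk = 0;
        off_t off = (off_t)(i * sizeof(uint64_t));
        ssize_t got = pread(fd, &on_disk, sizeof on_disk, off);
        if (got < 0) { rc = -1; break; }
        if (got != (ssize_t)sizeof on_disk)
            on_disk = 0; /* file is new or short: start from zero */
        on_disk += counters[i];
        if (pwrite(fd, &on_disk, sizeof on_disk, off)
                != (ssize_t)sizeof on_disk) { rc = -1; break; }
    }

    close(fd); /* closing the descriptor releases the record lock */
    return rc;
}
```

Even in this toy form, every process pays for a lock plus a read and a
write of every counter at exit, instead of a single write.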

If we actually want this, we should reconsider the design of having a
raw vs processed profiling format. The raw profile format is
specifically designed to be fast to write out and not to consider
merging profiles at all. This feature would make it nearly as
complicated as the processed format and lose all of the advantages of
making them different.

> All of the above issues are resolved, and a clang self-build with the
> instrumented binary passes (with both -j1 and high parallelism).
>
> If you have any concerns, please let me know.