[llvm-dev] Add support for in-process profile merging in profile-runtime

Sat Feb 27 18:50:05 PST 2016

I have thought about this issue too, in the context of games. We may want
to turn profiling on only for certain frames (essentially, this is many
small profile runs).

However, I have not seen it demonstrated that this kind of refined data
collection actually improves PGO results in practice.
The evidence I do have, though, is that IIRC Apple found that almost all
of the benefits of PGO for the Clang binary can be had with a handful of
training runs of Clang. Are your findings different?

Also, in general, I am very wary of file locking. It can slow a build down
enormously and has potential portability problems. I don't see it as a
substantially better solution than wrapping clang in a script that runs
clang and then just calls llvm-profdata to do the merging.
Running llvm-profdata is cheap compared to doing locking in a highly
parallel situation like a build.
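
To make the wrapper idea concrete, here is a minimal sketch in Python. It
assumes llvm-profdata is on PATH and that the instrumented compiler honors
the LLVM_PROFILE_FILE environment variable; the names plan_merge and
run_and_merge are hypothetical. It does not try to serialize wrappers that
run concurrently against the same merged file, so treat it as an
illustration of the two steps rather than something build-ready.

```python
import os
import subprocess
import tempfile


def plan_merge(merged, raw, merged_exists):
    """Return the llvm-profdata command that folds `raw` into `merged`.

    The output goes to a temporary name first; the caller renames it
    afterwards so a half-written profile is never left under `merged`.
    """
    inputs = [raw] + ([merged] if merged_exists else [])
    return ["llvm-profdata", "merge", "-o", merged + ".tmp"] + inputs


def run_and_merge(compiler_argv, merged):
    """Run the instrumented compiler once, then merge its raw profile."""
    fd, raw = tempfile.mkstemp(suffix=".profraw")
    os.close(fd)
    # Point this one process at its own raw profile file.
    env = dict(os.environ, LLVM_PROFILE_FILE=raw)
    status = subprocess.call(compiler_argv, env=env)
    try:
        if os.path.getsize(raw) > 0:
            cmd = plan_merge(merged, raw, os.path.exists(merged))
            subprocess.check_call(cmd)
            os.replace(merged + ".tmp", merged)
    finally:
        os.remove(raw)  # the raw profile is no longer needed
    return status
```

A build would then call something like
run_and_merge(["clang", "-c", "foo.c"], "clang.profdata") in place of
invoking clang directly, so only one raw profile exists per wrapper at a time.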

-- Sean Silva

On Sat, Feb 27, 2016 at 6:02 PM, Xinliang David Li via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> One of the main missing features in the Clang/LLVM profile runtime is
> support for online/in-process profile merging. Profile data
> collected from different workloads for the same executable binary has to
> be merged later by the offline post-processing tool.  This
> limitation makes it hard to handle cases where the instrumented binary
> needs to be run with a large number of small workloads, possibly in
> parallel.  For instance, to do PGO for clang, we may choose to build a
> large project with the instrumented Clang binary. This is difficult because
>  1) to keep profiles from different runs from overwriting each other, %p
> substitution needs to be specified in either the command line or an
> environment variable so that each process dumps profile data into
> its own file named using its pid. This puts a huge demand on disk
> storage. For instance, clang's raw profile size is typically 80M -- if the
> instrumented clang is used to build a medium to large size project (such as
> clang itself), profile data can easily use up hundreds of gigabytes of
> local storage.
>  2) pids can also be recycled. This means that some of the profile data may
> be overwritten without being noticed.
>
> The way to solve this problem is to allow profile data to be merged in
> process.  I have a prototype implementation and plan to send it out for
> review soon after some cleanup. By default, profile merging is off
> and it can be turned on with a user option or via an environment variable.
> The following summarizes the issues involved in adding this feature:
>  1. the target platform needs to have file locking support;
>  2. there needs to be an efficient way to identify the profile data and
> associate it with the binary using a binary/profdata signature;
>  3. currently, without merging, profile data from shared libraries
> (including dlopen/dlclose ones) is concatenated into the primary profile
> file. This can complicate matters, as the merger also needs to find the
> matching shared libs, and the merger also needs to avoid unnecessary data
> movement/copy;
>  4. value profile data is variable in length even for the same binary.
>
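
For reference, the lock-then-merge step that item 1 implies can be sketched
as follows. This is not the prototype's code: it is a minimal Python
illustration that assumes a POSIX system with flock(2) and models the profile
as a fixed-length array of 64-bit counters. The name merge_counters is
hypothetical, and the signature check (item 2), shared-library sections
(item 3), and variable-length value profile data (item 4) are deliberately
left out.

```python
import fcntl
import os
import struct


def merge_counters(path, counters):
    """Add `counters` into the on-disk profile at `path` under a file lock."""
    fmt = "<%dQ" % len(counters)  # little-endian unsigned 64-bit counters
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until every other process holding the lock has released it.
        fcntl.flock(fd, fcntl.LOCK_EX)
        data = os.read(fd, struct.calcsize(fmt))
        if len(data) == struct.calcsize(fmt):
            old = struct.unpack(fmt, data)
        else:
            old = (0,) * len(counters)  # new or empty file: start from zero
        merged = [a + b for a, b in zip(old, counters)]
        os.lseek(fd, 0, os.SEEK_SET)
        os.write(fd, struct.pack(fmt, *merged))
        return merged
    finally:
        os.close(fd)  # closing the descriptor also releases the lock
```

Each exiting process would call this once with its in-memory counters, so
the file on disk stays a single merged profile however many processes ran.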
> All of the above issues are resolved, and a clang self-build with the
> instrumented binary passes (with both -j1 and high parallelism).
>
> If you have any concerns, please let me know.
>
> thanks,
>
> David