[llvm-dev] Add support for in-process profile merging in profile-runtime

Sun Feb 28 00:16:41 PST 2016

On Sat, Feb 27, 2016 at 8:14 PM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "Sean Silva via llvm-dev" <llvm-dev at lists.llvm.org>
> > To: "Xinliang David Li" <davidxl at google.com>
> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> > Sent: Saturday, February 27, 2016 8:50:05 PM
> > Subject: Re: [llvm-dev] Add support for in-process profile merging in
> > profile-runtime
> >
> >
> >
> > I have thought about this issue too, in the context of games. We may
> > want to turn profiling on only for certain frames (essentially, this is
> > many small profile runs).
> >
> >
> > However, I have not seen it demonstrated that this kind of refined
> > data collection will actually improve PGO results in practice.
> > The evidence I do have, though, is that (IIRC) Apple has found that
> > almost all of the benefits of PGO for the Clang binary can be obtained
> > with a handful of training runs of Clang. Are your findings
> > different?
> >
> >
> > Also, in general, I am very wary of file locking.
>
> As am I (especially since it often does not operate correctly, or is very
> slow, on distributed file systems).

Dumping thousands of copies of profiles can be more problematic IMO.

> Why don't you just read in an existing file to pre-populate the counters
> section when it exists at startup?
>

No, this won't work when multiple processes are dumping profiles
concurrently.
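
To illustrate the concurrency point: if two processes each read the existing file at startup and write it back at exit, the later writer silently drops the earlier one's counts. The read-add-write step has to happen at dump time under an exclusive lock. Below is a minimal sketch of that idea only. It is not the prototype's code, and it assumes a made-up file layout (a flat array of little-endian 64-bit counters), not LLVM's actual raw profile format. The name `merge_counters` is hypothetical, and `fcntl.flock` is Unix-only.

```python
import fcntl
import os
import struct


def merge_counters(path, counters):
    """Add this process's counters into a shared file under an exclusive lock.

    Assumed layout (illustration only): len(counters) little-endian uint64s.
    """
    fmt = "<%dQ" % len(counters)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+b") as f:
        # Blocks until no other process holds the lock on this file.
        fcntl.flock(f, fcntl.LOCK_EX)
        data = f.read()
        if data:
            if len(data) != struct.calcsize(fmt):
                raise ValueError("existing file does not match counter layout")
            old = struct.unpack(fmt, data)
        else:
            # First writer: nothing to merge with yet.
            old = (0,) * len(counters)
        merged = [a + b for a, b in zip(old, counters)]
        f.seek(0)
        f.write(struct.pack(fmt, *merged))
        f.truncate()
        f.flush()
        # Closing the file (end of the with block) releases the lock.
    return merged
```

Each process would call this once at exit with its in-memory counters, so the file on disk stays a single merged copy no matter how many processes ran.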

David

>
>  -Hal
>
> > This can cause a huge
> > slowdown for a build and has potential portability
> > problems. I don't see it as a substantially better solution than
> > wrapping clang in a script that runs clang and then just calls
> > llvm-profdata to do the merging. Running llvm-profdata is cheap
> > compared to doing locking in a highly parallel situation like a
> > build.
> >
> >
> >
> >
> >
> > -- Sean Silva
> >
> >
> > On Sat, Feb 27, 2016 at 6:02 PM, Xinliang David Li via llvm-dev <
> > llvm-dev at lists.llvm.org > wrote:
> >
> >
> >
> > One of the main missing features in the Clang/LLVM profile runtime is
> > support for online/in-process profile merging.
> > Profile data for different workloads of the same
> > executable binary needs to be collected and merged later by the
> > offline post-processing tool. This limitation makes it hard to
> > handle cases where the instrumented binary needs to be run with a
> > large number of small workloads, possibly in parallel. For instance,
> > to do PGO for clang, we may choose to build a large project with the
> > instrumented Clang binary. This is problematic because:
> > 1) to keep profiles from different runs from overwriting each other, %p
> > substitution needs to be specified in either the command line or an
> > environment variable so that each process dumps its profile data
> > into its own file named using its pid. This creates a huge demand
> > for disk storage. For instance, clang's raw profile size is
> > typically 80M -- if the instrumented clang is used to build a medium
> > to large size project (such as clang itself), profile data can
> > easily use up hundreds of gigabytes of local storage.
> > 2) pids can also be recycled. This means that some of the profile data
> > may be overwritten without being noticed.
> >
> >
> > The way to solve this problem is to allow profile data to be merged
> > in process. I have a prototype implementation and plan to send it
> > out for review soon after some cleanup. By default, profile
> > merging is off, and it can be turned on with a user option or via an
> > environment variable. The following summarizes the issues involved
> > in adding this feature:
> > 1. the target platform needs to have file locking support;
> > 2. there needs to be an efficient way to identify the profile data and
> > associate it with the binary using a binary/profdata signature;
> > 3. currently, without merging, profile data from shared libraries
> > (including dlopen/dlclose ones) is concatenated into the primary
> > profile file. This can complicate matters, as the merger also needs
> > to find the matching shared libs, and the merger also needs to avoid
> > unnecessary data movement/copy;
> > 4. value profile data is variable in length even for the same binary.
> >
> >
> > All the above issues are resolved, and a clang self-build with the
> > instrumented binary passes (with both -j1 and high parallelism).
> >
> >
> > If you have any concerns, please let me know.
> >
> >
> > thanks,
> >
> >
> > David
> >
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> >
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>