[llvm-dev] Add support for in-process profile merging in profile-runtime

Sun Feb 28 10:59:37 PST 2016

On Sun, Feb 28, 2016 at 10:45 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:

>
> On Feb 28, 2016, at 12:46 AM, Xinliang David Li via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Justin, looks like there is some misunderstanding in my email. I want to
> clarify it here first:
>
> 1) I am not proposing changing the default profile dumping model as used
> today. The online merging is totally optional;
> 2) the on-line profile merging is not doing conversion from raw to index
> format. It does very simple raw-to-raw merging using existing runtime APIs.
> 3) the change to existing profile runtime code is just a few lines. All
> the new functionality is isolated in one new file. It will become clear
> when the patch is sent out later.
>
> My inline replies below:
>
>
> On Sat, Feb 27, 2016 at 11:44 PM, Justin Bogner <mail at justinbogner.com>
> wrote:
>
>> Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org> writes:
>> > One of the main missing features in Clang/LLVM profile runtime is the
>> lack of
>> > support for online/in-process profile merging support. Profile data
>> collected
>> > for different workloads for the same executable binary need to be
>> collected
>> > and merged later by the offline post-processing tool.  This limitation
>> makes
>> > it hard to handle cases where the instrumented binary needs to be run
>> with
>> > large number of small workloads, possibly in parallel.  For instance,
>> to do
>> > PGO for clang, we may choose to  build  a large project with the
>> instrumented
>> > Clang binary. This is because
>> >
>> > 1) to avoid profile from different runs from overriding others, %p
>> >    substitution needs to be specified in either the command line or an
>> >    environment variable so that different process can dump profile data
>> >    into its own file named using pid.
>>
>> ... or you can specify a more specific name that describes what's under
>> test, instead of %p.
>>
>
> yes -- but the problem still exists -- each training process will need its
> own copy of raw profile.
>
>
>>
>> >    This will create huge requirement on the disk storage. For
>> >    instance, clang's raw profile size is typically 80M -- if the
>> >    instrumented clang is used to build a medium to large size project
>> >    (such as clang itself), profile data can easily use up hundreds of
>> >    Gig bytes of local storage.
>>
>> This argument is kind of confusing. It says that one profile is
>> typicially 80M, then claims that this uses 100s of GB of data. From
>> these statements that only makes sense I suppose that's true if you run
>> 1000 profiling runs without merging the data in between. Is that what
>> you're talking about, or did I miss something?
>>
>
> Yes. For instance,  first build a  clang with
> -fprofile-instr-generate=prof.data.%p, and use this instrumented clang to
> build another large project such as clang itself. The second build will
> produce tons of profile data.
>
>
>>
>> > 2) pid can also be recycled. This means that some of the profile data
>> may be
>> >    overridden without being noticed.
>> >
>> > The way to solve this problem is to allow profile data to be merged in
>> > process.
>>
>> I'm not convinced. Can you provide some more concrete examples of where
>> the out of process merging model fails? This was a *very deliberate*
>> design decision in how clang's profiling works, and most of the
>> subsequent decisions have been based on this initial one. Changing it
>> has far reaching effects.
>
>
> I am not proposing changing the out of process merging -- it is still
> needed. What I meant is that, in a scenario where the instrumented binaries
> are running multiple times (using their existing running harness), there is
> no good/automatic way of making sure different process's profile data won't
> have name conflict.
>
>
> Could the profile file be named from a hash of the profile data themselves?
>

That will solve the name conflict problem -- but the problem with larger
resource requirement is still there :)

>
> Even with that I still like the direction you're heading to. To avoid
> contention on the "file locking", could there be a pool of output files? I
> don't know how to map a new process to one output file without a lock
> somewhere though.
>

We can certainly introduce something like that when lock contention becomes
an issue (which is a great idea by the way, and quite straightforward to
implement). From my initial experiment, it does seem to be a problem.

thanks,

David

>
> --
> Mehdi
>
>
>
> Using clang's self build (using instrumented clang as build compiler for
> profile bootstrapping) as an example.  Ideally this should all be done
> transparently  -- i.e, set the instrumented compiler as the build compiler,
> run ninja or make and things will just work, but with the current default
> profile dumping mode, it can fail in many different ways:
> 1) Just run ninja/make -- all clang processes will dump profile into the
> same file concurrently -- the result is a corrupted profile -- FAIL
> 2) run ninja with LLVM_PROFILE_FILE=....%p
>    2.1) failure mode #1 --> really slow build due to large IO; or running
> out of diskspace
>    2.2) failure mode #2 --> pid recyling leading to profile file name
> conflict -- profile overwriting happens and we loss data
>
> Suppose 2) above finally succeeds, the user will have to merge thousands
> of raw profiles to indexed profile.
>
> With the proposed profile on-line merging, you just need to use the
> instrumented clang, and one merged raw profile data automagically produced
> in the end. The raw to indexed merge is also much faster.
>
> The online merge feature has a huge advantage when considering integrating
> the instrumented binary with existing make systems or loadtesting harness
> -- it is almost seamless.
>
>
>
>>
>> > I have a prototype implementation and plan to send it out for review
>> > soon after some clean ups. By default, the profiling merging is off and
>> it can
>> > be turned on with an user option or via an environment variable. The
>> following
>> > summarizes the issues involved in adding this feature:
>> > 1. the target platform needs to have file locking support
>> > 2. there needs an efficient way to identify the profile data and
>> associate it
>> >    with the binary using binary/profdata signature;
>> > 3. Currently without merging, profile data from shared libraries
>> >    (including dlopen/dlcose ones) are concatenated into the primary
>> >    profile file. This can complicate matters, as the merger also needs
>> to
>> >    find the matching shared libs, and the merger also needs to avoid
>> >    unnecessary data movement/copy;
>> > 4. value profile data is variable in length even for the same binary.
>>
>> If we actually want this, we should reconsider the design of having a
>> raw vs processed profiling format. The raw profile format is
>> specifically designed to be fast to write out and not to consider
>> merging profiles at all. This feature would make it nearly as
>> complicated as the processed format and lose all of the advantages of
>> making them different.
>>
>
> See above -- all the nice raw profile dumping mechanism is still kept --
> there won't be a change of that.
>
> thanks,
>
> David
>
>
>
>>
>> > All the above issues are resolved and clang self build with instrumented
>> > binary passes (with both j1 and high parallelism).
>> >
>> > If you have any concerns, please let me know.
>>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160228/bb49fa62/attachment.html>