[llvm-dev] [RFC] Order File Instrumentation

Manman Ren via llvm-dev llvm-dev at lists.llvm.org
Fri Jan 18 21:10:48 PST 2019


On Fri, Jan 18, 2019 at 4:11 PM Xinliang David Li <davidxl at google.com>
wrote:

>
>
> On Fri, Jan 18, 2019 at 3:56 PM Manman Ren <manman.ren at gmail.com> wrote:
>
>> Some background information first, then a quick summary of what we have
>> discussed so far!
>>
>> Background: The Facebook app is one of the biggest iOS apps. Because of this,
>> we want the instrumentation to be as lightweight as possible in terms of
>> binary size, profile data size, and runtime performance. The plan to
>> improve the Facebook app's startup time is to (1) implement order file
>> instrumentation to be as light as possible, (2) push the order file
>> instrumentation to internal users first, and then to external beta users if
>> the overhead is low, (3) enable PGO instrumentation to collect information
>> to guide hot/cold splitting, and (4) push PGO instrumentation to internal
>> users.
>>
>> There are a few alternatives we have discussed:
>> (A) What is proposed in the initial email: Log (module id, function id)
>> into a circular buffer in its own profile section when a function is first
>> executed.
>>
>> (B) Re-use existing infra of a per function counter to record the
>> timestamp when a function is first executed
>> Compared to option (A), the runtime overhead for option (B) should be
>> higher since we will be calling timestamp for each function that is
>> executed at startup time,
>>
>
> The 'timestamp' can just be a global index. Since there is one
> counter per func, the counter can be initialized to '-1' so that you
> don't need to use a bitmap to track whether the function has been invoked.
> In other words, the runtime overhead of (B) could be lower :)
>

That actually works! We only care about the ordering of the functions. But
the concern about profile data size and binary size still exists :]
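
A minimal sketch of that variant of (B), to make the idea concrete. All
names here are hypothetical and the counters are a plain array rather than
the real llvm_prf layout; it also assumes single-threaded startup, so the
global index is not atomic.

```c
#include <stdint.h>

/* Hypothetical sizes and names, for illustration only. */
#define NUM_FUNCS 4
#define NOT_EXECUTED UINT32_MAX /* the '-1' initial value */

/* One extra counter per function, initialized to "never executed". */
uint32_t first_exec_index[NUM_FUNCS] = {
    NOT_EXECUTED, NOT_EXECUTED, NOT_EXECUTED, NOT_EXECUTED};

/* Global index that plays the role of the 'timestamp'. */
uint32_t next_index = 0;

/* What the instrumentation would conceptually do at function entry:
 * record the global index only the first time the function runs. */
void record_first_execution(uint32_t func_id) {
    if (first_exec_index[func_id] == NOT_EXECUTED)
        first_exec_index[func_id] = next_index++;
}
```

Sorting the functions by their recorded index gives the startup order;
functions still holding the initial value were never executed.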

>
> David
>
>
>
>> and the binary and the profile data will be larger since it needs one
>> number for each function plus additional overhead in the per-function
>> metadata recorded in llvm_prf_data. The buffer size for option (A) is
>> controllable; it only needs to be as large as the number of functions
>> executed at startup.
>>
>
Do you have a rough estimate of how much overhead the per-function
metadata adds?

Manman
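
For comparison, a minimal sketch of option (A) as described in the quoted
text: a fixed-size circular buffer of (module id, function id) pairs,
written only on a function's first execution. The names, the sizes, and
the byte-per-function "seen" table (standing in for a bitmap) are all
assumptions for illustration, not the proposed implementation, and the
sketch is not thread-safe.

```c
#include <stdint.h>

/* Hypothetical sizes, for illustration only. */
#define NUM_MODULES 2
#define NUM_FUNCS 8   /* functions per module */
#define BUF_ENTRIES 4 /* buffer capacity, chosen by the user */

typedef struct {
    uint32_t module_id;
    uint32_t func_id;
} order_entry;

order_entry order_buf[BUF_ENTRIES];
uint32_t order_next = 0;              /* total entries logged so far */
uint8_t seen[NUM_MODULES][NUM_FUNCS]; /* 1 once a function has run */

/* Conceptual function-entry hook: log the pair on first execution only.
 * When the buffer is full, the write position wraps around and the
 * oldest entries are overwritten. */
void log_first_execution(uint32_t module_id, uint32_t func_id) {
    if (seen[module_id][func_id])
        return;
    seen[module_id][func_id] = 1;
    order_entry *e = &order_buf[order_next % BUF_ENTRIES];
    e->module_id = module_id;
    e->func_id = func_id;
    order_next++;
}
```

Sizing BUF_ENTRIES to the number of functions executed at startup avoids
the wrap-around, which is why the buffer size is the knob for profile
data size in this scheme.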

>
>> For the Facebook app, we expect that the number of functions executed
>> during startup is 1/3 to 1/2 of all functions in the binary. Profile data
>> size is important since we need to upload the profile data from device to
>> server.
>>
>> The plus side is to reuse the existing infra!
>>
>> In terms of integration with PGO instrumentation, both (A) and (B) should
>> work. For (B), we need to increase the number of per function counters by
>> one. For (A), they will be in different sections.
>>
>> (C) XRay
>> We have not looked into this, but would like to hear more about it!
>>
>> (D) -finstrument-functions-after-inlining or
>> -finstrument-function-entry-bare
>> We are worried about the runtime overhead of calling a separate function
>> when starting up the App.
>>
>> Thanks,
>> Manman
>>
>> On Fri, Jan 18, 2019 at 2:01 PM Chris Bieneman <chris.bieneman at me.com>
>> wrote:
>>
>>> I would love to see this kind of order profiling support. Using dtrace
>>> to generate function orders is actually really problematic because dtrace
>>> made tradeoffs in implementation allowing it to ignore probe execution if
>>> the performance impact is too great on the system. This can result in
>>> dtrace being non-deterministic which is not ideal for generating
>>> optimization data.
>>>
>>> Additionally if order generation could be enabled at the same time as
>>> PGO generation that would be a great solution for generating profile data
>>> for optimizing clang itself. Clang has some scripts and build-system goop
>>> under utils/perf-training that can generate order files using dtrace and
>>> PGO data, it would be great to apply this technique to those tools too.
>>>
>>> -Chris
>>>
>>> > On Jan 18, 2019, at 2:43 AM, Hans Wennborg via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>> >
>>> > On Thu, Jan 17, 2019 at 7:24 PM Manman Ren via llvm-dev
>>> > <llvm-dev at lists.llvm.org> wrote:
>>> >>
>>> >> Order file is used to teach ld64 how to order the functions in a
>>> binary. If we put all functions executed during startup together in the
>>> right order, we will greatly reduce the page faults during startup.
>>> >>
>>> >> To generate an order file for iOS apps, we usually use dtrace, but some
>>> apps have various startup scenarios that we want to capture in the order
>>> file. The dtrace approach is not easy to automate, and it is hard to
>>> capture the different ways of starting an app without automation.
>>> Instrumented builds, however, can be deployed to phones and profile data
>>> can be automatically collected.
>>> >>
>>> >> For the Facebook app, by looking at the startup distribution, we are
>>> expecting a big win out of the order file instrumentation, from 100ms to
>>> 500ms+, in startup time.
>>> >>
>>> >> The basic idea of the pass is to use a circular buffer to log the
>>> execution ordering of the functions. We only log the function when it is
>>> first executed. Instead of logging the symbol name of the function, we log
>>> a pair of integers, with one integer specifying the module id, and the
>>> other specifying the function id within the module.
>>> >
>>> > [...]
>>> >
>>> >> clang has '-finstrument-function-entry-bare' which inserts a function
>>> call and is not as efficient.
>>> >
>>> > Can you elaborate on why this existing functionality is not efficient
>>> > enough for you?
>>> >
>>> > For Chrome on Windows, we use -finstrument-functions-after-inlining to
>>> > insert calls at function entry (after inlining) that call a function
>>> > which captures the addresses in a buffer, and later symbolizes and
>>> > dumps them to an order file that we feed the linker. We use a similar
>>> > approach for Chrome on Android, but I'm not as familiar with the
>>> > details there.
>>> >
>>> > Thanks,
>>> > Hans
>>> > _______________________________________________
>>> > LLVM Developers mailing list
>>> > llvm-dev at lists.llvm.org
>>> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>