[llvm-dev] [RFC] Order File Instrumentation

Xinliang David Li via llvm-dev llvm-dev at lists.llvm.org
Fri Jan 18 16:11:46 PST 2019

On Fri, Jan 18, 2019 at 3:56 PM Manman Ren <manman.ren at gmail.com> wrote:

> Some background information first, then a quick summary of what we have
> discussed so far!
> Background: The Facebook app is one of the biggest iOS apps. Because of this,
> we want the instrumentation to be as lightweight as possible in terms of
> binary size, profile data size, and runtime performance. The plan to
> improve Facebook app start up time is to (1) implement order file
> instrumentation to be as light as possible, (2) push the order file
> instrumentation to internal users first, and then to external beta users if
> the overhead is low, (3) enable PGO instrumentation to collect information
> to guide hot/cold splitting, and (4) push PGO instrumentation to internal
> users.
> There are a few alternatives we have discussed:
> (A) What is proposed in the initial email: Log (module id, function id)
> into a circular buffer in its own profile section when a function is first
> executed.
> (B) Re-use existing infra of a per function counter to record the
> timestamp when a function is first executed
> Compared to option (A), the runtime overhead for option (B) should be
> higher since we will be calling timestamp for each function that is
> executed at startup time,

The 'timestamp' can be just a global index. Since there is one counter
per func, the counter can be initialized to '-1' so that you don't need
to use a bitmap to track whether the function has been invoked. In other
words, the runtime overhead of (B) could be lower :)
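
As a rough illustration of that idea, here is a minimal C sketch. It is not
the actual instrumentation: `NUM_FUNCS`, `first_call_index`, `next_index`,
and `record_first_call` are hypothetical names, and the code assumes a
single thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: number of instrumented functions in this sketch. */
#define NUM_FUNCS 4

/* One slot per function; -1 means "not executed yet", so no separate
   bitmap is needed to remember whether the function has run. */
static int32_t first_call_index[NUM_FUNCS] = {-1, -1, -1, -1};

/* Global index standing in for a real timestamp. */
static int32_t next_index = 0;

/* What the code inserted at each function entry would do.
   Not thread-safe as written; concurrent callers would need an
   atomic increment and an atomic compare-and-store. */
static void record_first_call(uint32_t func_id) {
    assert(func_id < NUM_FUNCS);
    if (first_call_index[func_id] == -1)
        first_call_index[func_id] = next_index++;
}
```

Sorting the functions whose slot is not -1 by the stored index would then
give the first-execution order.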


> and the binary and the profile data will be larger since it needs one
> number for each function plus additional overhead in the per-function
> metadata recorded in llvm_prf_data. The buffer size for option (A) is
> controllable: it only needs to be as large as the number of functions
> executed at startup.
> For the Facebook app, we expect that the number of functions executed
> during startup is 1/3 to 1/2 of all functions in the binary. Profile data
> size is important since we need to upload the profile data from device to
> server.
> The plus side is to reuse the existing infra!
> In terms of integration with PGO instrumentation, both (A) and (B) should
> work. For (B), we need to increase the number of per function counters by
> one. For (A), they will be in different sections.
> (C) XRay
> We have not looked into this, but would like to hear more about it!
> (D) -finstrument-functions-after-inlining or
> -finstrument-function-entry-bare
> We are worried about the runtime overhead of calling a separate function
> when starting up the App.
> Thanks,
> Manman
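
For comparison, option (A) as described above (log a (module id, function
id) pair into a circular buffer the first time a function runs) could look
roughly like the following C sketch. This is only an assumption about the
shape of the scheme, not the proposed pass: `BUF_ENTRIES`, `NUM_FUNCS`,
`OrderEntry`, `order_buf`, `buf_count`, `seen`, and `log_first_call` are
hypothetical names, the `seen` flags model a single module, and the code
assumes a single thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes for this sketch. */
#define BUF_ENTRIES 8
#define NUM_FUNCS 4

/* One logged record: which module, and which function within it. */
typedef struct {
    uint32_t module_id;
    uint32_t func_id;
} OrderEntry;

/* Circular buffer that would live in its own profile section. */
static OrderEntry order_buf[BUF_ENTRIES];
static uint32_t buf_count = 0; /* total number of entries written so far */

/* Per-function "already executed" flags for one module. */
static bool seen[NUM_FUNCS];

/* What the code inserted at each function entry would do.
   Not thread-safe as written. */
static void log_first_call(uint32_t module_id, uint32_t func_id) {
    assert(func_id < NUM_FUNCS);
    if (seen[func_id])
        return; /* only the first execution is logged */
    seen[func_id] = true;
    /* Wrap around when the buffer is full, overwriting the oldest entry. */
    order_buf[buf_count % BUF_ENTRIES] = (OrderEntry){module_id, func_id};
    buf_count++;
}
```

Here the buffer holds one entry per function actually executed, while the
per-function state shrinks to a single flag.
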
> On Fri, Jan 18, 2019 at 2:01 PM Chris Bieneman <chris.bieneman at me.com>
> wrote:
>> I would love to see this kind of order profiling support. Using dtrace to
>> generate function orders is actually really problematic because dtrace made
>> tradeoffs in implementation allowing it to ignore probe execution if the
>> performance impact is too great on the system. This can result in dtrace
>> being non-deterministic which is not ideal for generating optimization data.
>> Additionally if order generation could be enabled at the same time as PGO
>> generation that would be a great solution for generating profile data for
>> optimizing clang itself. Clang has some scripts and build-system goop under
>> utils/perf-training that can generate order files using dtrace and PGO
>> data, it would be great to apply this technique to those tools too.
>> -Chris
>> > On Jan 18, 2019, at 2:43 AM, Hans Wennborg via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>> >
>> > On Thu, Jan 17, 2019 at 7:24 PM Manman Ren via llvm-dev
>> > <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >> An order file is used to teach ld64 how to order the functions in a
>> binary. If we put all functions executed during startup together in the
>> right order, we will greatly reduce the page faults during startup.
>> >>
>> >> To generate an order file for iOS apps, we usually use dtrace, but some
>> apps have various startup scenarios that we want to capture in the order
>> file. The dtrace approach is not easy to automate, and it is hard to capture
>> the different ways of starting an app without automation. Instrumented
>> builds, however, can be deployed to phones, and profile data can be
>> collected automatically.
>> >>
>> >> For the Facebook app, by looking at the startup distribution, we are
>> expecting a big win out of the order file instrumentation, from 100ms to
>> 500ms+, in startup time.
>> >>
>> >> The basic idea of the pass is to use a circular buffer to log the
>> execution ordering of the functions. We only log the function when it is
>> first executed. Instead of logging the symbol name of the function, we log
>> a pair of integers, with one integer specifying the module id, and the
>> other specifying the function id within the module.
>> >
>> > [...]
>> >
>> >> clang has '-finstrument-function-entry-bare' which inserts a function
>> call and is not as efficient.
>> >
>> > Can you elaborate on why this existing functionality is not efficient
>> > enough for you?
>> >
>> > For Chrome on Windows, we use -finstrument-functions-after-inlining to
>> > insert calls at function entry (after inlining) that calls a function
>> > which captures the addresses in a buffer, and later symbolizes and
>> > dumps them to an order file that we feed the linker. We use a similar
>> > approach for Chrome on Android, but I'm not as familiar with the
>> > details there.
>> >
>> > Thanks,
>> > Hans