[llvm-dev] [RFC] Order File Instrumentation

Manman Ren via llvm-dev llvm-dev at lists.llvm.org
Fri Jan 18 15:56:15 PST 2019


Some background information first, then a quick summary of what we have
discussed so far!

Background: The Facebook app is one of the biggest iOS apps. Because of
this, we want the instrumentation to be as lightweight as possible in terms
of binary size, profile data size, and runtime performance. The plan to
improve the Facebook app's startup time is to (1) implement order file
instrumentation that is as lightweight as possible, (2) push the order file
instrumentation to internal users first, and then to external beta users if
the overhead is low, (3) enable PGO instrumentation to collect information
to guide hot/cold splitting, and (4) push PGO instrumentation to internal
users.

There are a few alternatives we have discussed:
(A) What is proposed in the initial email: Log (module id, function id)
into a circular buffer in its own profile section when a function is first
executed.
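
A minimal sketch of what option (A) amounts to at function entry, written
as plain C rather than as the IR the pass would emit. The buffer capacity,
the per-function "already executed" flag, and all names here are
hypothetical, and the sketch is single-threaded; a real implementation
would presumably need atomic updates.

```c
#include <stdint.h>

/* Hypothetical capacity; the real size would be chosen to cover the
   number of functions executed at startup. */
#define ORDER_BUF_ENTRIES 8

typedef struct {
    uint32_t module_id;    /* identifies the module */
    uint32_t function_id;  /* identifies the function within the module */
} order_entry;

static order_entry order_buf[ORDER_BUF_ENTRIES];
static uint32_t order_idx;  /* total number of entries logged so far */

/* Conceptual entry hook: log (module id, function id) only the first
   time a function runs. executed_flag is a per-function byte. */
static void log_first_execution(uint8_t *executed_flag,
                                uint32_t module_id,
                                uint32_t function_id) {
    if (*executed_flag)
        return;                       /* already logged, nothing to do */
    *executed_flag = 1;
    order_entry *e = &order_buf[order_idx % ORDER_BUF_ENTRIES];
    e->module_id = module_id;         /* wraps around when the buffer is full */
    e->function_id = function_id;
    order_idx++;
}
```

Reading the buffer back in index order gives the first-execution order of
the logged functions.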

(B) Re-use the existing infra of a per-function counter to record the
timestamp when a function is first executed.
Compared to option (A), the runtime overhead for option (B) should be
higher, since we will be taking a timestamp for each function that is
executed at startup time, and the binary and the profile data will be
larger, since it needs one number for each function plus additional overhead
in the per-function metadata recorded in llvm_prf_data. The buffer size for
option (A) is controllable: it only needs to be the number of functions
executed at startup.
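
For comparison, a rough sketch of option (B) in plain C. The array of
per-function slots, the function count, and the use of timespec_get as the
timestamp source are all assumptions made for illustration; the existing
counter infra and a production clock source would look different.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical: one slot for EVERY function in the binary, whether or
   not it runs at startup. */
#define NUM_FUNCTIONS 4

static uint64_t first_exec_ts[NUM_FUNCTIONS];  /* 0 means "never executed" */

/* Assumed timestamp source (C11 timespec_get); a real implementation
   would likely want something cheaper. */
static uint64_t now_ns(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Conceptual entry hook: store a timestamp on first execution only. */
static void record_first_execution(uint32_t function_id) {
    if (first_exec_ts[function_id] == 0)
        first_exec_ts[function_id] = now_ns();
}
```

Sorting the functions with a nonzero slot by timestamp would recover the
startup order. The storage grows with the total number of functions, while
the buffer in (A) only grows with the number of functions executed.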

For the Facebook app, we expect that the number of functions executed
during startup is 1/3 to 1/2 of all functions in the binary. Profile data
size is important since we need to upload the profile data from device to
server.

The plus side of option (B) is that it reuses the existing infra!

In terms of integration with PGO instrumentation, both (A) and (B) should
work. For (B), we need to increase the number of per-function counters by
one. For (A), the order file data and the PGO counters will be in different
sections.

(C) XRay
We have not looked into this, but would like to hear more about it!

(D) -finstrument-functions-after-inlining or
-finstrument-function-entry-bare
We are worried about the runtime overhead of calling a separate function
when starting up the App.
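
To make the concern concrete, here is a sketch of the kind of hook that
-finstrument-functions-after-inlining calls at every function entry. The
buffer size and variable names are hypothetical, and this is not the
Chrome implementation; it only illustrates that each instrumented entry
pays for an out-of-line call.

```c
#include <stddef.h>

#define MAX_ADDRS 1024               /* hypothetical buffer capacity */

static void *entry_addrs[MAX_ADDRS]; /* function addresses in call order */
static size_t entry_count;

/* Called by instrumented code at each function entry. The hook itself
   must not be instrumented, or it would recurse. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site) {
    (void)call_site;
    if (entry_count < MAX_ADDRS)
        entry_addrs[entry_count++] = this_fn;
}

/* The matching exit hook is also called; it does nothing here. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site) {
    (void)this_fn;
    (void)call_site;
}
```

As written, this records every call rather than only the first one, so the
addresses would have to be deduplicated and symbolized afterwards to
produce an order file.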

Thanks,
Manman

On Fri, Jan 18, 2019 at 2:01 PM Chris Bieneman <chris.bieneman at me.com>
wrote:

> I would love to see this kind of order profiling support. Using dtrace to
> generate function orders is actually really problematic because dtrace made
> tradeoffs in implementation allowing it to ignore probe execution if the
> performance impact is too great on the system. This can result in dtrace
> being non-deterministic which is not ideal for generating optimization data.
>
> Additionally if order generation could be enabled at the same time as PGO
> generation that would be a great solution for generating profile data for
> optimizing clang itself. Clang has some scripts and build-system goop under
> utils/perf-training that can generate order files using dtrace and PGO
> data, it would be great to apply this technique to those tools too.
>
> -Chris
>
> > On Jan 18, 2019, at 2:43 AM, Hans Wennborg via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >
> > On Thu, Jan 17, 2019 at 7:24 PM Manman Ren via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> >>
> >> Order file is used to teach ld64 how to order the functions in a
> binary. If we put all functions executed during startup together in the
> right order, we will greatly reduce the page faults during startup.
> >>
> >> To generate order file for iOS apps, we usually use dtrace, but some
> apps have various startup scenarios that we want to capture in the order
> file. dtrace approach is not easy to automate, it is hard to capture the
> different ways of starting an app without automation. Instrumented builds
> however can be deployed to phones and profile data can be automatically
> collected.
> >>
> >> For the Facebook app, by looking at the startup distribution, we are
> expecting a big win out of the order file instrumentation, from 100ms to
> 500ms+, in startup time.
> >>
> >> The basic idea of the pass is to use a circular buffer to log the
> execution ordering of the functions. We only log the function when it is
> first executed. Instead of logging the symbol name of the function, we log
> a pair of integers, with one integer specifying the module id, and the
> other specifying the function id within the module.
> >
> > [...]
> >
> >> clang has '-finstrument-function-entry-bare' which inserts a function
> call and is not as efficient.
> >
> > Can you elaborate on why this existing functionality is not efficient
> > enough for you?
> >
> > For Chrome on Windows, we use -finstrument-functions-after-inlining to
> > insert calls at function entry (after inlining) that calls a function
> > which captures the addresses in a buffer, and later symbolizes and
> > dumps them to an order file that we feed the linker. We use a similar
> > approach for Chrome on Android, but I'm not as familiar with the
> > details there.
> >
> > Thanks,
> > Hans