[llvm-dev] [RFC] Order File Instrumentation

Manman Ren via llvm-dev llvm-dev at lists.llvm.org
Thu Jan 17 14:32:32 PST 2019


On Thu, Jan 17, 2019 at 10:53 AM Xinliang David Li <davidxl at google.com>
wrote:

> Hi Manman,
>
> Ordering profiling is certainly very useful for startup-time
> performance. GCC has something similar.
>
> In terms of implementation, it is possible to simply extend the edge
> profiling counters by one slot per function, and instrument the function
> to record a time stamp the first time it is executed. The overhead will be
> minimal, and you can leverage all the other existing support in the
> profiling runtime.
>
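For comparison, here is a minimal C sketch of the alternative described above: one extra counter slot per function that records a time stamp on the function's first execution, with the startup order recovered offline by sorting on those stamps. `NUM_FUNCS`, `first_seen`, `now_ticks`, `record_entry` and `startup_order` are hypothetical names, not part of the profiling runtime, and `now_ticks` is a single-threaded stand-in for a real clock.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: number of instrumented functions. */
#define NUM_FUNCS 4

/* One extra counter slot per function; 0 means "not executed yet". */
static uint64_t first_seen[NUM_FUNCS];

/* Stand-in clock: a monotonically increasing tick (not thread-safe). */
static uint64_t now_ticks(void) {
    static uint64_t tick = 0;
    return ++tick;
}

/* What the instrumentation would inline at each function entry. */
static void record_entry(unsigned func_id) {
    assert(func_id < NUM_FUNCS);
    if (first_seen[func_id] == 0)
        first_seen[func_id] = now_ticks();
}

/* qsort comparator: an earlier first-execution stamp sorts first. */
static int by_first_seen(const void *a, const void *b) {
    uint64_t ta = first_seen[*(const unsigned *)a];
    uint64_t tb = first_seen[*(const unsigned *)b];
    return (ta > tb) - (ta < tb);
}

/* Fills `out` with the ids of executed functions, earliest first; returns the count. */
static size_t startup_order(unsigned out[NUM_FUNCS]) {
    size_t n = 0;
    for (unsigned id = 0; id < NUM_FUNCS; ++id)
        if (first_seen[id] != 0)
            out[n++] = id;
    qsort(out, n, sizeof out[0], by_first_seen);
    return n;
}
```

A stamp of 0 doubles as the "not yet executed" flag, so the first-execution check and the recorded data share a single slot per function.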

Hi David,

Just to clarify, are you suggesting adding an edge profiling counter per
function to record the time stamp? Where are the edge profiling counters
defined?

So the difference will be where we store the profile information and in
what format?

With the suggested approach, we need to allocate one time stamp for each
function; what is implemented stores a pair of numbers for each executed
function. Runtime performance can differ as well: the suggested approach
reads the time stamp and saves it to memory, while what is implemented
saves the pair of numbers and increments a counter.


> Another possibility is to use XRay to implement the functionality; XRay
> is designed for trace-like profiling.
>

We have not looked into XRay. We need something with a low binary size
penalty and low runtime performance degradation, and we are not sure XRay
is a good fit.

Thanks,
Manman


> David
>
> On Thu, Jan 17, 2019 at 10:24 AM Manman Ren <manman.ren at gmail.com> wrote:
>
>> An order file is used to tell ld64 how to order the functions in a binary.
>> If we put all functions executed during startup together, in the right
>> order, we can greatly reduce page faults during startup.
>>
>> To generate order files for iOS apps we usually use dtrace, but some apps
>> have various startup scenarios that we want to capture in the order file.
>> The dtrace approach is not easy to automate, and without automation it is
>> hard to capture the different ways an app can start. Instrumented builds,
>> however, can be deployed to phones, and profile data can be collected
>> automatically.
>>
>> For the Facebook app, judging by the startup time distribution, we expect
>> a big win from order file instrumentation: 100ms to 500ms+ of startup
>> time.
>>
>> The basic idea of the pass is to use a circular buffer to log the
>> execution order of functions. We only log a function the first time it is
>> executed. Instead of logging the symbol name of the function, we log a
>> pair of integers: one specifying the module id, and the other specifying
>> the function id within the module.
>>
>> In this pass, we add three global variables:
>> (1) an order file buffer
>> The order file buffer is a circular buffer in its own LLVM section. Each
>> entry is a pair of integers, with one integer specifying the module id, and
>> the other specifying the function id within the module.
>> (2) a bitmap for each module: one bit for each function to say if the
>> function is already executed;
>> (3) a global index to the buffer
>>
>> At the function prologue, if the function has not been executed yet
>> (checked via the bitmap), we log the module id and the function id, then
>> atomically increment the index.
>>
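The three globals and the prologue check described above could look roughly like the following C sketch. It is not the code in the attached patches: the buffer size, module id and function count are made-up values (the actual value of `INSTR_ORDER_FILE_BUFFER_SIZE` is not given here), and the helper names are hypothetical. Two choices are the sketch's own: it reserves a slot with an atomic fetch-and-add before writing the pair, so concurrent writers never share an entry, and it makes the bitmap atomic too (a cheap relaxed load on the hot path, then a fetch-or so only one thread logs a given function).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define ORDER_BUF_ENTRIES 8u /* a power of two */
#define MODULE_NUM_FUNCS 16u
#define MODULE_ID 7u

/* (1) Order file buffer: each entry is a (module id, function id) pair. */
typedef struct {
    uint32_t module_id;
    uint32_t func_id;
} order_entry;
static order_entry order_buf[ORDER_BUF_ENTRIES];

/* (2) Per-module bitmap: one bit per function, set once it has run. */
static atomic_uchar executed_bitmap[(MODULE_NUM_FUNCS + 7) / 8];

/* (3) Global index into the buffer. */
static atomic_uint order_buf_idx;

/* Body the pass would insert in the prologue of function `func_id`. */
static void order_file_prologue(uint32_t func_id) {
    assert(func_id < MODULE_NUM_FUNCS);
    atomic_uchar *byte = &executed_bitmap[func_id / 8];
    unsigned char mask = (unsigned char)(1u << (func_id % 8));
    /* Common path: already logged, one relaxed load and return. */
    if (atomic_load_explicit(byte, memory_order_relaxed) & mask)
        return;
    /* First execution: set the bit; if another thread won the race, stop. */
    if (atomic_fetch_or_explicit(byte, mask, memory_order_relaxed) & mask)
        return;
    /* Reserve a slot, wrapping around the circular buffer, then fill it. */
    unsigned slot = atomic_fetch_add(&order_buf_idx, 1u) % ORDER_BUF_ENTRIES;
    order_buf[slot].module_id = MODULE_ID;
    order_buf[slot].func_id = func_id;
}
```

Keeping the buffer size a power of two means the slot computation stays consistent even when the 32-bit index itself wraps around.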
>> This pass is intended to be used as a ThinLTO pass or an LTO pass. It maps
>> each module to a distinct integer, and it also generates a mapping file so
>> we can decode the function symbol names from the pairs of ids.
>>
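Offline, the logged pairs have to be turned back into symbol names. The following is a minimal C sketch under several assumptions: the mapping file has already been parsed into an in-memory table (its real format is not described here), and the profile dump provides the raw circular buffer plus the final value of the global index, taken as the total number of entries ever logged. It unrolls the buffer oldest-first, drops duplicate and unknown ids, and writes one symbol per line, the simplest form of an ld64 order file. `map_entry`, `lookup` and `write_order_file` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One logged (module id, function id) pair. */
typedef struct {
    uint32_t module_id;
    uint32_t func_id;
} order_entry;

/* Hypothetical in-memory form of the mapping file. */
typedef struct {
    uint32_t module_id;
    uint32_t func_id;
    const char *symbol;
} map_entry;

static int same_pair(order_entry a, order_entry b) {
    return a.module_id == b.module_id && a.func_id == b.func_id;
}

/* Linear search; a real tool would likely use a hash table. */
static const char *lookup(const map_entry *map, size_t map_len, order_entry e) {
    for (size_t i = 0; i < map_len; ++i) {
        order_entry m = {map[i].module_id, map[i].func_id};
        if (same_pair(m, e))
            return map[i].symbol;
    }
    return NULL;
}

/* `total` = number of entries ever logged; once it exceeds `cap`,
   the oldest entries have been overwritten. Returns symbols written. */
static size_t write_order_file(FILE *out, const order_entry *buf, size_t cap,
                               size_t total, const map_entry *map, size_t map_len) {
    assert(cap > 0);
    size_t count = total < cap ? total : cap;
    size_t start = total < cap ? 0 : total % cap; /* oldest surviving entry */
    size_t written = 0;
    for (size_t k = 0; k < count; ++k) {
        order_entry e = buf[(start + k) % cap];
        int seen = 0;
        for (size_t j = 0; j < k && !seen; ++j)
            seen = same_pair(buf[(start + j) % cap], e);
        const char *sym = lookup(map, map_len, e);
        if (seen || sym == NULL)
            continue; /* duplicate or unknown id */
        fprintf(out, "%s\n", sym);
        ++written;
    }
    return written;
}
```

If the index exceeds the buffer size, the entries that get overwritten are the earliest ones, so an undersized buffer loses exactly the startup functions an order file most needs to capture.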
>> clang has '-finstrument-function-entry-bare', but it inserts a function
>> call at each function entry and is not as efficient.
>>
>> Three patches are attached, for llvm, clang, and compiler-rt respectively.
>>
>> TODO:
>> (1) Migrate to the new pass manager with a shim for the legacy pass
>> manager.
>> (2) For the order file buffer, consider always emitting definitions,
>> making them LinkOnceODR with a COMDAT group.
>> (3) Add test cases for the clang/compiler-rt patches.
>> (4) Add utilities to deobfuscate the profile dump.
>> (5) The size of the buffer is currently hard-coded
>> (INSTR_ORDER_FILE_BUFFER_SIZE).
>>
>> Thanks Kamal for contributing to the patches! Thanks to Aditya and Saleem
>> for doing an initial review pass over the patches!
>>
>> Manman
>>
>>
>>