[llvm-dev] [RFC] Order File Instrumentation

Xinliang David Li via llvm-dev llvm-dev at lists.llvm.org
Thu Jan 17 14:47:26 PST 2019


On Thu, Jan 17, 2019 at 2:32 PM Manman Ren <manman.ren at gmail.com> wrote:

>
>
> On Thu, Jan 17, 2019 at 10:53 AM Xinliang David Li <davidxl at google.com>
> wrote:
>
>> Hi Manman,
>>
>> Order profiling is certainly very useful for improving startup
>> time performance. GCC has something similar.
>>
>> In terms of implementation, it is possible to simply extend the edge
>> profiling counters by 1 for each function, and instrument the function to
>> record the time stamp the first time the function is executed. The overhead
>> will be minimized and you can leverage all the other existing support in
>> profiling runtime.
>>
>
> Hi David,
>
> Just to clarify, are you suggesting to add an edge profiling counter per
> function to record the time stamp? Where are the edge profiling counters
> defined?
>

There is no need to define the counter explicitly. What is needed is to
introduce a new intrinsic to update the order counter, and the InstrProf
lowerer will create the counter for you. See
Transforms/Instrumentation/InstrProfiling.cpp. +Vedant Kumar
<vsk at apple.com>


> So the difference will be where we store the profile information and in
> what format?
>
>
A time stamp can simply be a uint64 value, just like any edge/block profile
count. The llvm-profdata tool can be taught to dump the order file to be
exported to the linker.
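
As a rough illustration of the time stamp idea, here is a minimal sketch in
plain C++. It is not the InstrProf implementation: OrderProfile, onEntry and
orderFile are hypothetical names, and a logical clock stands in for a real
time stamp. It only shows one uint64 slot per function being written on first
execution and then sorted to produce an ordering.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: one uint64 slot per function, 0 means "never executed".
struct OrderProfile {
  std::vector<std::string> names;                 // function symbol names
  std::vector<std::atomic<std::uint64_t>> stamps; // first-execution stamps
  std::atomic<std::uint64_t> clock{0};            // logical clock (assumption)

  explicit OrderProfile(std::vector<std::string> n)
      : names(std::move(n)), stamps(names.size()) {
    for (auto &s : stamps)
      s.store(0);
  }

  // What an instrumented function entry could do: record a stamp only the
  // first time the function runs.
  void onEntry(std::size_t id) {
    assert(id < stamps.size());
    if (stamps[id].load() != 0)
      return; // already stamped
    std::uint64_t expected = 0;
    std::uint64_t now = clock.fetch_add(1) + 1; // never 0
    stamps[id].compare_exchange_strong(expected, now);
  }

  // What a profile-reading tool could do: list executed functions sorted by
  // their first-execution stamp.
  std::vector<std::string> orderFile() const {
    std::vector<std::pair<std::uint64_t, std::string>> hit;
    for (std::size_t i = 0; i < names.size(); ++i)
      if (std::uint64_t t = stamps[i].load())
        hit.emplace_back(t, names[i]);
    std::sort(hit.begin(), hit.end());
    std::vector<std::string> out;
    for (auto &p : hit)
      out.push_back(p.second);
    return out;
  }
};
```

Functions that never run keep a zero slot and are simply left out of the
ordering.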


> With the suggested approach, we need to allocate one time stamp for each
> function; what is implemented is a pair of numbers for each executed
> function. The runtime performance can differ as well: the suggested
> approach gets the time stamp and saves it to memory, while what is
> implemented saves the pair of numbers and increments a counter.
>
>
The runtime cost is probably not significant for either approach. The
suggested approach avoids the trouble of implementing and maintaining a
different way to identify functions.



>
>> Another possibility is to use XRay to implement the functionality -- XRay
>> is useful for trace-like profiling by design.
>>
>
> We have not looked into XRay. We need something with low binary size
> penalty and low runtime perf degradation, not sure if XRay is a good fit!
>
>
XRay can achieve very low runtime overhead. +Dean Michael Berris
<dberris at google.com> for additional comments.

David




> Thanks,
> Manman
>
>
>> David
>>
>> On Thu, Jan 17, 2019 at 10:24 AM Manman Ren <manman.ren at gmail.com> wrote:
>>
>>> Order file is used to teach ld64 how to order the functions in a binary.
>>> If we put all functions executed during startup together in the right
>>> order, we will greatly reduce the page faults during startup.
>>>
>>> To generate an order file for iOS apps, we usually use dtrace, but some
>>> apps have various startup scenarios that we want to capture in the order
>>> file. The dtrace approach is not easy to automate, and it is hard to
>>> capture the different ways of starting an app without automation.
>>> Instrumented builds, however, can be deployed to phones, and profile data
>>> can be collected automatically.
>>>
>>> For the Facebook app, by looking at the startup distribution, we are
>>> expecting a big win out of the order file instrumentation, from 100ms to
>>> 500ms+, in startup time.
>>>
>>> The basic idea of the pass is to use a circular buffer to log the
>>> execution ordering of the functions. We only log the function when it is
>>> first executed. Instead of logging the symbol name of the function, we log
>>> a pair of integers, with one integer specifying the module id, and the
>>> other specifying the function id within the module.
>>>
>>> In this pass, we add three global variables:
>>> (1) an order file buffer
>>> The order file buffer is a circular buffer in its own LLVM section. Each
>>> entry is a pair of integers, with one integer specifying the module id, and
>>> the other specifying the function id within the module.
>>> (2) a bitmap for each module: one bit for each function to say if the
>>> function is already executed;
>>> (3) a global index to the buffer
>>>
>>> At the function prologue, if the function has not been executed (by
>>> checking the bitmap), log the module id and the function id, then
>>> atomically increase the index.
>>>
>>> This pass is intended to be used as a ThinLTO pass or an LTO pass. It
>>> maps each module to a distinct integer, and it also generates a mapping
>>> file so we can decode the function symbol name from the pair of ids.
>>>
>>> clang has '-finstrument-function-entry-bare' which inserts a function
>>> call and is not as efficient.
>>>
>>> Three patches are attached, for llvm, clang, and compiler-rt
>>> respectively.
>>>
>>> TODO:
>>> (1) Migrate to the new pass manager with a shim for the legacy pass
>>> manager.
>>> (2) For the order file buffer, consider always emitting definitions,
>>> making them LinkOnceODR with a COMDAT group.
>>> (3) Add test cases for clang/compiler-rt patches.
>>> (4) Add utilities to deobfuscate the profile dump.
>>> (5) The size of the buffer is currently hard-coded (
>>> INSTR_ORDER_FILE_BUFFER_SIZE).
>>>
>>> Thanks Kamal for contributing to the patches! Thanks to Aditya and
>>> Saleem for doing an initial review pass over the patches!
>>>
>>> Manman
>>>
>>>
>>>
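
For comparison, the scheme described in the quoted RFC (a per-module bitmap
check, logging a (module id, function id) pair, and an atomic index) could be
sketched roughly as below. This is not the code from the attached patches. All
names (OrderLog, Module, onFunctionEntry, decode, kBufferSize) are
hypothetical, one atomic flag per function stands in for the bitmap, and the
sketch reserves a slot by incrementing the index before writing the pair,
which is just one possible ordering.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical size; the RFC hard-codes INSTR_ORDER_FILE_BUFFER_SIZE.
constexpr std::uint32_t kBufferSize = 8;

struct Entry {
  std::uint32_t moduleId;
  std::uint32_t functionId;
};

struct OrderLog {
  Entry buffer[kBufferSize] = {};      // (1) order file buffer
  std::atomic<std::uint32_t> index{0}; // (3) global index into the buffer
};

struct Module {
  std::uint32_t id;
  std::vector<std::atomic<bool>> executed; // (2) stand-in for the bitmap
  Module(std::uint32_t moduleId, std::size_t numFunctions)
      : id(moduleId), executed(numFunctions) {
    for (auto &f : executed)
      f.store(false);
  }
};

// What the function prologue could do: log only on first execution.
inline void onFunctionEntry(OrderLog &log, Module &m, std::uint32_t fn) {
  assert(fn < m.executed.size());
  if (m.executed[fn].exchange(true))
    return; // already logged
  std::uint32_t slot = log.index.fetch_add(1) % kBufferSize; // circular
  log.buffer[slot] = Entry{m.id, fn};
}

// Stand-in for the mapping file: (module id, function id) -> symbol name.
using SymbolMap =
    std::map<std::pair<std::uint32_t, std::uint32_t>, std::string>;

// Decode the logged pairs back to symbol names. Only the case where the
// buffer has not wrapped around is handled here.
inline std::vector<std::string> decode(const OrderLog &log,
                                       const SymbolMap &symbols) {
  std::vector<std::string> out;
  std::uint32_t n = std::min(log.index.load(), kBufferSize);
  for (std::uint32_t i = 0; i < n; ++i) {
    auto it = symbols.find(
        std::make_pair(log.buffer[i].moduleId, log.buffer[i].functionId));
    if (it != symbols.end())
      out.push_back(it->second);
  }
  return out;
}
```

Compared with the time stamp approach, this needs the extra id-to-symbol
mapping, but the buffer already holds the functions in execution order.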