[llvm-dev] [RFC] Order File Instrumentation

Manman Ren via llvm-dev llvm-dev at lists.llvm.org
Thu Jan 17 15:24:49 PST 2019


On Thu, Jan 17, 2019 at 2:47 PM Xinliang David Li <davidxl at google.com>
wrote:

>
>
> On Thu, Jan 17, 2019 at 2:32 PM Manman Ren <manman.ren at gmail.com> wrote:
>
>>
>>
>> On Thu, Jan 17, 2019 at 10:53 AM Xinliang David Li <davidxl at google.com>
>> wrote:
>>
>>> Hi Manman,
>>>
>>> Ordering profiling is certainly something very useful to have to startup
>>> time performance. GCC has something similar.
>>>
>>> In terms of implementation, it is possible to simply extend the edge
>>> profiling counters by 1 for each function, and instrument the function to
>>> record the time stamp the first time the function is executed. The overhead
>>> will be minimized and you can leverage all the other existing support in
>>> profiling runtime.
>>>
>>
>> Hi David,
>>
>> Just to clarify, are you suggesting to add an edge profiling counter per
>> function to record the time stamp? Where are the edge profiling counters
>> defined?
>>
>
> There is no needed to define the counter explicitly. What is needed is to
> introduce a new intrinsic to update the order counter, and the InstProf
> lowerer will create the counter for you. See
> Transforms/Instrumentation/InstrProfiling.cpp.   +Vedant Kumar
> <vsk at apple.com>
>

We need the instrumentation to be run late as an IR pass, so it has all the
function symbols. Are you suggesting a front-end instrumentation by
mentioning InstrProfiling.cpp?


>
>
>> So the difference will be where we store the profile information and in
>> what format?
>>
>>
> Time stamp can be simply a uint64 value just like any edge/block profile
> count.   llvm_profdata tool can be taught to dump the ordering file to be
> exported to the linker.
>
>
>> With the suggested approach, we need to allocate one time stamp for each
>> function, what is implemented is a pair of numbers for each executed
>> function. The runtime performance can be different as well, the suggested
>> approach gets the time stamp, and saves it to memory, what is implemented
>> is saving the pair of numbers and incrementing a counter.
>>
>>
> The runtime cost is probably not much for either approach. The suggested
> approach eliminates the trouble to maintain/implement a different way to
> identify functions.
>

Agree that using existing infra is better!

Manman

>
>
>
>>
>>> Another possibility is to use xray to implement the functionality --
>>> xray is useful for trace like profiling by design.
>>>
>>
>> We have not looked into XRay. We need something with low binary size
>> penalty and low runtime perf degradation, not sure if XRay is a good fit!
>>
>>
> xray can achieve very low runtime overhead. +Dean Michael Berris
> <dberris at google.com> for additional comments.
>
> David
>
>
>
>
>> Thanks,
>> Manman
>>
>>
>>> David
>>>
>>> On Thu, Jan 17, 2019 at 10:24 AM Manman Ren <manman.ren at gmail.com>
>>> wrote:
>>>
>>>> Order file is used to teach ld64 how to order the functions in a
>>>> binary. If we put all functions executed during startup together in the
>>>> right order, we will greatly reduce the page faults during startup.
>>>>
>>>> To generate order file for iOS apps, we usually use dtrace, but some
>>>> apps have various startup scenarios that we want to capture in the order
>>>> file. dtrace approach is not easy to automate, it is hard to capture the
>>>> different ways of starting an app without automation. Instrumented builds
>>>> however can be deployed to phones and profile data can be automatically
>>>> collected.
>>>>
>>>> For the Facebook app, by looking at the startup distribution, we are
>>>> expecting a big win out of the order file instrumentation, from 100ms to
>>>> 500ms+, in startup time.
>>>>
>>>> The basic idea of the pass is to use a circular buffer to log the
>>>> execution ordering of the functions. We only log the function when it is
>>>> first executed. Instead of logging the symbol name of the function, we log
>>>> a pair of integers, with one integer specifying the module id, and the
>>>> other specifying the function id within the module.
>>>>
>>>> In this pass, we add three global variables:
>>>> (1) an order file buffer
>>>> The order file buffer is a circular buffer at its own llvm section.
>>>> Each entry is a pair of integers, with one integer specifying the module
>>>> id, and the other specifying the function id within the module.
>>>> (2) a bitmap for each module: one bit for each function to say if the
>>>> function is already executed;
>>>> (3) a global index to the buffer
>>>>
>>>> At the function prologue, if the function has not been executed (by
>>>> checking the bitmap), log the module id and the function id, then
>>>> atomically increase the index.
>>>>
>>>> This pass is intended to be used as a ThinLTO pass or a LTO pass. It
>>>> maps each module to a distinct integer, it also generate a mapping file so
>>>> we can decode the function symbol name from the pair of ids.
>>>>
>>>> clang has '-finstrument-function-entry-bare' which inserts a function
>>>> call and is not as efficient.
>>>>
>>>> Three patches are attached, for llvm, clang, and compiler-rt
>>>> respectively.
>>>>
>>>> TODO:
>>>> (1) Migrate to the new pass manager with a shim for the legacy pass
>>>> manager.
>>>> (2) For the order file buffer, consider always emitting definitions,
>>>> making them LinkOnceODR with a COMDAT group.
>>>> (3) Add testing case for clang/compiler-rt patches.
>>>> (4) Add utilities to deobfuscate the profile dump.
>>>> (5) The size of the buffer is currently hard-coded (
>>>> INSTR_ORDER_FILE_BUFFER_SIZE).
>>>>
>>>> Thanks Kamal for contributing to the patches! Thanks to Aditya and
>>>> Saleem for doing an initial review pass over the patches!
>>>>
>>>> Manman
>>>>
>>>>
>>>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190117/c0e2ac9b/attachment.html>


More information about the llvm-dev mailing list