[llvm-dev] [RFC] Order File Instrumentation

Chris Bieneman via llvm-dev llvm-dev at lists.llvm.org
Fri Jan 18 14:01:03 PST 2019

I would love to see this kind of order profiling support. Using dtrace to generate function orders is actually really problematic: dtrace's implementation trades completeness for low overhead, so it is allowed to skip probe executions when their performance impact on the system is too great. As a result, dtrace output can be non-deterministic, which is not ideal for generating optimization data.

Additionally, if order generation could be enabled at the same time as PGO generation, that would be a great solution for generating profile data for optimizing clang itself. Clang has some scripts and build-system goop under utils/perf-training that can generate order files using dtrace and PGO data; it would be great to apply this technique to those tools too.


> On Jan 18, 2019, at 2:43 AM, Hans Wennborg via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Thu, Jan 17, 2019 at 7:24 PM Manman Ren via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> Order file is used to teach ld64 how to order the functions in a binary. If we put all functions executed during startup together in the right order, we will greatly reduce the page faults during startup.
>> To generate an order file for iOS apps, we usually use dtrace, but some apps have various startup scenarios that we want to capture in the order file. The dtrace approach is not easy to automate, and without automation it is hard to capture the different ways of starting an app. Instrumented builds, however, can be deployed to phones, and profile data can be collected automatically.
>> For the Facebook app, by looking at the startup distribution, we are expecting a big win out of the order file instrumentation, from 100ms to 500ms+, in startup time.
>> The basic idea of the pass is to use a circular buffer to log the execution ordering of the functions. We only log the function when it is first executed. Instead of logging the symbol name of the function, we log a pair of integers, with one integer specifying the module id, and the other specifying the function id within the module.
> [...]
>> clang has '-finstrument-function-entry-bare' which inserts a function call and is not as efficient.
> Can you elaborate on why this existing functionality is not efficient
> enough for you?
> For Chrome on Windows, we use -finstrument-functions-after-inlining to
> insert calls at function entry (after inlining) that calls a function
> which captures the addresses in a buffer, and later symbolizes and
> dumps them to an order file that we feed the linker. We use a similar
> approach for Chrome on Android, but I'm not as familiar with the
> details there.
> Thanks,
> Hans
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
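
For readers following the thread, the logging scheme described in the quoted RFC (a circular buffer that records a (module id, function id) pair the first time each function runs) might look roughly like the sketch below. This is only an illustration under stated assumptions, not the proposed pass or its runtime: the names order_entry, order_buf, order_log_first_call, order_count and the ORDER_BUF_SIZE capacity are all hypothetical, and a real pass would emit the equivalent check inline at function entry rather than call a helper.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical capacity; the RFC does not state a buffer size. */
#define ORDER_BUF_SIZE 4096u

/* One log record: which module, and which function within that module. */
typedef struct {
    uint32_t module_id;
    uint32_t function_id;
} order_entry;

static order_entry order_buf[ORDER_BUF_SIZE];
static atomic_uint order_next; /* total number of records written so far */

/*
 * Hypothetical hook run at function entry. Each instrumented function
 * would own one `seen` flag, so only its first execution is logged.
 */
static void order_log_first_call(atomic_flag *seen,
                                 uint32_t module_id,
                                 uint32_t function_id)
{
    /* test_and_set returns the previous value: true means already logged. */
    if (atomic_flag_test_and_set_explicit(seen, memory_order_relaxed))
        return;

    /* Claim the next slot; the index wraps, which makes the buffer circular. */
    unsigned slot = atomic_fetch_add_explicit(&order_next, 1u,
                                              memory_order_relaxed)
                    % ORDER_BUF_SIZE;
    order_buf[slot].module_id = module_id;
    order_buf[slot].function_id = function_id;
}

/* Number of valid records currently held (capped at the buffer capacity). */
static unsigned order_count(void)
{
    unsigned n = atomic_load_explicit(&order_next, memory_order_relaxed);
    return n < ORDER_BUF_SIZE ? n : ORDER_BUF_SIZE;
}
```

In this sketch, calling order_log_first_call twice with the same flag records only one entry, and the entries appear in order_buf in first-execution order. Mapping the integer pairs back to symbol names for the order file would be a separate offline step that is not shown here.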
