[llvm-dev] [RFC] Machine IR Profile (MIP)

Kyungwoo Lee via llvm-dev llvm-dev at lists.llvm.org
Fri Jun 11 09:20:14 PDT 2021

Hi all,

Ellis Hoag and I would like to propose our work, Machine IR Profile (MIP),
which is lightweight and scalable.
Please see the proposal below and the initial patch.
It would be great to hear your feedback!

Machine IR Profile (MIP)

tl;dr

This is a proposal to introduce a new instrumentation pass that can produce
optimization profiles with a focus on binary size and runtime performance
of the instrumented binaries.

Our instrumented binaries record machine function call counts, machine
function timestamps, machine basic block coverage, and a subset of the
dynamic call graph. There is also a more lightweight mode that only
collects machine function coverage data that has negligible runtime
overhead and a binary size increase of 2-5% for instrumented binaries.

This is just the first patch of the WIP MIP project. The full branch can be
found at https://github.com/ellishg/llvm-project

In the mobile space, increasing binary size has an outsized impact on both
runtime performance and download speed. Current instrumentation
implementations such as XRay and GCov produce binaries that are too slow
and too large to run on real mobile devices. We propose a new pass that
injects instrumentation code at the Machine IR level. At runtime, we write
profile data to our custom __llvm_mipraw section, which is eventually dumped
to a .mipraw file. At build time, we emit a .mipmap file, which we use to map
function information to data in the .mipraw file. The result is that no
redundant function info is stored in the binary, which allows our
instrumentation to have minimal size overhead.
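
To make the split between the two files concrete, here is a minimal sketch
of how build-time map entries could be joined with runtime raw bytes. It is
not the actual MIP file format: the MapEntry type, the raw_offset field, the
read_coverage helper, and the convention that a byte starts at 0xFF and is
cleared to 0x00 when its function runs are all assumptions made for
illustration.

```python
# Hypothetical layout, for illustration only: the real .mipmap/.mipraw
# formats are defined by the MIP patch and are not reproduced here.
from dataclasses import dataclass


@dataclass(frozen=True)
class MapEntry:
    name: str        # mangled function name, known at build time
    raw_offset: int  # assumed offset of this function's byte in the raw data


def read_coverage(map_entries, raw):
    """Join build-time map entries with runtime raw bytes."""
    coverage = {}
    for entry in map_entries:
        # Assumed convention: 0xFF = never called, 0x00 = called at least once.
        coverage[entry.name] = raw[entry.raw_offset] == 0x00
    return coverage


# Function names live only in the map; the raw data is one byte per function.
mipmap = [MapEntry("_Z3fooi", 0), MapEntry("_Z3bari", 1)]
mipraw = bytes([0xFF, 0x00])
coverage = read_coverage(mipmap, mipraw)
print(coverage)
```

The point of the sketch is that names and other function information stay
in the map file, so the data carried by the binary itself can be as small
as one byte per function.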

MIP has been implemented on ELF and Mach-O targets for x86_64, AArch64, and
Armv7 with Thumb and Thumb2.

Our focus for now is on the performance and size of binaries that have
injected instrumentation instead of binaries that have been optimized with
instrumentation profiles. We collected some basic results from
MultiSource/Benchmarks in llvm-test-suite for both MIP and clang’s
instrumentation using the -fprofile-generate flag. It should be noted that
this comparison is not fair because clang’s instrumentation collects much
more data than just function coverage. However, we expect fully-featured
MIP to have similar metrics.

Instrumented Binary Size

At the moment, we have implemented function coverage which injects one
x86_64 instruction (7 bytes) and one byte of global data for each
instrumented function, which should have minimal impact on binary size and
runtime performance. In fact, our results show that we should expect
MIP-instrumented binaries to be only 2-5% larger. We contrast this with clang’s
instrumentation, which can increase the binary size by 500-900%.
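
The per-function cost above can be turned into a rough size estimate. The
sketch below uses only the two figures stated in this section (7 bytes of
code and 1 byte of data per function on x86_64). The function count and
baseline size are hypothetical, and the estimate ignores anything else the
instrumentation may add to the binary.

```python
CODE_BYTES_PER_FUNCTION = 7  # one x86_64 instruction, per the text above
DATA_BYTES_PER_FUNCTION = 1  # one byte of global data, per the text above


def coverage_overhead_bytes(num_functions):
    """Bytes added by function coverage instrumentation alone."""
    per_function = CODE_BYTES_PER_FUNCTION + DATA_BYTES_PER_FUNCTION
    return num_functions * per_function


def overhead_percent(num_functions, baseline_size_bytes):
    """Overhead relative to the uninstrumented binary size."""
    return 100.0 * coverage_overhead_bytes(num_functions) / baseline_size_bytes


# Hypothetical binary: 10,000 functions, 2,000,000 bytes before instrumentation.
overhead = coverage_overhead_bytes(10_000)
percent = overhead_percent(10_000, 2_000_000)
print(overhead, percent)
```

For this made-up binary the estimate is 80,000 bytes, or 4% of the baseline.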

Instrumented Execution Time

We found that binaries instrumented with MIP had negligible execution time
regressions. Again, we can (unfairly) contrast this with
-fprofile-generate, which increased execution time by 1-40%.

We use the -fmachine-profile-generate clang flag to produce an instrumented
binary and then use llvm-objcopy to extract the .mipmap file.

$ clang -g -fmachine-profile-generate main.cpp
$ llvm-objcopy --dump-section=__llvm_mipmap=default.mipmap a.out /dev/null
$ llvm-strip -g a.out -o a.out.stripped

This will produce the instrumented binary a.out and a map file
default.mipmap.

When we run the binary, it will produce a default.mipraw file containing
the profile data for that run.

$ ./a.out.stripped
$ ls
a.out    default.mipmap    default.mipraw    main.cpp

Then we use our custom tool to postprocess the raw profile and produce the
final profile default.mip.

$ llvm-mipdata create -p default.mip default.mipmap
$ llvm-mipdata merge -p default.mip default.mipraw

If our binary has debug info, we can use it to report source information
along with the profile data.

$ llvm-mipdata show -p default.mip --debug a.out
_Z3fooi
  Source Info: /home/main.cpp:9
  Call Count: 0
  Block Coverage:
    COLD COLD COLD COLD COLD
_Z3bari
  Source Info: /home/main.cpp:16
  Call Count: 1
  Block Coverage:
    HOT  HOT  COLD HOT  HOT

Finally, we can consume the profile using the clang flag
-fmachine-profile-use= to produce a profile-optimized binary.

$ clang -fmachine-profile-use=default.mip main.cpp