[PATCH] D104060: Machine IR Profile

Ellis Hoag via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 16 17:09:50 PDT 2021


ellis added a comment.

Section Layout
--------------

MIP's major feature is the ability to extract all metadata from the instrumented binary to reduce its size. This is possible because we use two sections: `__llvm_mipraw` to store profile data and `__llvm_mipmap` to store function metadata. The map section is extracted from the binary before it runs, and the raw section is dumped at runtime. The two are then combined into a final .mip profile. Each section has a 24-byte header followed by a record of data for each instrumented function.

`__llvm_mipraw`
---------------

The contents of this section depend on the type of instrumentation used, but the data is always "unreadable" without the map section. For function coverage, we simply allocate a single byte for each instrumented function and initialize it to all ones.

  _Z3foov$RAW:
    .byte  0xff
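
The comment only says that each byte starts out as `0xff`. The sketch below additionally assumes that the instrumented function entry overwrites its byte with `0x00`, so a function counts as covered once its byte is no longer all ones. The class and method names are hypothetical; it only models the section contents in memory.

```python
class RawCoverageSection:
    """Hypothetical in-memory model of __llvm_mipraw for function coverage.

    One byte per instrumented function, initialized to 0xff (all ones).
    ASSUMPTION: instrumented code clears its byte to 0x00 on function entry.
    """

    def __init__(self, num_functions: int) -> None:
        self.data = bytearray(b"\xff" * num_functions)

    def on_function_entry(self, raw_offset: int) -> None:
        # A plain store: no counter and no read-modify-write.
        self.data[raw_offset] = 0x00

    def is_covered(self, raw_offset: int) -> bool:
        # A byte that is no longer all ones was written at least once.
        return self.data[raw_offset] != 0xFF


raw = RawCoverageSection(num_functions=3)
raw.on_function_entry(1)
print([raw.is_covered(i) for i in range(3)])  # [False, True, False]
```

Under this assumption the store is idempotent, which is why a single byte, rather than a counter, is enough for coverage.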



`__llvm_mipmap`
---------------

The map section allows us to map instrumented functions to their profile data in the raw section. The two most important values are the function name and the offset to the profile data in the raw section. With these two values we can find a function's profile data in a raw dump and know which function it belongs to.

  _Z3foov$MAP:
  .Lref:
    .long  __start___llvm_mipraw-.Lref     # Raw Section Start PC Offset
    .long  _Z3foov$RAW-.Lref               # Raw Profile Symbol PC Offset
    .long  _Z3foov-.Lref                   # Function PC Offset
    .long  [[FOO_END]]-_Z3foov             # Function Size
    .long  0x70c9fa27                      # CFG Signature
    .long  2                               # Non-entry Block Count
    .long  [[FOO_BLOCK0]]-_Z3foov          # Block 0 Offset
    .long  [[FOO_BLOCK1]]-_Z3foov          # Block 1 Offset
    .long  7                               # Function Name Length
    .ascii  "_Z3foov"
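
A reader for one record with this layout might look like the sketch below. It assumes every `.long` is a 4-byte little-endian integer, that records are packed back to back with no padding after the name, and that `.Lref` is the address of the record's first field, as the listing shows. The 24-byte section header is not handled, and `MapRecord` and `parse_map_record` are hypothetical names.

```python
import struct
from typing import List, NamedTuple, Tuple


class MapRecord(NamedTuple):
    raw_start_pc_offset: int   # __start___llvm_mipraw - .Lref
    raw_symbol_pc_offset: int  # _Z3foov$RAW - .Lref
    function_pc_offset: int    # _Z3foov - .Lref
    function_size: int
    cfg_signature: int
    block_offsets: List[int]   # non-entry block offsets from the function start
    name: str


def parse_map_record(buf: bytes, pos: int = 0) -> Tuple[MapRecord, int]:
    """Parse one __llvm_mipmap record at buf[pos]; return (record, next_pos).

    ASSUMPTION: 4-byte little-endian fields, packed with no padding.
    """
    # Three signed PC offsets, then size, CFG signature and block count.
    raw_start, raw_sym, func, size, sig, nblocks = struct.unpack_from("<iiiIII", buf, pos)
    pos += 24
    block_offsets = list(struct.unpack_from(f"<{nblocks}I", buf, pos))
    pos += 4 * nblocks
    (name_len,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    name = buf[pos:pos + name_len].decode("ascii")  # .ascii has no NUL terminator
    pos += name_len
    return MapRecord(raw_start, raw_sym, func, size, sig, block_offsets, name), pos


# A record shaped like the listing above, with made-up offset values.
blob = struct.pack("<iiiIII", -64, -60, 0x100, 32, 0x70C9FA27, 2)
blob += struct.pack("<2I", 8, 20) + struct.pack("<I", 7) + b"_Z3foov"
rec, end = parse_map_record(blob)
print(rec.name, rec.block_offsets, end == len(blob))  # _Z3foov [8, 20] True
```

Returning the next position lets a caller walk a whole section of variable-length records in a loop.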

The main challenge is storing the offset to the profile data without using dynamic relocations. This is complicated by the fact that we use comdat sections within the `__llvm_mipraw` section and that ELF does not seem to directly support section-relative addresses. The solution is to use PC-relative relocations: `__start___llvm_mipraw-.Lref` gives us the PC-relative offset to the start of the raw section, and `_Z3foov$RAW-.Lref` gives us the offset to this function's profile data relative to the same PC. After we extract the map section, we can subtract one from the other to get the value we want, the section-relative offset of the raw profile data.

  (_Z3foov$RAW-.Lref) - (__start___llvm_mipraw-.Lref) = _Z3foov$RAW - __start___llvm_mipraw
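
In code, this is a single subtraction once the record has been read. The addresses in the example are made up; the point is that the address of `.Lref` cancels out, so the result does not depend on where the map section ends up.

```python
def raw_profile_offset(raw_start_pc_offset: int, raw_symbol_pc_offset: int) -> int:
    """Section-relative offset of a function's data in __llvm_mipraw.

    (sym - Lref) - (start - Lref) == sym - start, so Lref cancels out.
    """
    return raw_symbol_pc_offset - raw_start_pc_offset


# Made-up addresses: .Lref at 0x5000, raw section at 0x9000, foo's data at 0x9003.
lref, raw_start, raw_sym = 0x5000, 0x9000, 0x9003
offset = raw_profile_offset(raw_start - lref, raw_sym - lref)
print(offset)  # 3
```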

We can use the same trick to encode the function address; we only need to also add the address of the raw section, which can be looked up in the binary. This is useful for looking up debug info and joining it with our final profile data.
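
A minimal sketch of that recovery, assuming `raw_section_addr` (a hypothetical parameter name) is the address of `__llvm_mipraw` read from the binary's section headers. The addresses in the example are made up.

```python
def function_address(raw_section_addr: int, raw_start_pc_offset: int,
                     function_pc_offset: int) -> int:
    # (func - Lref) - (start - Lref) == func - start; adding start gives func.
    return raw_section_addr + (function_pc_offset - raw_start_pc_offset)


lref, raw_start, foo = 0x5000, 0x9000, 0x1230  # made-up addresses
print(hex(function_address(raw_start, raw_start - lref, foo - lref)))  # 0x1230
```

Because the raw section address comes from the binary on disk, the result is the function's address in the binary, which is the address space its debug info refers to.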

The other values are relatively straightforward. The function size and block offsets allow us to map block coverage data to debug info. The control flow graph (CFG) signature is used to verify that the function has not changed since the profile was collected. Finally, the mangled function name identifies the function.
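
A consumer could combine these fields roughly as in the hypothetical sketch below. It assumes one boolean of coverage per non-entry block (the block coverage format is not described here) and treats the CFG signature as an opaque integer, since how MIP computes it is not described either. Data from a function whose signature no longer matches is dropped rather than attributed to the wrong blocks.

```python
from typing import Dict, List, Optional


def block_coverage_by_address(function_addr: int, block_offsets: List[int],
                              block_covered: List[bool], profile_sig: int,
                              current_sig: int) -> Optional[Dict[int, bool]]:
    """Map each non-entry block's coverage to an absolute address.

    Returns None when the CFG signature differs, because the offsets and
    coverage bits would then describe a different control flow graph.
    """
    if profile_sig != current_sig:
        return None
    # Block offsets in the map are relative to the function's start symbol.
    return {function_addr + off: hit for off, hit in zip(block_offsets, block_covered)}


cov = block_coverage_by_address(0x1230, [8, 20], [True, False], 0x70C9FA27, 0x70C9FA27)
stale = block_coverage_by_address(0x1230, [8], [True], 0x70C9FA27, 0x12345678)
```

The resulting addresses can then be looked up in the line table to report coverage per source line.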

Integration with Existing PGI?
------------------------------

There seems to be a question about why we chose to implement MIP from scratch instead of extending an existing framework. If we were to extend `-fprofile-instr-generate`, we would need to make its metadata extractable, which may be too invasive to implement.

- As shown above, a lot of work was needed to make sure the metadata can be extracted correctly.
- Existing PGI has structured raw data that would need to be moved into the extractable metadata section.
- Our MIP tool has an `llvm-mipdata create` command that converts the map section into an "empty" profile, which can later be merged with raw profiles. Existing PGI tools do not have this extra step.
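
The create-then-merge flow in the last point can be modeled as below. This is a hypothetical sketch of the data flow only, not the `llvm-mipdata` implementation or the .mip file format. The empty profile has one entry per function from the map section, each raw dump is assumed to be the section contents after its 24-byte header, and coverage is merged by OR, again assuming a cleared byte means the function ran.

```python
from typing import Dict, Iterable, Tuple


def create_empty_profile(map_entries: Iterable[Tuple[str, int]]) -> Dict[str, dict]:
    """map_entries: (function name, section-relative raw offset) pairs."""
    return {name: {"raw_offset": off, "covered": False} for name, off in map_entries}


def merge_raw(profile: Dict[str, dict], raw_dump: bytes) -> None:
    """OR one run's function coverage into the profile, in place."""
    for entry in profile.values():
        if raw_dump[entry["raw_offset"]] != 0xFF:  # assumed: entry clears the byte
            entry["covered"] = True


profile = create_empty_profile([("_Z3foov", 0), ("_Z3barv", 1)])
merge_raw(profile, b"\x00\xff")  # run 1 executed foo
merge_raw(profile, b"\xff\x00")  # run 2 executed bar
print({name: e["covered"] for name, e in profile.items()})
# {'_Z3foov': True, '_Z3barv': True}
```

Keeping the map-derived profile separate means each raw dump stays an opaque byte array that can be merged any number of times.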

MIP Edge Profile
----------------

We omitted the code that adds call edge profiles to reduce the complexity of the current review, but we do plan to upload it. For each instrumented function, we sample return address values so we can build a dynamic call graph from the profile. The format is largely the same, but we added a static buffer of configurable size to hold the return address values. This approach correctly profiles calls through dynamic dispatch, as found in Objective-C and Swift, as well as normal function calls. For example, we can use this call edge data to identify candidates for `objc_direct`.
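
The sampling scheme is not spelled out here, so the sketch below shows only one way to turn per-function return-address buffers into call graph edges. It assumes each callee has a fixed-capacity buffer that keeps the first distinct return addresses it sees (the real buffer policy may differ), and that a return address is attributed to the caller whose address range contains it. All names are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Set, Tuple


class ReturnAddressBuffer:
    """Hypothetical fixed-size buffer of distinct return addresses for one callee."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.addresses: List[int] = []

    def record(self, return_addr: int) -> None:
        # Keep the first `capacity` distinct values; later new callers are dropped.
        if return_addr not in self.addresses and len(self.addresses) < self.capacity:
            self.addresses.append(return_addr)


def find_caller(functions: List[Tuple[int, int, str]], addr: int) -> Optional[str]:
    """functions: (start, size, name) tuples sorted by start address."""
    starts = [start for start, _, _ in functions]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        start, size, name = functions[i]
        if addr < start + size:
            return name
    return None


def call_edges(functions: List[Tuple[int, int, str]],
               buffers: Dict[str, ReturnAddressBuffer]) -> Set[Tuple[str, str]]:
    edges = set()
    for callee, buf in buffers.items():
        for ret in buf.addresses:
            caller = find_caller(functions, ret)
            if caller is not None:
                edges.add((caller, callee))
    return edges


funcs = [(0x1000, 0x40, "main"), (0x1040, 0x20, "_Z3foov"), (0x1060, 0x20, "_Z3barv")]
bar_buf = ReturnAddressBuffer(capacity=2)
for ret in (0x1010, 0x1010, 0x1050):  # bar called twice from main, once from foo
    bar_buf.record(ret)
print(sorted(call_edges(funcs, {"_Z3barv": bar_buf})))
# [('_Z3foov', '_Z3barv'), ('main', '_Z3barv')]
```

Because the callee records where it returns to, the edge is found the same way whether the call was direct or went through dynamic dispatch.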

Optimization
------------

We also omitted the profile consumption and optimization code from the current review. Our optimization focus is outlining to reduce code size and function ordering to reduce page faults. When we consume profile data with function coverage, block coverage, or function call counts, we can make smarter inlining and outlining decisions based on hotness. When we have profile data with timestamps, call counts, or the dynamic call graph, we can generate a function order that reduces the number of page faults.
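
The comment does not say which ordering algorithm will be used. As a hypothetical illustration of the timestamp case only, the sketch below places functions in the order they were first executed and moves never-executed functions to the end, then counts how many pages the executed functions touch. It assumes a profile that maps each function to its first-execution timestamp (or `None`), and the sizes in the example are made up.

```python
from typing import Dict, List, Optional, Set


def order_by_first_execution(first_exec: Dict[str, Optional[int]]) -> List[str]:
    """Executed functions by first-execution time, then never-executed ones."""
    executed = sorted((t, name) for name, t in first_exec.items() if t is not None)
    cold = sorted(name for name, t in first_exec.items() if t is None)
    return [name for _, name in executed] + cold


def pages_touched(order: List[str], sizes: Dict[str, int], hot: Set[str],
                  page_size: int = 4096) -> int:
    """Count pages holding at least one hot function when laid out in `order`."""
    pages, addr = set(), 0
    for name in order:
        if name in hot:
            first, last = addr // page_size, (addr + sizes[name] - 1) // page_size
            pages.update(range(first, last + 1))
        addr += sizes[name]
    return len(pages)


first_exec = {"main": 0, "_Z3barv": None, "_Z3bazv": 2, "_Z3foov": 5}
sizes = {"main": 1000, "_Z3barv": 8000, "_Z3bazv": 1000, "_Z3foov": 1000}
hot = {name for name, t in first_exec.items() if t is not None}
order = order_by_first_execution(first_exec)
print(order)  # ['main', '_Z3bazv', '_Z3foov', '_Z3barv']
print(pages_touched(list(first_exec), sizes, hot), pages_touched(order, sizes, hot))  # 2 1
```

Orderings driven by call counts or the dynamic call graph would replace the sort key, but the page-count check stays the same.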


Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D104060


