[PATCH] D104060: Machine IR Profile

Wed Jun 16 19:41:38 PDT 2021

On Wed, Jun 16, 2021 at 5:09 PM Ellis Hoag via Phabricator <
reviews at reviews.llvm.org> wrote:

> ellis added a comment.
>
> Section Layout
> --------------
>
> MIP's major feature is the ability to extract all metadata from the
> instrumented binary to reduce its size. This is possible by using two
> sections, `__llvm_mipraw` to store profile data and `__llvm_mipmap` to
> store function metadata. The map section is extracted from the binary
> before runtime, and the raw section is dumped at runtime. Both sections are
> then combined into a final .mip profile. Both sections have a 24-byte header
> and a block of data for each instrumented function.
>
> `__llvm_mipraw`
> ---------------
>
> The contents of this section depend on the type of instrumentation used,
> but the data is always "unreadable" without the map section. For function
> coverage, we simply allocate a single byte for each instrumented function
> and initialize it to all ones.
>
>   _Z3foov$RAW:
>     .byte  0xff
>
>
>
> `__llvm_mipmap`
> ---------------
>
> The map section allows us to map instrumented functions to their profile
> data in the raw section. The two most important values are the function
> name and the offset to the profile data in the raw section. With these
> values we can read a function's profile data using the offset.
>
>   _Z3foov$MAP:
>   .Lref:
>     .long  __start___llvm_mipraw-.Lref     # Raw Section Start PC Offset
>     .long  _Z3foov$RAW-.Lref               # Raw Profile Symbol PC Offset
>     .long  _Z3foov-.Lref                   # Function PC Offset
>     .long  [[FOO_END]]-_Z3foov             # Function Size
>     .long  0x70c9fa27                      # CFG Signature
>     .long  2                               # Non-entry Block Count
>     .long  [[FOO_BLOCK0]]-_Z3foov          # Block 0 Offset
>     .long  [[FOO_BLOCK1]]-_Z3foov          # Block 1 Offset
>     .long  7                               # Function Name Length
>     .ascii  "_Z3foov"
>
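For illustration, a record laid out like the `_Z3foov$MAP` listing above could be decoded roughly as follows. This is a minimal sketch, not the actual llvm-mipdata reader: the function name `parse_map_record`, the dictionary keys, the little-endian byte order, and the choice of signed versus unsigned fields are all assumptions, and the 24-byte section header is not handled.

```python
import struct

# Assumed layout, mirroring the listing: three signed 32-bit PC-relative
# offsets, then function size, CFG signature, and non-entry block count.
FIXED_FIELDS = struct.Struct("<iiiIII")


def parse_map_record(buf, offset=0):
    """Decode one per-function map record; returns (record, next_offset)."""
    (raw_section_rel, raw_profile_rel, function_rel,
     function_size, cfg_signature, block_count) = FIXED_FIELDS.unpack_from(buf, offset)
    pos = offset + FIXED_FIELDS.size

    # One 32-bit offset per non-entry block, relative to the function start.
    block_offsets = list(struct.unpack_from("<%dI" % block_count, buf, pos))
    pos += 4 * block_count

    # Length-prefixed mangled function name.
    (name_len,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    name = bytes(buf[pos:pos + name_len]).decode("ascii")
    pos += name_len

    record = {
        "raw_section_rel": raw_section_rel,
        "raw_profile_rel": raw_profile_rel,
        "function_rel": function_rel,
        "function_size": function_size,
        "cfg_signature": cfg_signature,
        "block_offsets": block_offsets,
        "name": name,
    }
    return record, pos
```

Returning the next offset lets a caller walk consecutive records, under the further assumption that records are packed back to back with no padding.
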
> The main challenge is storing the offset to the profile data without using
> dynamic relocations. This is complicated by the fact that we use comdat
> sections within the `__llvm_mipraw` section and that ELF does not seem to
> directly support section relative addresses. The solution is to use PC
> relative relocations. `__start___llvm_mipraw-.Lref` gives us the PC
> relative offset to the start of the raw section and `_Z3foov$RAW-.Lref`
> gives us the offset to the profile data for this function relative to the
> same PC. After we extract the map section, we can subtract these to get the
> value we want, the section relative raw profile data offset.
>
>   (_Z3foov$RAW-.Lref) - (__start___llvm_mipraw-.Lref)
>     = _Z3foov$RAW - __start___llvm_mipraw
>
> We can use the same trick to encode the function address; we just need to
> also add the address of the raw section, which can be looked up in the
> binary. This is useful for looking up debug info and joining it with our
> final profile data.
>
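The offset arithmetic described above can be sketched as follows. The helper names and the sample addresses are hypothetical and chosen only to make the subtraction concrete; the inputs are the PC-relative values stored in the map record.

```python
def section_relative_offset(raw_profile_rel, raw_section_rel):
    """(_Z3foov$RAW - .Lref) - (__start___llvm_mipraw - .Lref)."""
    # The common .Lref term cancels, leaving the offset into the raw section.
    return raw_profile_rel - raw_section_rel


def function_address(function_rel, raw_section_rel, raw_section_addr):
    """Recover the function address from its PC-relative encoding."""
    # (function - .Lref) - (raw_start - .Lref) = function - raw_start;
    # adding the raw section address (looked up in the binary) gives the
    # function address.
    return raw_section_addr + (function_rel - raw_section_rel)


# Made-up layout: .Lref at 0x5000, raw section at 0x4000,
# _Z3foov$RAW at 0x4010, _Z3foov at 0x1000.
lref = 0x5000
raw_start = 0x4000
offset = section_relative_offset(0x4010 - lref, raw_start - lref)
address = function_address(0x1000 - lref, raw_start - lref, raw_start)
print(hex(offset), hex(address))
```

Neither result depends on where `.Lref` ends up, which is why no dynamic relocation is needed for these fields.
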
> The other values are relatively straightforward. The function size and
> block offsets allow us to map block coverage data to debug info. The
> control flow graph signature is used to verify that the function has not
> changed since the profile was collected. Finally, we have the mangled
> function name to help identify the function.
>
> Integration with Existing PGI?
> ------------------------------
>
> There seems to be a question about why we chose to implement MIP from
> scratch instead of extending an existing framework. If we were to extend
> `-fprofile-instr-generate` we would need to have extractable metadata,
> which may be too invasive to implement.
>

I believe you mean -fprofile-generate or -fcs-profile-generate.
-fprofile-instr-generate is based on the front-end AST and will eventually be
hidden under -fcoverage-test, for source coverage purposes only.

As you can see we are currently making an effort to unify/simplify the PGO
implementation. Adding yet another instrumentation mechanism does the
opposite.

> - As shown above, a lot of work was done to make sure the metadata can be
> extracted correctly.
> - Existing PGI has structured raw data that would need to be moved to the
> extractable metadata section.
> - Our MIP tool has a `llvm-mipdata create` command which converts the map
> section to an “empty” profile that can later be merged with raw profiles.
> Existing PGI tools do not have this extra step.
>
>
As I said, size improvement efforts (under options) are welcome for the
existing IRPGO. Another benefit is that we can consolidate effort on
improving the related tooling.

> MIP Edge Profile
> ----------------
>
> We omitted the code that adds call edge profiles to reduce the complexity
> of the current review, but we do plan to upload it. For each instrumented
> function, we sample return address values so we can build a dynamic call
> graph from the profile. The format is largely the same, but we added a
> size-configurable static buffer to hold the return address values. This
> approach correctly profiles dynamic dispatch calls that are found in
> Objective-C and Swift as well as any normal function calls. We can, for
> example, identify candidates for `objc_direct` using this call edge data.
>
>
Adding this duplicate functionality (edge profiling) makes it even less
compelling to do this at the MIR level.

For the dynamic dispatch profiling, does it handle any number of
indirect targets, or does it only support the top N?

> Optimization
> ------------
>
> We also omitted the profile consumption and optimization code from the
> current review. Our focus for optimization is on outlining to reduce size
> and function ordering to reduce page faults. When we consume profile data
> with function coverage, block coverage, or function call counts we can make
> smarter inlining and outlining decisions based on hotness. When we have
> profile data with timestamps, call counts, or the dynamic call graph, then
> we can generate an optimal function order that reduces the number of page
> faults.
>
See above: adding any missing features to the existing framework is the
preferred approach. I have yet to see convincing arguments that spinning off a
new instrumentation framework is the way to go.

David

> Repository:
>   rG LLVM Github Monorepo
>
> CHANGES SINCE LAST ACTION
>   https://reviews.llvm.org/D104060/new/
>
> https://reviews.llvm.org/D104060
>
>