[llvm] [compiler-rt] [docs][IRPGO]Document two binary formats for IRPGO profiles (PR #76105)
Mingming Liu via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 2 13:03:16 PST 2024
================
@@ -0,0 +1,457 @@
+===================================
+Instrumentation Profile Format
+===================================
+
+.. contents::
+ :local:
+
+
+Overview
+=========
+
+Clang supports two types of instrumentation-based profiling [1]_: frontend-based
+and IR-based. Both can support a variety of use cases [2]_.
+This document describes two binary serialization formats (raw and indexed) used
+to store instrumented profiles, with an emphasis on the IRPGO use case: when a
+header field or payload section is interpreted differently across use cases,
+this document follows the IRPGO interpretation.
+
+.. note::
+ Frontend-generated profiles are used together with coverage mapping for
+ `source based code coverage`_. The `coverage mapping format`_ is different from
+ the profile format.
+
+.. _`source based code coverage`: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html
+.. _`coverage mapping format`: https://llvm.org/docs/CoverageMappingFormat.html
+
+Raw Profile Format
+===================
+
+The raw profile is generated by running the instrumented binary. The raw profile
+data from an executable or a shared library [3]_ consists of a header and
+multiple sections, with each section being a memory dump. The raw profile data
+needs to be reasonably compact and fast to generate.
+
+There are no backward or forward version compatibility guarantees for the raw profile
+format. That is, compilers and tools `require`_ a specific raw profile version
+to parse the profiles.
+
+.. _`require`: https://github.com/llvm/llvm-project/blob/bffdde8b8e5d9a76a47949cd0f574f3ce656e181/llvm/lib/ProfileData/InstrProfReader.cpp#L551-L558
+
+To feed profiles back into compilers for an optimized build (e.g., via
+``-fprofile-use`` for IR instrumentation), a raw profile must be converted into
+the indexed format.
+
+General Storage Layout
+-----------------------
+
+The storage layout of the raw profile format is illustrated below. When the raw
+profile is read into a memory buffer, the byte offset of each section is
+inferred from the section's position in the layout and the sizes of all the
+sections that precede it.
+
+::
+
+ +----+-----------------------+
+ | | Magic |
+ | +-----------------------+
+ | | Version |
+ | +-----------------------+
+ H | Size Info for |
+ E | Section 1 |
+ A +-----------------------+
+ D | Size Info for |
+ E | Section 2 |
+ R +-----------------------+
+ | | ... |
+ | +-----------------------+
+ | | Size Info for |
+ | | Section N |
+ +----+-----------------------+
+ P | Section 1 |
+ A +-----------------------+
+ Y | Section 2 |
+ L +-----------------------+
+ O | ... |
+ A +-----------------------+
+ D | Section N |
+ +----+-----------------------+
+
+
+.. note::
+ Sections might be padded to meet specific alignment requirements. For
+ simplicity, header fields and data sections used solely for padding are
+ omitted from the layout diagram above and from the rest of this document.
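A minimal C sketch of the offset rule above, assuming a hypothetical, simplified header (``SimpleHeader``) that stores only a magic, a version, and the byte size of each of three payload sections, with padding ignored as in the note. The real raw header carries more fields (described in the Header section), and ``section_offset`` is an illustrative helper, not LLVM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SECTIONS 3

/* Hypothetical, simplified header: only what this sketch needs. */
struct SimpleHeader {
  uint64_t Magic;
  uint64_t Version;
  uint64_t SectionBytes[NUM_SECTIONS]; /* byte size of each payload section, in layout order */
};

/* Byte offset of payload section `Index` from the start of the buffer:
   the header size plus the sizes of all sections laid out before it. */
static uint64_t section_offset(const struct SimpleHeader *H, size_t Index) {
  assert(Index < NUM_SECTIONS);
  uint64_t Offset = sizeof(struct SimpleHeader);
  for (size_t I = 0; I < Index; ++I)
    Offset += H->SectionBytes[I];
  return Offset;
}
```

Because the offsets are derived rather than stored, the header stays small; the cost is that a reader must know the exact section order for the profile version it parses.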
+
+Header
+-------
+
+``Magic``
+ The magic number encodes the profile format (raw, indexed, or text). For the
+ raw format, the magic number also encodes the endianness (big or little) and
+ the C pointer size (32-bit or 64-bit) of the platform on which the profile is
+ generated.
+
+ A factory method reads the magic number to construct the appropriate reader
+ and returns an error for an unrecognized format. In particular, the factory
+ method and the raw profile reader ensure that a raw profile can be read on a
+ platform with the opposite endianness and/or a different C pointer size.
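The endianness check can be sketched as follows. This is an illustration, not the LLVM reader: ``RAW_MAGIC_64`` is assumed to mirror ``INSTR_PROF_RAW_MAGIC_64`` from ``InstrProfData.inc`` (verify against that file), the 32-bit pointer variant is omitted, and ``classify_magic`` and ``byte_swap64`` are hypothetical helpers. If a magic value read in host byte order matches only after byte swapping, the profile was written on a platform with the opposite endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror INSTR_PROF_RAW_MAGIC_64 (bytes 0xff 'l' 'p' 'r' 'o' 'f'
   'r' 0x81); check InstrProfData.inc for the authoritative value. */
#define RAW_MAGIC_64                                                         \
  ((uint64_t)255 << 56 | (uint64_t)'l' << 48 | (uint64_t)'p' << 40 |        \
   (uint64_t)'r' << 32 | (uint64_t)'o' << 24 | (uint64_t)'f' << 16 |        \
   (uint64_t)'r' << 8 | (uint64_t)129)

enum MagicKind { MAGIC_UNKNOWN, MAGIC_NATIVE, MAGIC_SWAPPED };

/* Reverses the byte order of a 64-bit value. */
static uint64_t byte_swap64(uint64_t V) {
  uint64_t R = 0;
  for (int I = 0; I < 8; ++I) {
    R = (R << 8) | (V & 0xff);
    V >>= 8;
  }
  return R;
}

/* Classifies a magic value read in host byte order. MAGIC_SWAPPED means the
   reader must byte-swap every multi-byte field it reads from this profile. */
static enum MagicKind classify_magic(uint64_t ReadValue) {
  if (ReadValue == RAW_MAGIC_64)
    return MAGIC_NATIVE;
  if (byte_swap64(ReadValue) == RAW_MAGIC_64)
    return MAGIC_SWAPPED;
  return MAGIC_UNKNOWN;
}
```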
+
+``Version``
+ The lower 32 bits specify the actual version, and the upper 32 bits specify
+ the variant types of the profile. IR-based instrumentation PGO and
+ context-sensitive IR-based instrumentation PGO are two variant types.
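Splitting the field as described above might look like the sketch below. The mask follows the 32/32 split stated in the text; the two variant bit positions are assumptions modeled on ``InstrProfData.inc`` and should be checked there, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits hold variant flags, lower 32 bits hold the version. */
#define VARIANT_MASKS_ALL 0xffffffff00000000ULL
/* Assumed bit positions for two variants; verify against InstrProfData.inc. */
#define VARIANT_MASK_IR_PROF (1ULL << 56)
#define VARIANT_MASK_CSIR_PROF (1ULL << 57)

/* Strips the variant flags, leaving the numeric format version. */
static uint64_t get_version(uint64_t V) { return V & ~VARIANT_MASKS_ALL; }

/* Nonzero if the profile comes from IR-based instrumentation. */
static int is_ir_profile(uint64_t V) { return (V & VARIANT_MASK_IR_PROF) != 0; }

/* Nonzero if the profile comes from context-sensitive IR instrumentation. */
static int is_cs_ir_profile(uint64_t V) { return (V & VARIANT_MASK_CSIR_PROF) != 0; }
```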
+
+``BinaryIdsSize``
+ The byte size of the binary id section.
+
+``NumData``
+ The number of profile metadata records. The byte size of the profile metadata
+ section can be computed from this field.
+
+``NumCounter``
+ The number of entries in the profile counter section. The byte size of the
+ counter section can be computed from this field.
+
+``NumBitmapBytes``
+ The number of bytes in the profile bitmap section.
+
+``NamesSize``
+ The number of bytes in the name section.
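As a sketch of how section byte sizes follow from these fields: the metadata record size and counter entry size are passed in because they are assumptions here (the record size would be ``sizeof(__llvm_profile_data)`` on the profiling platform, and the test uses 8-byte counters as an assumed example). ``SizeInfo``, ``SectionBytes``, and ``section_bytes`` are hypothetical, and padding is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the raw header's size fields. */
struct SizeInfo {
  uint64_t NumData, NumCounter, NumBitmapBytes, NamesSize;
};

struct SectionBytes {
  uint64_t Data, Counters, Bitmap, Names;
};

static struct SectionBytes section_bytes(struct SizeInfo S,
                                         uint64_t DataRecordSize,
                                         uint64_t CounterEntrySize) {
  assert(DataRecordSize > 0 && CounterEntrySize > 0);
  struct SectionBytes B;
  B.Data = S.NumData * DataRecordSize;          /* record count x record size */
  B.Counters = S.NumCounter * CounterEntrySize; /* entry count x entry size */
  B.Bitmap = S.NumBitmapBytes;                  /* already a byte count */
  B.Names = S.NamesSize;                        /* already a byte count */
  return B;
}
```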
+
+``CountersDelta``
+ In the IRPGO case [4]_, this field records the in-memory address difference
+ between the start of the metadata section and the start of the counter section
+ in the instrumented binary, i.e., ``start(__llvm_prf_cnts) - start(__llvm_prf_data)``.
+
+ It is used together with the in-memory address difference between a profile
+ metadata record and its counters in the instrumented binary to compute the
+ counter offset relative to ``start(__llvm_prf_cnts)``. See
+ calculation-of-counter-offset_ for a visual explanation.
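The counter offset calculation can be written out as arithmetic. For the record at index ``i``, ``CounterPtr`` is the address of its counters minus the address of the record, and the record lives at ``start(__llvm_prf_data) + i * RecordSize``, so the offset relative to ``start(__llvm_prf_cnts)`` is ``CounterPtr + i * RecordSize - CountersDelta``. The sketch assumes the IRPGO meaning of ``CounterPtr`` described under Profile Metadata; ``counter_offset`` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of record `Index`'s counters relative to start(__llvm_prf_cnts).
   CounterPtr:    addr(counters of record) - addr(record)
   CountersDelta: start(__llvm_prf_cnts) - start(__llvm_prf_data)
   addr(record) = start(__llvm_prf_data) + Index * RecordSize, hence
   offset = CounterPtr + Index * RecordSize - CountersDelta. */
static int64_t counter_offset(int64_t CounterPtr, uint64_t Index,
                              uint64_t RecordSize, int64_t CountersDelta) {
  assert(RecordSize > 0);
  return CounterPtr + (int64_t)(Index * RecordSize) - CountersDelta;
}
```

For example, with the data section at 0x1000, the counter section at 0x2000, and 64-byte records, record 1 sits at 0x1040; if its counters start at 0x2010, then ``CounterPtr`` is 0xFD0 and the offset is 0xFD0 + 64 - 0x1000 = 16. The signed types matter because either section may come first in memory.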
+
+``BitmapDelta``
+ In the IRPGO case [4]_, this field records the in-memory address difference
+ between the start of the metadata section and the start of the bitmap section
+ in the instrumented binary, i.e., ``start(__llvm_prf_bits) - start(__llvm_prf_data)``.
+
+ It is used together with the in-memory address difference between a profile
+ data record and its bitmap in the instrumented binary to locate the bitmap of
+ that record, in the same way that counters are referenced as explained in
+ calculation-of-counter-offset_.
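Applying the same arithmetic to the bitmap, with a bounds check against ``NumBitmapBytes``, might look like the sketch below. ``bitmap_offset`` and its ``NumBytesForRecord`` parameter (how many bitmap bytes the record owns) are hypothetical; this is not how LLVM's reader is structured.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of a record's bitmap relative to start(__llvm_prf_bits), or -1
   if the record's bitmap bytes would fall outside the bitmap section.
   Same formula as for counters: BitmapPtr + Index * RecordSize - BitmapDelta. */
static int64_t bitmap_offset(int64_t BitmapPtr, uint64_t Index,
                             uint64_t RecordSize, int64_t BitmapDelta,
                             uint64_t NumBytesForRecord,
                             uint64_t NumBitmapBytes) {
  assert(RecordSize > 0);
  int64_t Offset = BitmapPtr + (int64_t)(Index * RecordSize) - BitmapDelta;
  if (Offset < 0 || (uint64_t)Offset + NumBytesForRecord > NumBitmapBytes)
    return -1;
  return Offset;
}
```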
+
+``NamesDelta``
+ Records the in-memory address of the name section. It is used only for error
+ checking in the raw profile reader.
+
+``ValueKindLast``
+ Records the last value kind, which determines the number of value kinds. The
+ macro `VALUE_PROF_KIND`_ defines each value kind along with a description.
+
+.. _`VALUE_PROF_KIND`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/compiler-rt/include/profile/InstrProfData.inc#L184-L186
+
+Payload Sections
+------------------
+
+Binary Ids
+^^^^^^^^^^^
+Stores the binary ids of the instrumented binaries to associate binaries with
+profiles for source code coverage. See `Binary Id RFC`_ for the design.
+
+.. _`Binary Id RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html
+
+Profile Metadata
+^^^^^^^^^^^^^^^^^^
+
+This section stores the metadata to map counters and value profiles back to
+instrumented code regions (e.g., LLVM IR for IRPGO).
+
+The in-memory representation of the metadata is `__llvm_profile_data`_.
+Some fields are used to reference data from other sections in the profile.
+The fields are documented as follows:
+
+.. _`__llvm_profile_data`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/compiler-rt/lib/profile/InstrProfiling.h#L25
+
+``NameRef``
+ The MD5 hash of the function's PGO name. The PGO name has the format
+ ``[<filepath><delimiter>]<linkage-or-mangled-name>``, where ``<filepath>`` and
+ ``<delimiter>`` are provided for local-linkage functions to distinguish
+ functions that might otherwise have identical names.
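A sketch of assembling the PGO name before hashing. The MD5 step is omitted, the delimiter is passed in because this sketch does not assume a specific one, and ``build_pgo_name`` is a hypothetical helper. A NULL file path stands for a function without local linkage.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "[<filepath><delimiter>]<name>" into Out. FilePath is NULL for
   functions without local linkage. Returns 0 on success, or -1 if Out is
   too small (the output is then truncated). */
static int build_pgo_name(char *Out, size_t OutSize, const char *FilePath,
                          const char *Delimiter, const char *Name) {
  assert(Out != NULL && Name != NULL);
  int N = FilePath
              ? snprintf(Out, OutSize, "%s%s%s", FilePath, Delimiter, Name)
              : snprintf(Out, OutSize, "%s", Name);
  return (N < 0 || (size_t)N >= OutSize) ? -1 : 0;
}
```

Prefixing only local-linkage names keeps global names stable across translation units while letting two ``static`` functions named ``helper`` in different files hash to different ``NameRef`` values.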
+
+.. _FuncHash:
+
+``FuncHash``
+ A checksum of the function's IR, taking the control flow graph and instrumented
+ value sites into account. See `computeCFGHash`_ for details.
+
+.. _`computeCFGHash`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L616-L685
+
+``CounterPtr``
+ The in-memory address difference between the profile data record and the start
+ of its corresponding counters. The counter position is stored this way (as a
+ link-time constant) to reduce instrumented binary size compared with storing
+ the symbol addresses directly. See `commit a1532ed`_ for further information.
+
+.. _`commit a1532ed`: https://github.com/llvm/llvm-project/commit/a1532ed27582038e2d9588108ba0fe8237f01844
+
+ .. note::
+ ``CounterPtr`` might represent a different value in non-IRPGO use cases. For
+ example, with `binary profile correlation`_, it represents the counter address.
+ When in doubt, check the source code.
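The relative-pointer idea behind ``CounterPtr`` can be sketched in C: storing the difference lets code recover the counters from the record's own address. ``Record``, ``Image``, ``set_counter_ptr``, and ``resolve_counters`` are hypothetical, and only the IRPGO meaning is modeled, not the binary profile correlation case from the note. ``Image`` keeps both arrays in one object so the demo's pointer arithmetic stays within a single object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a metadata record; only CounterPtr is modeled. */
struct Record {
  int64_t CounterPtr; /* addr(first counter) - addr(this record); may be negative */
};

/* Demo memory image with the counters laid out before the records. */
struct Image {
  uint64_t Counters[4];
  struct Record Data[2];
};

/* Stores the value a link-time constant would hold in this model. */
static void set_counter_ptr(struct Record *R, uint64_t *FirstCounter) {
  R->CounterPtr = (int64_t)((char *)FirstCounter - (char *)R);
}

/* Recovers the address of a record's first counter from the record itself. */
static uint64_t *resolve_counters(struct Record *R) {
  assert(R != NULL);
  return (uint64_t *)((char *)R + R->CounterPtr);
}
```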
+
+``BitmapPtr``
----------------
minglotus-6 wrote:
I updated the descriptions of `CounterPtr` and `BitmapPtr`. PTAL.
https://github.com/llvm/llvm-project/pull/76105