[llvm] [compiler-rt] [docs][IRPGO]Document two binary formats for IRPGO profiles (PR #76105)

Snehasish Kumar via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 9 09:38:19 PST 2024


================
@@ -0,0 +1,478 @@
+===================================
+Instrumentation Profile Format
+===================================
+
+.. contents::
+   :local:
+
+
+Overview
+=========
+
+Clang supports two types of profiling via instrumentation [1]_: frontend-based
+and IR-based, and both could support a variety of use cases [2]_ .
+This document describes two binary serialization formats (raw and indexed) to
+store instrumented profiles with a specific emphasis on IRPGO use case, in the
+sense that when specific header fields and payload sections have different ways
+of interpretation across use cases, the documentation is based on IRPGO.
+
+.. note::
+  Frontend-generated profiles are used together with coverage mapping for
+  `source based code coverage`_. The `coverage mapping format`_ is different from
+  profile format.
+
+.. _`source based code coverage`: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html
+.. _`coverage mapping format`: https://llvm.org/docs/CoverageMappingFormat.html
+
+Raw Profile Format
+===================
+
+The raw profile is generated by running the instrumented binary. The raw profile
+data from an executable or a shared library [3]_ consists of a header and
+multiple sections, with each section as a memory dump. The profile raw data needs
+to be reasonably compact and fast to generate.
+
+There are no backward or forward version compatibility guarantees for the raw profile
+format. That is, compilers and tools `require`_ a specific raw profile version
+to parse the profiles.
+
+.. _`require`: https://github.com/llvm/llvm-project/blob/bffdde8b8e5d9a76a47949cd0f574f3ce656e181/llvm/lib/ProfileData/InstrProfReader.cpp#L551-L558
+
+To feed profiles back into compilers for an optimized build (e.g., via
+`-fprofile-use` for IR instrumentation), a raw profile must be converted into
+the indexed format.
+
+General Storage Layout
+-----------------------
+
+The storage layout of the raw profile data format is illustrated below. Basically,
+when the raw profile is read into a memory buffer, the actual byte offset of a
+section is inferred from the section's order in the layout and the size information
+of all the sections ahead of it.
+
+::
+
+  +----+-----------------------+
+  |    |        Magic          |
+  |    +-----------------------+
+  |    |        Version        |
+  |    +-----------------------+
+  H    |   Size Info for       |
+  E    |      Section 1        |
+  A    +-----------------------+
+  D    |   Size Info for       |
+  E    |      Section 2        |
+  R    +-----------------------+
+  |    |          ...          |
+  |    +-----------------------+
+  |    |   Size Info for       |
+  |    |      Section N        |
+  +----+-----------------------+
+  P    |       Section 1       |
+  A    +-----------------------+
+  Y    |       Section 2       |
+  L    +-----------------------+
+  O    |          ...          |
+  A    +-----------------------+
+  D    |       Section N       |
+  +----+-----------------------+
+
+
+.. note::
+   Sections might be padded to meet specific alignment requirements. For
+   simplicity, header fields and data sections solely for padding purpose are
+   omitted in the data layout graph above and the rest of this document.
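
As a rough illustration of the offset inference described above, the following Python sketch accumulates section sizes in layout order. The function name, the header size and the section sizes are hypothetical values chosen for the example; a real reader derives the sizes from the header fields and also accounts for padding, which is ignored here.

```python
def section_offsets(header_size, section_sizes):
    """Return the byte offset of each payload section, in layout order.

    header_size and section_sizes are hypothetical inputs (in bytes).
    Padding between sections is deliberately ignored in this sketch.
    """
    offsets = []
    cursor = header_size  # the first section starts right after the header
    for size in section_sizes:
        offsets.append(cursor)
        cursor += size  # the next section starts where this one ends
    return offsets


# Example: a 64-byte header followed by four sections.
print(section_offsets(64, [16, 128, 48, 32]))
```

With these made-up sizes, the second section starts at byte 80 because it follows a 64-byte header and a 16-byte first section.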
+
+Header
+-------
+
+``Magic``
+  Magic number encodes profile format (raw, indexed or text). For the raw format,
+  the magic number also encodes the endianness (big or little) and C pointer
+  byte size (32 or 64) of the platform on which the profile is generated.
+
+  A factory method reads the magic number to construct the appropriate reader and
+  returns an error for an unrecognized format. Specifically, the factory method and
+  the raw profile reader implementation make sure that a raw profile file can be read
+  back on a platform with the opposite endianness and/or the other C pointer byte size.
+
+``Version``
+  The lower 32 bits specify the actual version and the most significant 32
+  bits specify the variant types of the profile. IR-based instrumentation PGO
+  and context-sensitive IR-based instrumentation PGO are two variant types.
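
A minimal Python sketch of splitting the 64-bit ``Version`` field into the two halves described above. The helper name and the example bit position are assumptions made for illustration only; the authoritative variant masks are defined in the LLVM sources.

```python
def split_version(version_field):
    """Split a 64-bit Version field into (version, variant_bits)."""
    version = version_field & 0xFFFFFFFF   # lower 32 bits: actual version
    variant_bits = version_field >> 32     # upper 32 bits: variant flags
    return version, variant_bits


# Hypothetical example: version 9 with one variant flag set at bit 56.
# The bit position is an illustrative assumption, not taken from this document.
EXAMPLE_VARIANT_FLAG = 1 << 56
version, variant_bits = split_version(EXAMPLE_VARIANT_FLAG | 9)
print(version, hex(variant_bits))
```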
+
+``BinaryIdsSize``
+  The byte size of binary id section.
+
+``NumData``
+  The number of profile metadata records. The byte size of the `profile metadata`_
+  section can be computed from this field.
+
+``NumCounter``
+  The number of entries in the profile counter section. The byte size of `counter`_
+  section could be computed with this field.
+
+``NumBitmapBytes``
+  The number of bytes in the profile `bitmap`_ section.
+
+``NamesSize``
+  The number of bytes in the name section.
+
+.. _`CountersDelta`:
+
+``CountersDelta``
+  This field records the in-memory address difference between the `profile metadata`_
+  and counter section in the instrumented binary, i.e., `start(__llvm_prf_cnts) - start(__llvm_prf_data)`.
+
+  It's used jointly with the `CounterPtr`_ field to compute the counter offset
+  relative to `start(__llvm_prf_cnts)`. Check out calculation-of-counter-offset_
+  for a visualized explanation.
+
+  .. note::
+    Instrumentation might not load the `__llvm_prf_data` object file section
+    into memory, or might not generate the profile metadata section in raw profiles.
+    In those cases, `CountersDelta` is not used and other mechanisms are used to
+    match counters with instrumented code. See `lightweight instrumentation`_ and
+    `binary profile correlation`_ for examples.
+
+``BitmapDelta``
+  This field records the in-memory address difference between the `profile metadata`_
+  and bitmap section in the instrumented binary, i.e., `start(__llvm_prf_bits) - start(__llvm_prf_data)`.
+
+  It's used jointly with the `BitmapPtr`_ to find the bitmap of a profile data
+  record, in a similar way to how counters are referenced as explained by
+  calculation-of-counter-offset_ .
+
+  Similar to `CountersDelta`_ field, this field may not be used in non-PGO variants
+  of profiles.
+
+``NamesDelta``
+  Records the in-memory address of name section. Not used except for raw profile
+  reader error checking.
+
+``ValueKindLast``
+  Records the number of value kinds. Macro `VALUE_PROF_KIND`_ defines the value
+  kinds with a description of the kind.
+
+.. _`VALUE_PROF_KIND`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/compiler-rt/include/profile/InstrProfData.inc#L184-L186
+
+Payload Sections
+------------------
+
+Binary Ids
+^^^^^^^^^^^
+Stores the binary ids of the instrumented binaries to associate binaries with
+profiles for source code coverage. See `Binary Id RFC`_ for the design.
+
+.. _`Binary Id RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html
+
+.. _`profile metadata`:
+
+Profile Metadata
+^^^^^^^^^^^^^^^^^^
+
+This section stores the metadata to map counters and value profiles back to
+instrumented code regions (e.g., LLVM IR for IRPGO).
+
+The in-memory representation of the metadata is `__llvm_profile_data`_.
+Some fields are used to reference data from other sections in the profile.
+The fields are documented as follows:
+
+.. _`__llvm_profile_data`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/compiler-rt/include/profile/InstrProfData.inc#L65-L95
+
+``NameRef``
+  The MD5 of the function's PGO name. The PGO name has the format
+  `[<filepath><delimiter>]<linkage-or-mangled-name>`, where `<filepath>` and
+  `<delimiter>` are provided for local-linkage functions to distinguish functions
+  that may otherwise have identical names.
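
To make the role of the file path prefix concrete, here is a hedged Python sketch that hashes two PGO names. It assumes that the 64-bit ``NameRef`` is taken from the first eight bytes of the MD5 digest read as little-endian, and it uses ``;`` as a stand-in delimiter; both are assumptions for illustration and should be checked against the source code.

```python
import hashlib


def name_ref(pgo_name):
    """Return a 64-bit hash of a PGO name (sketch, see assumptions below)."""
    digest = hashlib.md5(pgo_name.encode("utf-8")).digest()
    # Assumption: the 64-bit value is the first 8 digest bytes, little-endian.
    return int.from_bytes(digest[:8], "little")


# Two local-linkage functions named "helper" in different files.
# The ";" delimiter and the file paths are hypothetical.
ref_a = name_ref("a.c;helper")
ref_b = name_ref("b.c;helper")
print(hex(ref_a), hex(ref_b))
```

Without the file path prefix both functions would hash to the same value, so their profile data could not be told apart.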
+
+.. _FuncHash:
+
+``FuncHash``
+  A checksum of the function's IR, taking the control flow graph and instrumented
+  value sites into account. See `computeCFGHash`_ for details.
+
+.. _`computeCFGHash`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L616-L685
+
+.. _`CounterPtr`:
+
+``CounterPtr``
+  The in-memory address difference between profile data and the start of corresponding
+  counters. Counter position is stored this way (as a link-time constant) to reduce
+  instrumented binary size compared with snapshotting the address of symbols directly.
+  See `commit a1532ed`_ for further information.
+
+.. _`commit a1532ed`: https://github.com/llvm/llvm-project/commit/a1532ed27582038e2d9588108ba0fe8237f01844
+
+  .. note::
+    `CounterPtr` might represent a different value for non-IRPGO use cases. For
+    example, for `binary profile correlation`_, it represents the absolute address of the counter.
+    When in doubt, check the source code.
+
+.. _`BitmapPtr`:
+
+``BitmapPtr``
+  The in-memory address difference between profile data and the start address of
+  corresponding bitmap.
+
+  .. note::
+    Similar to `CounterPtr`_, this field may represent a different value for non-IRPGO use cases.
+
+``FunctionPointer``
+  Records the function address when instrumented binary runs. This is used to
+  map the profiled callee address of indirect calls to the `NameRef` during
+  conversion from raw to indexed profiles.
+
+``Values``
+  Represents value profiles in a two-dimensional array. The number of elements
+  in the first dimension is the number of instrumented value sites across all
+  kinds. Each element in the first dimension is the head of a linked list, and
+  each element in the second dimension is a linked list element, carrying
+  `<profiled-value, count>` as payload. This is used by the compiler runtime when
+  writing out value profiles.
+
+  .. note::
+    Value profiling is supported by frontend and IR PGO instrumentation,
+    but it's not supported in all cases (e.g., `lightweight instrumentation`_).
+
+``NumCounters``
+  The number of counters for the instrumented function.
+
+``NumValueSites``
+  This is an array of counters, and each counter represents the number of
+  instrumented sites for a kind of value in the function.
+
+``NumBitmapBytes``
+  The number of bitmap bytes for the function.
+
+.. _`counter`:
+
+Profile Counters
+^^^^^^^^^^^^^^^^^
+
+For PGO [4]_, the counters within an instrumented function of a specific `FuncHash`_
+are stored contiguously and in an order that is consistent with instrumentation points selection.
+
+.. _calculation-of-counter-offset:
+
+As mentioned above, the recorded counter offset is relative to the profile metadata.
+So how are function counters located in the raw profile data?
+
+Basically, the profile reader iterates profile metadata (from the `profile metadata`_
+section) and makes use of the recorded relative distances, as illustrated below.
+
+::
+
+        + --> start(__llvm_prf_data) --> +---------------------+ ------------+
+        |                                |       Data 1        |             |
+        |                                +---------------------+  =====||    |
+        |                                |       Data 2        |       ||    |
+        |                                +---------------------+       ||    |
+        |                                |        ...          |       ||    |
+ Counter|                                +---------------------+       ||    |
+  Delta |                                |       Data N        |       ||    |
+        |                                +---------------------+       ||    |   CounterPtr1
+        |                                                              ||    |
+        |                                              CounterPtr2     ||    |
+        |                                                              ||    |
+        |                                                              ||    |
+        + --> start(__llvm_prf_cnts) --> +---------------------+       ||    |
+                                         |        ...          |       ||    |
+                                         +---------------------+  -----||----+
+                                         |    Counter for      |       ||
+                                         |       Data 1        |       ||
+                                         +---------------------+       ||
+                                         |        ...          |       ||
+                                         +---------------------+  =====||
+                                         |    Counter for      |
+                                         |       Data 2        |
+                                         +---------------------+
+                                         |        ...          |
+                                         +---------------------+
+                                         |    Counter for      |
+                                         |       Data N        |
+                                         +---------------------+
+
+
+In the graph,
+
+* The profile header records `CountersDelta` with the value `start(__llvm_prf_cnts) - start(__llvm_prf_data)`.
+  We will call it `CounterDeltaInitVal` below for convenience.
+* For each profile data record `ProfileDataN`, `CounterPtr` is recorded as `start(CounterN) - start(ProfileDataN)`,
+  where `ProfileDataN` is the N-th entry in `__llvm_prf_data`, and `CounterN` is
+  the corresponding profile counters.
+
+Each time the reader advances to the next data record, it `updates`_ `CountersDelta`
+by subtracting the size of one `ProfileData`.
+
+.. _`updates`: https://github.com/llvm/llvm-project/blob/17ff25a58ee4f29816d932fdb75f0d305718069f/llvm/include/llvm/ProfileData/InstrProfReader.h#L439-L444
+
+For the counter corresponding to the first data record, the byte offset
+relative to the start of the counter section is calculated as `CounterPtr1 - CounterDeltaInitVal`.
+When the profile reader advances to the second data record, `CountersDelta`
+has been updated to `CounterDeltaInitVal - sizeof(ProfileData)`.
+Thus the byte offset relative to the start of the counter section is calculated
+as `CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData))`.
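
The calculation above can be sketched in Python as follows. All addresses, the record size and the function name are made-up values for illustration; the sketch only mirrors the arithmetic described in this section, not the actual reader implementation.

```python
def counter_offsets(counters_delta_init, counter_ptrs, sizeof_profile_data):
    """Return each record's counter offset relative to start(__llvm_prf_cnts)."""
    offsets = []
    counters_delta = counters_delta_init
    for counter_ptr in counter_ptrs:
        offsets.append(counter_ptr - counters_delta)
        # Advancing to the next data record shrinks the delta by one record.
        counters_delta -= sizeof_profile_data
    return offsets


# Hypothetical in-memory layout of the instrumented binary.
DATA_START = 0x1000        # start(__llvm_prf_data)
CNTS_START = 0x2000        # start(__llvm_prf_cnts)
SIZEOF_PROFILE_DATA = 64   # assumed record size in bytes
true_offsets = [0, 16, 40] # where each record's counters really begin

# CounterPtrN = start(CounterN) - start(ProfileDataN)
counter_ptrs = [
    (CNTS_START + off) - (DATA_START + n * SIZEOF_PROFILE_DATA)
    for n, off in enumerate(true_offsets)
]

recovered = counter_offsets(CNTS_START - DATA_START, counter_ptrs,
                            SIZEOF_PROFILE_DATA)
print(recovered)
```

The recovered offsets match ``true_offsets``, which shows why subtracting ``sizeof(ProfileData)`` on every step compensates for the data record pointer moving forward.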
+
+.. _`bitmap`:
+
+Bitmap
+^^^^^^^
+This section is used for source-based `Modified Condition/Decision Coverage`_ code coverage. Check out `Bitmap RFC`_
+for the design.
+
+.. _`Modified Condition/Decision Coverage`: https://en.wikipedia.org/wiki/Modified_condition/decision_coverage
+.. _`Bitmap RFC`: https://discourse.llvm.org/t/rfc-source-based-mc-dc-code-coverage/59244
+
+Names
+^^^^^^
+
+This section contains the concatenated string of functions' PGO names, possibly
+compressed. If compressed, the zlib library is used.
+
+Function names serve as keys in the PGO data hash table when raw profiles are
+converted into indexed profiles. They are also crucial for `llvm-profdata` to
+show the profiles in a human-readable way.
+
+Value Profile Data
+^^^^^^^^^^^^^^^^^^^^
+
+This section contains the profile data for value profiling.
+
+The value profiles corresponding to a profile metadata record are serialized contiguously
+as one record, and value profile records are stored in the same order as the
+respective profile data, such that a raw profile reader `advances`_ the pointer to
+profile data and the pointer to value profile records simultaneously [5]_ to find
+the value profiles for each per-function, per `FuncHash`_ profile data record.
+
+.. _`advances`: https://github.com/llvm/llvm-project/blob/7e15fa9161eda7497a5d6abf0d951a1d12d86550/llvm/include/llvm/ProfileData/InstrProfReader.h#L456-L457
+
+Indexed PGO Profile Format
+===========================
+
+Indexed profiles are generated by `llvm-profdata`. In indexed profiles,
+function PGO data is organized as an on-disk hash table such that compilers can
+look up PGO data for functions in an IR module.
+
+Compilers and tools must retain backward compatibility with indexed PGO profiles.
----------------
snehasish wrote:

Spell it out? i.e. "older profiles must be readable by newer tools" or something like that... 

https://github.com/llvm/llvm-project/pull/76105

