[compiler-rt] [llvm] [docs][IRPGO]Document two binary formats for IRPGO profiles (PR #76105)

Mingming Liu via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 2 13:02:15 PST 2024


https://github.com/minglotus-6 updated https://github.com/llvm/llvm-project/pull/76105

>From e3bfecf57ff7440ed50b51114df0d726c0e16b61 Mon Sep 17 00:00:00 2001
From: mingmingl <mingmingl at google.com>
Date: Tue, 19 Dec 2023 20:57:56 -0800
Subject: [PATCH 1/9] [compiler-rt][test]Mark thinlto icp test as UNSUPPORTED.
 Test failed when building instrumented binary

---
 .../profile/instrprof-thinlto-indirect-call-promotion.cpp    | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp b/compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp
index 08efa42167e94d..adba5b24253a42 100644
--- a/compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp
+++ b/compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp
@@ -13,8 +13,13 @@
 // - Generate ThinLTO summary file with LLVM bitcodes, and run `function-import` pass.
 // - Run `pgo-icall-prom` pass for the IR module which needs to import callees.
 
+
 // REQUIRES: windows || linux || darwin
 
+// The test failed on ppc when building the instrumented binary.
+// ld.lld: error: /lib/../lib64/Scrt1.o: ABI version 1 is not supported
+// UNSUPPORTED: ppc
+
 // This test and IR test llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll
 // are complementary to each other; a compiler-rt test has better test coverage
 // on different platforms, and the IR test is less restrictive in terms of

>From 1b3a6b9d4e21e7355e086d2527b1a33835cba551 Mon Sep 17 00:00:00 2001
From: mingmingl <mingmingl at google.com>
Date: Tue, 19 Dec 2023 20:58:51 -0800
Subject: [PATCH 2/9] Revert "[compiler-rt][test]Mark thinlto icp test as
 UNSUPPORTED. Test failed when building instrumented binary"

This reverts commit e3bfecf57ff7440ed50b51114df0d726c0e16b61.
---
 .../profile/instrprof-thinlto-indirect-call-promotion.cpp    | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp b/compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp
index adba5b24253a42..08efa42167e94d 100644
--- a/compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp
+++ b/compiler-rt/test/profile/instrprof-thinlto-indirect-call-promotion.cpp
@@ -13,13 +13,8 @@
 // - Generate ThinLTO summary file with LLVM bitcodes, and run `function-import` pass.
 // - Run `pgo-icall-prom` pass for the IR module which needs to import callees.
 
-
 // REQUIRES: windows || linux || darwin
 
-// The test failed on ppc when building the instrumented binary.
-// ld.lld: error: /lib/../lib64/Scrt1.o: ABI version 1 is not supported
-// UNSUPPORTED: ppc
-
 // This test and IR test llvm/test/Transforms/PGOProfile/thinlto_indirect_call_promotion.ll
 // are complementary to each other; a compiler-rt test has better test coverage
 // on different platforms, and the IR test is less restrictive in terms of

>From ad7fe43817a0aa77f5d56e9c0813f81f5f0b18c4 Mon Sep 17 00:00:00 2001
From: mingmingl <mingmingl at google.com>
Date: Wed, 20 Dec 2023 14:17:12 -0800
Subject: [PATCH 3/9] [docs][IRPGO]Document two binary formats for IRPGO
 profiles

---
 llvm/docs/PGOProfileFormat.rst | 387 +++++++++++++++++++++++++++++++++
 llvm/docs/UserGuides.rst       |   4 +
 2 files changed, 391 insertions(+)
 create mode 100644 llvm/docs/PGOProfileFormat.rst

diff --git a/llvm/docs/PGOProfileFormat.rst b/llvm/docs/PGOProfileFormat.rst
new file mode 100644
index 00000000000000..5602172e147f00
--- /dev/null
+++ b/llvm/docs/PGOProfileFormat.rst
@@ -0,0 +1,387 @@
+=====================
+IRPGO Profile Format
+=====================
+
+.. contents::
+   :local:
+
+
+Overview
+==========
+
+IR-based instrumentation (IRPGO) and its context-sensitive variant (CS-IRPGO)
+inserts `llvm.instrprof.*` `code generator intrinsics <https://llvm.org/docs/LangRef.html#code-generator-intrinsics>`_
+in LLVM IR to generate profiles. This document describes two binary profile
+formats (raw and indexed) used by IR-based instrumentation.
+
+.. note::
+
+  Both the compiler-rt profiling infrastructure and profile format are general
+  and could support other use cases (e.g., coverage and temporal profiling).
+  This document will focus on IRPGO while briefly introducing other use cases
+  with pointers.
+
+Raw PGO Profile Format
+========================
+
+The raw PGO profile is generated by running the instrumented binary. It is a
+memory dump of the profile data.
+
+Two kinds of frequently used profile information are function's basic block
+counters and its (various flavors of) value profiles. A function's profiled
+information span across several sections in the profile.
+
+General Storage Layout
+-----------------------
+
+A raw profile for an executable [1]_ consists of a profile header and several
+sections. The storage layout is illustrated below. Generally, when raw profile
+is read into an memory buffer, the actual byte offset of a section is inferred
+from the section's order in the layout and size information of all sections
+ahead of it.
+
+::
+
+  +----+-----------------------+
+  |    |        Magic          |
+  |    +-----------------------+
+  |    |        Version        |
+  |    +-----------------------+
+  H    |   Size Info for       |
+  E    |      Section 1        |
+  A    +-----------------------+
+  D    |   Size Info for       |
+  E    |      Section 2        |
+  R    +-----------------------+
+  |    |          ...          |
+  |    +-----------------------+
+  |    |   Size Info for       |
+  |    |      Section N        |
+  +----+-----------------------+
+  P    |       Section 1       |
+  A    +-----------------------+
+  Y    |       Section 2       |
+  L    +-----------------------+
+  O    |          ...          |
+  A    +-----------------------+
+  D    |       Section N       |
+  +----+-----------------------+
+
+
+.. note::
+   Sections might be padded to meet platform-specific alignment requirements.
+   For simplicity, header fields and data sections used solely for padding
+   are omitted in the data layout graph above and the rest of this document.
+
+Header
+-------
+
+``Magic``
+  With the magic number, data consumer could detect profile format and
+  endianness of the data, and quickly tells whether/how to continue reading.
+
+``Version``
+  The lower 32 bits specifies the actual version and the most significant 32
+  bits specify the variant types of the profile. IRPGO and CS-IRPGO are two
+  variant types.
+
+``BinaryIdsSize``
+  The byte size of binary id section.
+
+``NumData``
+  The number of per-function profile data control structures. The byte size of
+  profile data section could be computed with this field.
+
+``NumCounter``
+  The number of entries in the profile counter section. The byte size of counter
+  section could be computed with this field.
+
+``NumBitmapBytes``
+  The number of bytes in the profile bitmap section.
+
+``NamesSize``
+  The number of bytes in the name section.
+
+``CountersDelta``
+  Records the in-memory address difference between the data and counter section,
+  i.e., `start(__llvm_prf_cnts) - start(__llvm_prf_data)`. It's used jointly
+  with the in-memory address difference of profile data record and its counter
+  to find the counter of a profile data record. Check out calculation-of-counter-offset_
+  for details.
+
+``BitmapDelta``
+  Records the in-memory address difference between the data and bitmap section,
+  i.e., `start(__llvm_prf_bits) - start(__llvm_prf_data)`. It's used jointly
+  with the in-memory address difference of a profile data record and its bitmap
+  to find the bitmap of a profile data record, in a similar to how counters are
+  referenced as explained by calculation-of-counter-offset_ .
+
+``NamesDelta``
+  Records the in-memory address of compressed name section. Not used except for
+  raw profile reader error checking.
+
+``ValueKindLast``
+  Records the number of value kinds. As of writing, two kinds of value profiles
+  are supported. `IndirectCallTarget` is to profile the frequent callees of
+  indirect call instructions and `MemOPSize` is for memory intrinsic function
+  size profiling.
+
+  The number of value kinds affects the byte size of per function profile data
+  control structure.
+
+Payload Sections
+------------------
+
+Binary Ids
+^^^^^^^^^^^
+Stores the binary ids of the instrumented binaries to associate binaries with
+profiles for source code coverage. See `Binary Id RFC`_ for introduction.
+
+.. _`Binary Id RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html
+
+Profile Data
+^^^^^^^^^^^^^
+
+This section stores per-function profile data control structures. The in-memory
+representation of the control structure is `__llvm_profile_data` and the fields
+are defined by `INSTRPROFDATA` macro. Some fields are used to reference data
+from other sections in the profile. The fields are documented as follows:
+
+``NameRef``
+  The MD5 of the function's IRPGO name. IRPGO name has the format
+  `[<filepath>;]<linkage-name>` where `<filepath>;` is provided for local-linkage
+  functions to tell possibly identical function names.
+
+``FuncHash``
+  A fingerprint of the function's control flow graph.
+
+``CounterPtr``
+  The in-memory address difference between profile data and its corresponding counters.
+
+``BitmapPtr``
+  The in-memory address difference between profile data and its bitmap.
+
+``FunctionPointer``
+  Records the function address when the instrumented binary runs. This is used to
+  map the profiled callee address of indirect calls to the `NameRef` during
+  conversion from raw to indexed profiles.
+
+``Values``
+  Represents value profiles in a two dimensional array. The number of elements
+  in the first dimension is the number of instrumented value sites across all
+  kinds. Each element in the first dimension is the head of a linked list, and
+  each element in the second dimension is a linked list element, carrying
+  `<profiled-value, count>` as payload. This is used by compiler runtime when
+  writing out value profiles.
+
+``NumCounters``
+  The number of counters for the instrumented function.
+
+``NumValueSites``
+  This is an array of counters, and each counter represents the number of
+  instrumented sites for a kind of value in the function.
+
+``NumBitmapBytes``
+  The number of bitmap bytes for the function.
+
+Profile Counters
+^^^^^^^^^^^^^^^^^
+
+For IRPGO [2]_, the counters within an instrumented function are stored contiguously
+and in an order that is consistent with basic block selection in the instrumentation
+pass.
+
+.. _calculation-of-counter-offset:
+
+So how are function counters associated with a function?
+
+The profile reader iterates over the per-function control structures (from the
+profile data section) and makes use of the recorded relative distances, as
+illustrated below.
+
+::
+
+        + --> start(__llvm_prf_data) --> +---------------------+ ------------+
+        |                                |       Data 1        |             |
+        |                                +---------------------+  =====||    |
+        |                                |       Data 2        |       ||    |
+        |                                +---------------------+       ||    |
+        |                                |        ...          |       ||    |
+ Counter|                                +---------------------+       ||    |
+  Delta |                                |       Data N        |       ||    |
+        |                                +---------------------+       ||    |   CounterPtr1
+        |                                                              ||    |
+        |                                              CounterPtr2     ||    |
+        |                                                              ||    |
+        |                                                              ||    |
+        + --> start(__llvm_prf_cnts) --> +---------------------+       ||    |
+                                         |        ...          |       ||    |
+                                         +---------------------+  -----||----+
+                                         |      Counter 1      |       ||
+                                         +---------------------+       ||
+                                         |        ...          |       ||
+                                         +---------------------+  =====||
+                                         |      Counter 2      |
+                                         +---------------------+
+                                         |        ...          |
+                                         +---------------------+
+                                         |      Counter N      |
+                                         +---------------------+
+
+
+In the graph,
+
+* The profile header records `CounterDelta` with the value as `start(__llvm_prf_cnts) - start(__llvm_prf_data)`.
+  We will call it `CounterDeltaInitVal` below for convenience.
+* For each profile data record, `CounterPtrN` is recorded as `start(Counter) - start(ProfileData)`.
+
+Each time the reader advances to the next data record, it updates `CounterDelta` to minus the size of one `ProfileData`.
+
+For the counter corresponding to the first data record, the byte offset
+relative to the start of the counter section is calculated as `CounterPtr1 - CounterDeltaInitVal`.
+When profile reader advances to the second data record, note `CounterDelta` is now `CounterDeltaInitVal - sizeof(ProfileData)`.
+Thus the byte offset relative to the start of the counter section is calculated as `CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData))`.
+
+Bitmap
+^^^^^^^
+This section is used for source-based MC/DC code coverage. Check out `Bitmap RFC`_
+if interested.
+
+.. _`Bitmap RFC`: https://discourse.llvm.org/t/rfc-source-based-mc-dc-code-coverage/59244
+
+Names
+^^^^^^
+
+This section contains the concatenated string of function IRPGO names. If
+compressed, zlib compression algorithm is used.
+
+Function names serve as keys in the PGO data hash table when raw profiles are
+converted into indexed profiles. They are also crucial for `llvm-profdata` to
+show the profiles in a human-readable way.
+
+Value Profile Data
+^^^^^^^^^^^^^^^^^^^^
+
+This section contains the profile data for value profiling.
+
+The value profiles corresponding to a profile data are serialized contiguously
+as one record, and value profile records are stored in the same order as the
+respective profile data, such that a raw profile reader advances the pointer to
+profile data and the pointer to value profile records simultaneously [3]_ to find
+the value profiles for each per-function, per-CFG-fingerprint profile data record.
+
+Indexed PGO Profile Format
+===========================
+
+General Storage Layout
+-----------------------
+
+::
+
+                            +-----------------------+---+
+                            |        Magic          |   |
+                            +-----------------------+   |
+                            |        Version        |   |
+                            +-----------------------+   |
+                            |        HashType       |   H
+                            +-----------------------+   E
+                    +-------|       HashOffset      |   A
+                    |       +-----------------------+   D
+                +-----------|     MemProfOffset     |   E
+                |   |       +-----------------------+   R
+                |   |       |     BinaryIdOffset    |   |
+                |   |       +-----------------------+   |
+            +---------------|      TemporalProf-    |   |
+            |   |   |       |      TracesOffset     |   |
+            |   |   |       +-----------------------+---+
+            |   |   |       |   Profile Summary     |   |
+            |   |   |       +-----------------------+   P
+            |   |   +------>|  Function PGO data    |   A
+            |   |           +-----------------------+   Y
+            |   +---------- |  MemProf profile data |   L
+            |               +-----------------------+   O
+            |               |    Binary Ids         |   A
+            |               +-----------------------+   D
+            +-------------->|  Temporal profiles    |   |
+                            +-----------------------+---+
+
+Header
+--------
+
+``Magic``
+  The purpose of the magic number is to be able to quickly tell if the profile
+  is an indexed profile.
+
+``Version``
+  Similar to the raw profile version, the lower 32 bits specify the version of the
+  indexed profile and the most significant 32 bits are reserved to specify the
+  variant types of the profile.
+
+``HashType``
+  The hashing scheme for on-disk hash table keys. Only MD5 hashing is used as of
+  writing.
+
+``HashOffset``
+  An on-disk hash table stores the per-function profile records.
+  Precisely speaking, `HashOffset` records the offset of this hash table's
+  metadata (i.e., the number of buckets and entries), which follows right after
+  the payload of the entire hash table.
+
+``MemProfOffset``
+  Records the byte offset of MemProf profiling data.
+
+``BinaryIdOffset``
+  Records the byte offset of binary id sections.
+
+``TemporalProfTracesOffset``
+  Records the byte offset of temporal profiles.
+
+Payload Sections
+------------------
+
+(CS) Profile Summary
+^^^^^^^^^^^^^^^^^^^^^
+This section is right after profile header. It stores the serialized profile
+summary. For context-sensitive IRPGO, this section stores an additional profile
+summary corresponding to the context-sensitive profiles.
+
+Function PGO data
+^^^^^^^^^^^^^^^^^^
+This section stores functions and their PGO profiling data as an on-disk hash
+table. The key of a hash table entry is function's PGO name, and the in-memory
+representation of value is a map. The key of this map is CFG hash, and the value
+is C++ struct `llvm::InstrProfRecord`. The C++ struct collects the profiling
+information like counters and value profiles.
+
+MemProf Profile data
+^^^^^^^^^^^^^^^^^^^^^^
+This section stores function's memory profiling data. See
+`MemProf binary serialization format RFC`_ for the design.
+
+.. _`MemProf binary serialization format RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html
+
+Binary Ids
+^^^^^^^^^^^^^^^^^^^^^^
+The section to carry on binary-id information from raw profiles.
+
+Temporal Profile Traces
+^^^^^^^^^^^^^^^^^^^^^^^^
+The section to carry on temporal profile information from raw profiles.
+See `Temporal profiling RFC`_ for an overview.
+
+.. _`Temporal profiling RFC`: https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
+
+Profile Data Usage
+=======================================
+
+`llvm-profdata` is the command line tool to display and process profile data.
+For supported usages, check out its `documentation <https://llvm.org/docs/CommandGuide/llvm-profdata.html>`_.
+
+
+.. [1] A raw profile file could contain multiple raw profiles. Raw profile
+   reader could parse all raw profiles from the file correctly.
+.. [2] The counter section is used by a few variant types (like coverage and
+   temporal profiling) and might have different semantics there.
+.. [3] The step size of data pointer is the `sizeof(ProfileData)`, and the step
+   size of value profile pointer is calcuated based on the number of collected
+   values.
diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst
index 006df613bc5e7d..14a2e161ea54cb 100644
--- a/llvm/docs/UserGuides.rst
+++ b/llvm/docs/UserGuides.rst
@@ -58,6 +58,7 @@ intermediate LLVM representation.
    NVPTXUsage
    Phabricator
    Passes
+   PGOProfileFormat
    ReportingGuide
    ResponseGuide
    Remarks
@@ -177,6 +178,9 @@ Optimizations
    referencing, to determine variable locations for debug info in the final
    stages of compilation.
 
+:doc:`PGOProfileFormat`
+   This document explains two binary formats of IRPGO profiles.
+
 Code Generation
 ---------------
 

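A note on the counter offset calculation documented in PGOProfileFormat.rst
above: the arithmetic can be checked with a small sketch. This is not the
reader's actual implementation. The function name `counter_offsets`, the record
size of 64 bytes, and all addresses are made up for illustration; only the
formula `CounterPtrN - CounterDelta`, with `CounterDelta` decreasing by
`sizeof(ProfileData)` per record, comes from the document.

```python
# Illustrative sketch only: the record size and every address are hypothetical.
SIZEOF_PROFILE_DATA = 64  # assumed sizeof(ProfileData) in bytes


def counter_offsets(counter_delta_init, counter_ptrs, sizeof_data=SIZEOF_PROFILE_DATA):
    """Return each record's counter offset relative to start(__llvm_prf_cnts)."""
    offsets = []
    counter_delta = counter_delta_init
    for counter_ptr in counter_ptrs:
        # Offset of this record's counters within the counter section.
        offsets.append(counter_ptr - counter_delta)
        # Advancing to the next data record shrinks the delta by one record.
        counter_delta -= sizeof_data
    return offsets


# Hypothetical in-memory layout of the instrumented binary.
data_start = 0x1000                        # start(__llvm_prf_data)
cnts_start = 0x5000                        # start(__llvm_prf_cnts)
counter_addrs = [0x5008, 0x5018, 0x5030]   # where each function's counters begin
data_addrs = [data_start + i * SIZEOF_PROFILE_DATA for i in range(3)]

# CounterPtrN = start(Counter N) - start(ProfileData N)
counter_ptrs = [c - d for c, d in zip(counter_addrs, data_addrs)]

offsets = counter_offsets(cnts_start - data_start, counter_ptrs)
print(offsets)
```

Under these assumptions, each computed offset equals the distance from the start
of the counter section to that function's counters, which is what the reader
needs once the sections have been copied into a buffer at unrelated addresses.
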
>From ac0e5507946c1d25ec4c1eb1c10ea4298083c3c7 Mon Sep 17 00:00:00 2001
From: mingmingl <mingmingl at google.com>
Date: Wed, 20 Dec 2023 17:36:28 -0800
Subject: [PATCH 4/9] resolve feedback

---
 llvm/docs/PGOProfileFormat.rst | 148 +++++++++++++++++----------------
 1 file changed, 78 insertions(+), 70 deletions(-)

diff --git a/llvm/docs/PGOProfileFormat.rst b/llvm/docs/PGOProfileFormat.rst
index 5602172e147f00..37b1049d84d761 100644
--- a/llvm/docs/PGOProfileFormat.rst
+++ b/llvm/docs/PGOProfileFormat.rst
@@ -1,44 +1,46 @@
-=====================
-IRPGO Profile Format
-=====================
+===================================
+Instrumentation PGO Profile Format
+===================================
 
 .. contents::
    :local:
 
 
 Overview
-==========
+=========
 
-IR-based instrumentation (IRPGO) and its context-sensitive variant (CS-IRPGO)
-inserts `llvm.instrprof.*` `code generator intrinsics <https://llvm.org/docs/LangRef.html#code-generator-intrinsics>`_
-in LLVM IR to generate profiles. This document describes two binary profile
-formats (raw and indexed) used by IR-based instrumentation.
+Instrumentation PGO inserts `llvm.instrprof.*` `code generator intrinsics`_
+in the code to generate profiles. This document describes two binary profile
+formats (raw and indexed) used by instrumentation.
+
+.. _`code generator intrinsics`: https://llvm.org/docs/LangRef.html#code-generator-intrinsics
 
 .. note::
+  The instrumentation profile format supports non-PGO use cases (e.g., temporal
+  profiling). This document will focus on PGO. Source coverage uses both
+  frontend instrumentation profiles and coverage mapping. The format for
+  coverage mapping has its own `documentation`_.
 
-  Both the compiler-rt profiling infrastructure and profile format are general
-  and could support other use cases (e.g., coverage and temporal profiling).
-  This document will focus on IRPGO while briefly introducing other use cases
-  with pointers.
+.. _`documentation`: https://llvm.org/docs/CoverageMappingFormat.html
 
-Raw PGO Profile Format
-========================
+Raw Profile Format
+===================
 
-The raw PGO profile is generated by running the instrumented binary. It is a
-memory dump of the profile data.
+The raw profile is generated by running the instrumented binary. It is a memory
+dump of the profile data.
 
-Two kinds of frequently used profile information are function's basic block
-counters and its (various flavors of) value profiles. A function's profiled
-information span across several sections in the profile.
+The instrumented binary currently collects two kinds of profile data, counters
+to profile branch probability and (various flavors of) value profiles. The
+profile data for a function span across several sections in the profile.
 
 General Storage Layout
 -----------------------
 
-A raw profile for an executable [1]_ consists of a profile header and several
-sections. The storage layout is illustrated below. Generally, when raw profile
-is read into an memory buffer, the actual byte offset of a section is inferred
-from the section's order in the layout and size information of all sections
-ahead of it.
+A raw profile from an executable or a shared library [1]_ consists of a profile
+header and several sections. The storage layout is illustrated below. Generally,
+when the raw profile is read into a memory buffer, the actual byte offset of a
+section is inferred from the section's order in the layout and size information
+of all the sections ahead of it.
 
 ::
 
@@ -78,12 +80,12 @@ Header
 
 ``Magic``
   With the magic number, data consumer could detect profile format and
-  endianness of the data, and quickly tells whether/how to continue reading.
+  endianness of the data, and tell whether/how to continue reading.
 
 ``Version``
   The lower 32 bits specifies the actual version and the most significant 32
-  bits specify the variant types of the profile. IRPGO and CS-IRPGO are two
-  variant types.
+  bits specify the variant types of the profile. IR-based instrumentation PGO
+  and context-sensitive IR-based instrumentation PGO are two variant types.
 
 ``BinaryIdsSize``
   The byte size of binary id section.
@@ -113,12 +115,12 @@ Header
   Records the in-memory address difference between the data and bitmap section,
   i.e., `start(__llvm_prf_bits) - start(__llvm_prf_data)`. It's used jointly
   with the in-memory address difference of a profile data record and its bitmap
-  to find the bitmap of a profile data record, in a similar to how counters are
-  referenced as explained by calculation-of-counter-offset_ .
+  to find the bitmap of a profile data record, in a similar way to how counters
+  are referenced as explained by calculation-of-counter-offset_ .
 
 ``NamesDelta``
-  Records the in-memory address of compressed name section. Not used except for
-  raw profile reader error checking.
+  Records the in-memory address of the name section. It is used only for raw
+  profile reader error checking.
 
 ``ValueKindLast``
   Records the number of value kinds. As of writing, two kinds of value profiles
@@ -148,15 +150,17 @@ are defined by `INSTRPROFDATA` macro. Some fields are used to reference data
 from other sections in the profile. The fields are documented as follows:
 
 ``NameRef``
-  The MD5 of the function's IRPGO name. IRPGO name has the format
-  `[<filepath>;]<linkage-name>` where `<filepath>;` is provided for local-linkage
-  functions to tell possibly identical function names.
+  The MD5 of the function's PGO name. The PGO name has the format
+  `[<filepath><delimiter>]<linkage-or-mangled-name>` where `<filepath>` and
+  `<delimiter>` are provided for local-linkage functions to distinguish
+  functions that may otherwise share a name.
 
 ``FuncHash``
   A fingerprint of the function's control flow graph.
 
 ``CounterPtr``
-  The in-memory address difference between profile data and its corresponding counters.
+  The in-memory address difference between profile data and its corresponding
+  counters.
 
 ``BitmapPtr``
   The in-memory address difference between profile data and its bitmap.
@@ -187,9 +191,9 @@ from other sections in the profile. The fields are documented as follows:
 Profile Counters
 ^^^^^^^^^^^^^^^^^
 
-For IRPGO [2]_, the counters within an instrumented function are stored contiguously
-and in an order that is consistent with basic block selection in the instrumentation
-pass.
+For PGO [2]_, the counters within an instrumented function are stored contiguously
+and in an order that is consistent with instrumentation points selection in the
+instrumentation pass.
 
 .. _calculation-of-counter-offset:
 
@@ -235,12 +239,15 @@ In the graph,
   We will call it `CounterDeltaInitVal` below for convenience.
 * For each profile data record, `CounterPtrN` is recorded as `start(Counter) - start(ProfileData)`.
 
-Each time the reader advances to the next data record, it updates `CounterDelta` to minus the size of one `ProfileData`.
+Each time the reader advances to the next data record, it decreases
+`CounterDelta` by the size of one `ProfileData`.
 
 For the counter corresponding to the first data record, the byte offset
 relative to the start of the counter section is calculated as `CounterPtr1 - CounterDeltaInitVal`.
-When profile reader advances to the second data record, note `CounterDelta` is now `CounterDeltaInitVal - sizeof(ProfileData)`.
-Thus the byte offset relative to the start of the counter section is calculated as `CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData))`.
+When profile reader advances to the second data record, note `CounterDelta`
+is updated to `CounterDeltaInitVal - sizeof(ProfileData)`.
+Thus the byte offset relative to the start of the counter section is calculated
+as `CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData))`.
 
 Bitmap
 ^^^^^^^
@@ -252,8 +259,8 @@ if interested.
 Names
 ^^^^^^
 
-This section contains the concatenated string of function IRPGO names. If
-compressed, zlib compression algorithm is used.
+This section contains the concatenated string of functions' PGO names, which may
+be compressed. If compressed, the zlib compression algorithm is used.
 
 Function names serve as keys in the PGO data hash table when raw profiles are
 converted into indexed profiles. They are also crucial for `llvm-profdata` to
@@ -289,18 +296,18 @@ General Storage Layout
                     |       +-----------------------+   D
                 +-----------|     MemProfOffset     |   E
                 |   |       +-----------------------+   R
-                |   |       |     BinaryIdOffset    |   |
-                |   |       +-----------------------+   |
+                |   |    +--|     BinaryIdOffset    |   |
+                |   |    |  +-----------------------+   |
             +---------------|      TemporalProf-    |   |
-            |   |   |       |      TracesOffset     |   |
-            |   |   |       +-----------------------+---+
-            |   |   |       |   Profile Summary     |   |
-            |   |   |       +-----------------------+   P
+            |   |   |    |  |      TracesOffset     |   |
+            |   |   |    |  +-----------------------+---+
+            |   |   |    |  |   Profile Summary     |   |
+            |   |   |    |  +-----------------------+   P
             |   |   +------>|  Function PGO data    |   A
-            |   |           +-----------------------+   Y
+            |   |        |  +-----------------------+   Y
             |   +---------- |  MemProf profile data |   L
-            |               +-----------------------+   O
-            |               |    Binary Ids         |   A
+            |            |  +-----------------------+   O
+            |            +--|    Binary Ids         |   A
             |               +-----------------------+   D
             +-------------->|  Temporal profiles    |   |
                             +-----------------------+---+
@@ -322,10 +329,10 @@ Header
   writing.
 
 ``HashOffset``
-  An on-disk hash table stores the per-function profile records.
-  Precisely speaking, `HashOffset` records the offset of this hash table's
-  metadata (i.e., the number of buckets and entries), which follows right after
-  the payload of the entire hash table.
+  An on-disk hash table stores the per-function profile records. `HashOffset`
+  records the offset of this hash table's metadata (i.e., the number of buckets
+  and entries), which follows right after the payload of the entire hash table
+  and is needed for deserialization.
 
 ``MemProfOffset``
   Records the byte offset of MemProf profiling data.
@@ -342,16 +349,16 @@ Payload Sections
 (CS) Profile Summary
 ^^^^^^^^^^^^^^^^^^^^^
 This section is right after profile header. It stores the serialized profile
-summary. For context-sensitive IRPGO, this section stores an additional profile
-summary corresponding to the context-sensitive profiles.
+summary. For context-sensitive IR-based instrumentation PGO, this section stores
+an additional profile summary corresponding to the context-sensitive profiles.
 
 Function PGO data
 ^^^^^^^^^^^^^^^^^^
 This section stores functions and their PGO profiling data as an on-disk hash
 table. The key of a hash table entry is function's PGO name, and the in-memory
-representation of value is a map. The key of this map is CFG hash, and the value
-is C++ struct `llvm::InstrProfRecord`. The C++ struct collects the profiling
-information like counters and value profiles.
+representation of value is a map. The key of this map is the fingerprint of CFG,
+and the value is a C++ struct named `llvm::InstrProfRecord`. The C++ struct
+collects the profiling information like counters and value profiles.
 
 MemProf Profile data
 ^^^^^^^^^^^^^^^^^^^^^^
@@ -362,26 +369,27 @@ This section stores function's memory profiling data. See
 
 Binary Ids
 ^^^^^^^^^^^^^^^^^^^^^^
-The section to carry on binary-id information from raw profiles.
+The section is used to carry over binary-id information from raw profiles.
 
 Temporal Profile Traces
 ^^^^^^^^^^^^^^^^^^^^^^^^
-The section to carry on temporal profile information from raw profiles.
-See `Temporal profiling RFC`_ for an overview.
+The section is used to carry over temporal profile information from raw profiles.
+See `Temporal profiling RFC`_ for the design.
 
 .. _`Temporal profiling RFC`: https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
 
 Profile Data Usage
 =======================================
 
-`llvm-profdata` is the command line tool to display and process profile data.
-For supported usages, check out its `documentation <https://llvm.org/docs/CommandGuide/llvm-profdata.html>`_.
+`llvm-profdata` is the command line tool to display and process
+instrumentation-based profile data. For supported usages, check out the `llvm-profdata documentation <https://llvm.org/docs/CommandGuide/llvm-profdata.html>`_.
 
 
-.. [1] A raw profile file could contain multiple raw profiles. Raw profile
-   reader could parse all raw profiles from the file correctly.
-.. [2] The counter section is used by a few variant types (like coverage and
-   temporal profiling) and might have different semantics there.
+.. [1] A raw profile file could contain the concatenation of multiple raw
+   profiles. Raw profile reader could parse all raw profiles from the file
+   correctly.
+.. [2] The counter section is used by a few variant types (like temporal
+   profiling) and might have different semantics there.
 .. [3] The step size of data pointer is the `sizeof(ProfileData)`, and the step
    size of value profile pointer is calculated based on the number of collected
    values.

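Editor's sketch, not part of the patch: the "Function PGO data" section above describes a two-level mapping, from a function's PGO name to a map keyed by the CFG fingerprint, whose values are profile records. A minimal Python model of that shape follows. `ProfileRecord`, `function_pgo_data` and `lookup` are hypothetical names used only for illustration; the real on-disk hash table and `llvm::InstrProfRecord` carry more information (for example, value profiles).

```python
from dataclasses import dataclass, field


@dataclass
class ProfileRecord:
    """Hypothetical stand-in for llvm::InstrProfRecord (counters only)."""
    counts: list = field(default_factory=list)


# Outer key: PGO name. Inner key: CFG fingerprint (function hash).
# All names and hash values here are made up for illustration.
function_pgo_data = {
    "main": {0x1234: ProfileRecord([10, 3])},
    "foo": {
        0xAAAA: ProfileRecord([7]),
        0xBBBB: ProfileRecord([1, 2, 3]),
    },
}


def lookup(table, pgo_name, func_hash):
    """Return the record for (name, hash), or None when either key is absent."""
    return table.get(pgo_name, {}).get(func_hash)
```

In this model, a lookup with a stale fingerprint (for example, after the function's control flow changed) finds the name but no matching record, so `lookup` returns `None`.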
>From ba27d13000bd7788fa978650d71c633bb3ef0d2a Mon Sep 17 00:00:00 2001
From: mingmingl <mingmingl at google.com>
Date: Mon, 25 Dec 2023 17:17:49 -0800
Subject: [PATCH 5/9] incorporate feedback * Some rewording as suggested. *
 Add link to code at a specific commit at a few places. * Mention it
 explicitly when fields might have a different semantics in   non-IRPGO case.
 * Mention version compatibility guarantees explicitly for both formats,   and
 add more details on endianness handling for raw profiles. * Add code comment
 to ask for doc update if appropriate.

---
 compiler-rt/include/profile/InstrProfData.inc |   2 +
 llvm/docs/PGOProfileFormat.rst                | 234 +++++++++++-------
 llvm/include/llvm/ProfileData/InstrProf.h     |   4 +-
 .../llvm/ProfileData/InstrProfData.inc        |   2 +
 4 files changed, 155 insertions(+), 87 deletions(-)

diff --git a/compiler-rt/include/profile/InstrProfData.inc b/compiler-rt/include/profile/InstrProfData.inc
index f5de23ff4b94d9..8ca077b72b844b 100644
--- a/compiler-rt/include/profile/InstrProfData.inc
+++ b/compiler-rt/include/profile/InstrProfData.inc
@@ -123,6 +123,8 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \
 
 /* INSTR_PROF_RAW_HEADER  start */
 /* Definition of member fields of the raw profile header data structure. */
+/* Please update https://llvm.org/docs/InstrProfileFormat.html as appropriate
+   when updating raw profile format. */
 #ifndef INSTR_PROF_RAW_HEADER
 #define INSTR_PROF_RAW_HEADER(Type, Name, Initializer)
 #else
diff --git a/llvm/docs/PGOProfileFormat.rst b/llvm/docs/PGOProfileFormat.rst
index 37b1049d84d761..97808fb95b4948 100644
--- a/llvm/docs/PGOProfileFormat.rst
+++ b/llvm/docs/PGOProfileFormat.rst
@@ -1,5 +1,5 @@
 ===================================
-Instrumentation PGO Profile Format
+Instrumentation Profile Format
 ===================================
 
 .. contents::
@@ -9,35 +9,43 @@ Instrumentation PGO Profile Format
 Overview
 =========
 
-Instrumentation PGO inserts `llvm.instrprof.*` `code generator intrinsics`_
-in the code to generate profiles. This document describes two binary profile
-formats (raw and indexed) used by instrumentation.
-
-.. _`code generator intrinsics`: https://llvm.org/docs/LangRef.html#code-generator-intrinsics
+Clang supports two types of profiling via instrumentation [0]_: frontend-based
+and IR-based, and both could support a variety of use cases [1]_ .
+This document describes two binary serialization formats (raw and indexed) to
+store instrumented profiles, with a specific emphasis on the IRPGO use case: when
+specific header fields and payload sections are interpreted differently across
+use cases, the documentation is based on IRPGO.
 
 .. note::
-  The instrumentation profile format supports non-PGO use cases (e.g., temporal
-  profiling). This document will focus on PGO. Source coverage uses both
-  frontend instrumentation profiles and coverage mapping. The format for
-  coverage mapping has its own `documentation`_.
+  Frontend-generated profiles are used together with coverage mapping for
+  `source based code coverage`_. The `coverage mapping format`_ is different from
+  the profile format.
 
-.. _`documentation`: https://llvm.org/docs/CoverageMappingFormat.html
+.. _`source based code coverage`: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html
+.. _`coverage mapping format`: https://llvm.org/docs/CoverageMappingFormat.html
 
 Raw Profile Format
 ===================
 
-The raw profile is generated by running the instrumented binary. It is a memory
-dump of the profile data.
+The raw profile is generated by running the instrumented binary. The raw profile
+data from an executable or a shared library [2]_ consists of a header and
+multiple sections, with each section as a memory dump. The raw profile data needs
+to be reasonably compact and fast to generate.
+
+There are no backward or forward version compatibility guarantees for the raw profile
+format. That is, compilers and tools `require`_ a specific raw profile version
+to parse the profiles.
+
+.. _`require`: https://github.com/llvm/llvm-project/blob/bffdde8b8e5d9a76a47949cd0f574f3ce656e181/llvm/lib/ProfileData/InstrProfReader.cpp#L551-L558
 
-The instrumented binary currently collects two kinds of profile data, counters
-to profile branch probability and (various flavors of) value profiles. The
-profile data for a function span across several sections in the profile.
+To feed profiles back into compilers for an optimized build (e.g., via
+`-fprofile-use` for IR instrumentation), a raw profile must be converted into
+the indexed format.
 
 General Storage Layout
 -----------------------
 
-A raw profile from an executable or a shared library [1]_ consists of a profile
-header and several sections. The storage layout is illustrated below. Generally,
+The storage layout of raw profile data format is illustrated below. Basically,
 when the raw profile is read into a memory buffer, the actual byte offset of a
 section is inferred from the section's order in the layout and size information
 of all the sections ahead of it.
@@ -71,16 +79,22 @@ of all the sections ahead of it.
 
 
 .. note::
-   Sections might be padded to meet platform-specific alignment requirements.
-   For simplicity, header fields and data sections solely for padding purpose
-   are omitted in the data layout graph above and the rest of this document.
+   Sections might be padded to meet specific alignment requirements. For
+   simplicity, header fields and data sections solely for padding purpose are
+   omitted in the data layout graph above and the rest of this document.
 
 Header
 -------
 
 ``Magic``
-  With the magic number, data consumer could detect profile format and
-  endianness of the data, and tells whether/how to continue reading.
+  The magic number encodes the profile format (raw, indexed or text). For the raw
+  format, the magic number also encodes the endianness (big or little) and the C
+  pointer size (4 or 8 bytes) of the platform on which the profile is generated.
+
+  A factory method reads the magic number to construct the appropriate reader and
+  returns an error for an unrecognized format. Specifically, the factory method and
+  the raw profile reader implementation make sure that a raw profile file could be
+  read back on a platform with the opposite endianness and/or a different C pointer
+  size.
 
 ``Version``
   The lower 32 bits specify the actual version and the most significant 32
@@ -91,8 +105,8 @@ Header
   The byte size of binary id section.
 
 ``NumData``
-  The number of per-function profile data control structures. The byte size of
-  profile data section could be computed with this field.
+  The number of profile metadata records. The byte size of the profile metadata
+  section could be computed with this field.
 
 ``NumCounter``
   The number of entries in the profile counter section. The byte size of counter
@@ -105,31 +119,34 @@ Header
   The number of bytes in the name section.
 
 ``CountersDelta``
-  Records the in-memory address difference between the data and counter section,
-  i.e., `start(__llvm_prf_cnts) - start(__llvm_prf_data)`. It's used jointly
-  with the in-memory address difference of profile data record and its counter
-  to find the counter of a profile data record. Check out calculation-of-counter-offset_
-  for details.
+  In the IRPGO case [3]_, this field records the in-memory address difference
+  between the metadata and counter section in the instrumented binary,
+  i.e., `start(__llvm_prf_cnts) - start(__llvm_prf_data)`.
+
+  It's used jointly with the in-memory address difference of profile metadata
+  record and its counter in the instrumented binary to compute the counter offset
+  relative to `start(__llvm_prf_cnts)`. Check out calculation-of-counter-offset_
+  for a visualized explanation.
 
 ``BitmapDelta``
-  Records the in-memory address difference between the data and bitmap section,
-  i.e., `start(__llvm_prf_bits) - start(__llvm_prf_data)`. It's used jointly
-  with the in-memory address difference of a profile data record and its bitmap
-  to find the bitmap of a profile data record, in a similar way to how counters
-  are referenced as explained by calculation-of-counter-offset_ .
+  In the IRPGO case [3]_, this field records the in-memory address difference
+  between the metadata and bitmap section in the instrumented binary,
+  i.e., `start(__llvm_prf_bits) - start(__llvm_prf_data)`.
+
+  It's used jointly with the in-memory address difference of a profile data record
+  and its bitmap in the instrumented binary to find the bitmap of a profile data
+  record, in a similar way to how counters are referenced as explained by
+  calculation-of-counter-offset_ .
 
 ``NamesDelta``
   Records the in-memory address of name section. Not used except for raw profile
   reader error checking.
 
 ``ValueKindLast``
-  Records the number of value kinds. As of writing, two kinds of value profiles
-  are supported. `IndirectCallTarget` is to profile the frequent callees of
-  indirect call instructions and `MemOPSize` is for memory intrinsic function
-  size profiling.
+  Records the number of value kinds. Macro `VALUE_PROF_KIND`_ defines the value
+  kinds with a description of the kind.
 
-  The number of value kinds affects the byte size of per function profile data
-  control structure.
+.. _`VALUE_PROF_KIND`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/compiler-rt/include/profile/InstrProfData.inc#L184-L186
 
 Payload Sections
 ------------------
@@ -137,17 +154,21 @@ Payload Sections
 Binary Ids
 ^^^^^^^^^^^
 Stores the binary ids of the instrumented binaries to associate binaries with
-profiles for source code coverage. See `Binary Id RFC`_ for introduction.
+profiles for source code coverage. See `Binary Id RFC`_ for the design.
 
 .. _`Binary Id RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html
 
-Profile Data
-^^^^^^^^^^^^^
+Profile Metadata
+^^^^^^^^^^^^^^^^^^
+
+This section stores the metadata to map counters and value profiles back to
+instrumented code regions (e.g., LLVM IR for IRPGO).
+
+The in-memory representation of the metadata is `__llvm_profile_data`_.
+Some fields are used to reference data from other sections in the profile.
+The fields are documented as follows:
 
-This section stores per-function profile data control structure. The in-memory
-representation of the control structure is `__llvm_profile_data` and the fields
-are defined by `INSTRPROFDATA` macro. Some fields are used to reference data
-from other sections in the profile. The fields are documented as follows:
+.. _`__llvm_profile_data`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/compiler-rt/lib/profile/InstrProfiling.h#L25
 
 ``NameRef``
   The MD5 of the function's PGO name. PGO name has the format
@@ -155,15 +176,30 @@ from other sections in the profile. The fields are documented as follows:
   `<delimiter>` is provided for local-linkage functions to tell possibly
   identical functions.
 
+.. _FuncHash:
+
 ``FuncHash``
-  A fingerprint of the function's control flow graph.
+  A checksum of the function's IR, taking the control flow graph and instrumented
+  value sites into account. See `computeCFGHash`_ for details.
+
+.. _`computeCFGHash`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L616-L685
 
 ``CounterPtr``
-  The in-memory address difference between profile data and its corresponding
-  counters.
+  The in-memory address difference between profile data and the start of corresponding
+  counters. Counter position is stored this way (as a link-time constant) to reduce
+  instrumented binary size compared with snapshotting the address of symbols directly.
+  See `commit a1532ed`_ for further information.
+
+.. _`commit a1532ed`: https://github.com/llvm/llvm-project/commit/a1532ed27582038e2d9588108ba0fe8237f01844
+
+  .. note::
+    `CounterPtr` might represent a different value for non-IRPGO use cases. For
+    example, for `binary profile correlation`_, it represents the counter address.
+    When in doubt, check source code.
 
 ``BitmapPtr``
-  The in-memory address difference between profile data and its bitmap.
+  The in-memory address difference between profile data and the start address of
+  corresponding bitmap.
 
 ``FunctionPointer``
   Records the function address when instrumented binary runs. This is used to
@@ -178,6 +214,10 @@ from other sections in the profile. The fields are documented as follows:
   `<profiled-value, count>` as payload. This is used by compiler runtime when
   writing out value profiles.
 
+  .. note::
+    Value profiling is supported by frontend and IR PGO instrumentation,
+    but it's not supported in all cases (e.g., `lightweight instrumentation`_).
+
 ``NumCounters``
   The number of counters for the instrumented function.
 
@@ -191,17 +231,16 @@ from other sections in the profile. The fields are documented as follows:
 Profile Counters
 ^^^^^^^^^^^^^^^^^
 
-For PGO [2]_, the counters within an instrumented function are stored contiguously
-and in an order that is consistent with instrumentation points selection in the
-instrumentation pass.
+For PGO [4]_, the counters within an instrumented function of a specific `FuncHash`_
+are stored contiguously and in an order that is consistent with instrumentation points selection.
 
 .. _calculation-of-counter-offset:
 
-So how are function counters associated with a function?
+As mentioned above, the recorded counter offset is relative to the profile metadata.
+So how are function counters associated with the profiled function?
 
-Basically, the profile reader iterates per-function control structure (from the
-profile data section) and makes use of the recorded relative distances, as
-illustrated below.
+Basically, the profile reader iterates over the profile metadata (from the profile
+metadata section) and makes use of the recorded relative distances, as illustrated below.
 
 ::
 
@@ -239,9 +278,11 @@ In the graph,
   We will call it `CounterDeltaInitVal` below for convenience.
 * For each profile data record, `CounterPtrN` is recorded as `start(Counter) - start(ProfileData)`.
 
-Each time the reader advances to the next data record, it updates `CounterDelta`
+Each time the reader advances to the next data record, it `updates`_ `CounterDelta`
 by subtracting the size of one `ProfileData`.
 
+.. _`updates`: https://github.com/llvm/llvm-project/blob/17ff25a58ee4f29816d932fdb75f0d305718069f/llvm/include/llvm/ProfileData/InstrProfReader.h#L439-L444
+
 For the counter corresponding to the first data record, the byte offset
 relative to the start of the counter section is calculated as `CounterPtr1 - CounterDeltaInitVal`.
 When profile reader advances to the second data record, note `CounterDelta`
@@ -251,16 +292,17 @@ as `CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData))`.
 
 Bitmap
 ^^^^^^^
-This section is used for source-based MC/DC code coverage. Check out `Bitmap RFC`_
-if interested.
+This section is used for source-based `Modified Condition/Decision Coverage`_ (MC/DC). Check out `Bitmap RFC`_
+for the design.
 
+.. _`Modified Condition/Decision Coverage`: https://en.wikipedia.org/wiki/Modified_condition/decision_coverage
 .. _`Bitmap RFC`: https://discourse.llvm.org/t/rfc-source-based-mc-dc-code-coverage/59244
 
 Names
 ^^^^^^
 
 This section contains possibly compressed concatenated string of functions' PGO
-names. If compressed, zlib compression algorithm is used.
+names. If compressed, the zlib library is used.
 
 Function names serve as keys in the PGO data hash table when raw profiles are
 converted into indexed profiles. They are also crucial for `llvm-profdata` to
@@ -271,15 +313,23 @@ Value Profile Data
 
 This section contains the profile data for value profiling.
 
-The value profiles corresponding to a profile data are serialized contiguously
+The value profiles corresponding to a profile metadata are serialized contiguously
 as one record, and value profile records are stored in the same order as the
-respective profile data, such that a raw profile reader advances the pointer to
-profile data and the pointer to value profile records simutaneously [3]_ to find
-value profiles for a per function, per cfg fingerprint profile data.
+respective profile data, such that a raw profile reader `advances`_ the pointer to
+profile data and the pointer to value profile records simultaneously [5]_ to find
+value profiles for a per function, per `FuncHash`_ profile data.
+
+.. _`advances`: https://github.com/llvm/llvm-project/blob/7e15fa9161eda7497a5d6abf0d951a1d12d86550/llvm/include/llvm/ProfileData/InstrProfReader.h#L456-L457
 
 Indexed PGO Profile Format
 ===========================
 
+Indexed profiles are generated by `llvm-profdata`. In the indexed profiles,
+function PGO data are organized as an on-disk hash table such that compilers could
+look up PGO data for functions in an IR module.
+
+Compilers and tools must retain backward compatibility with indexed PGO profiles.
+
 General Storage Layout
 -----------------------
 
@@ -316,8 +366,8 @@ Header
 --------
 
 ``Magic``
-  The purpose of the magic number is to be able to quickly tell if the profile
-  is an indexed profile.
+  The purpose of the magic number is to be able to tell if the profile is an
+  indexed profile.
 
 ``Version``
   Similar to raw profile version, the lower 32 bits specify the version of the
@@ -329,10 +379,9 @@ Header
   writing.
 
 ``HashOffset``
-  An on-disk hash table stores the per-function profile records. It records the
-  offset of this hash table's metadata (i.e., the number of buckets and entries),
-  which follows right after the payload of the entire hash table for
-  deserialization.
+  An on-disk hash table stores the per-function profile records. This field records
+  the offset of this hash table's metadata (i.e., the number of buckets and
+  entries), which follows right after the payload of the entire hash table.
 
 ``MemProfOffset``
   Records the byte offset of MemProf profiling data.
@@ -355,10 +404,13 @@ an additional profile summary corresponding to the context-sensitive profiles.
 Function PGO data
 ^^^^^^^^^^^^^^^^^^
 This section stores functions and their PGO profiling data as an on-disk hash
-table. The key of a hash table entry is function's PGO name, and the in-memory
-representation of value is a map. The key of this map is the fingerprint of CFG,
-and the value is a C++ struct named `llvm::InstrProfRecord`. The C++ struct
-collects the profiling information like counters and value profiles.
+table. Profile data for functions with the same name are grouped together and
+share one hash table entry (the functions may come from different shared libraries
+for instance). The profile data for them are organized as a sequence of key-value
+pairs where the key is `FuncHash`_, and the value is the profiled information
+(represented by `InstrProfRecord`_) for the function.
+
+.. _`InstrProfRecord`: https://github.com/llvm/llvm-project/blob/7e405eb722e40c79b7726201d0f76b5dab34ba0f/llvm/include/llvm/ProfileData/InstrProf.h#L693
 
 MemProf Profile data
 ^^^^^^^^^^^^^^^^^^^^^^
@@ -374,9 +426,7 @@ The section is used to carry on binary-id information from raw profiles.
 Temporal Profile Traces
 ^^^^^^^^^^^^^^^^^^^^^^^^
 The section is used to carry over temporal profile information from raw profiles.
-See `Temporal profiling RFC`_ for the design.
-
-.. _`Temporal profiling RFC`: https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
+See `temporal profiling`_ for the design.
 
 Profile Data Usage
 =======================================
@@ -384,12 +434,24 @@ Profile Data Usage
 `llvm-profdata` is the command line tool to display and process
 instrumentation-based profile data. For supported usages, check out the `llvm-profdata documentation <https://llvm.org/docs/CommandGuide/llvm-profdata.html>`_.
 
-
-.. [1] A raw profile file could contain the concatenation of multiple raw
-   profiles. Raw profile reader could parse all raw profiles from the file
-   correctly.
-.. [2] The counter section is used by a few variant types (like temporal
+.. [0] For usage, see https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation
+.. [1] For example, IR-based instrumentation supports `lightweight instrumentation`_
+   and `temporal profiling`_. Frontend instrumentation could support `single-byte counters`_.
+.. [2] A raw profile file could contain the concatenation of multiple raw
+   profiles, for example, from an executable and its shared libraries. Raw
+   profile reader could parse all raw profiles from the file correctly.
+.. [3] Instrumentations might not load the `__llvm_prf_data` object file section
+   in memory or might not generate the profile metadata section in raw profiles.
+   In those cases, `CountersDelta` is not used and other mechanisms are used to
+   match counters with instrumented code. See `lightweight instrumentation`_ and
+   `binary profile correlation`_ for examples.
+.. [4] The counter section is used by a few variant types (like temporal
    profiling) and might have different semantics there.
-.. [3] The step size of data pointer is the `sizeof(ProfileData)`, and the step
+.. [5] The step size of data pointer is the `sizeof(ProfileData)`, and the step
    size of value profile pointer is calculated based on the number of collected
    values.
+
+.. _`lightweight instrumentation`: https://groups.google.com/g/llvm-dev/c/r03Z6JoN7d4
+.. _`temporal profiling`: https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
+.. _`single-byte counters`: https://discourse.llvm.org/t/rfc-single-byte-counters-for-source-based-code-coverage/75685
+.. _`binary profile correlation`: https://discourse.llvm.org/t/rfc-add-binary-profile-correlation-to-not-load-profile-metadata-sections-into-memory-at-runtime/74565
diff --git a/llvm/include/llvm/ProfileData/InstrProf.h b/llvm/include/llvm/ProfileData/InstrProf.h
index 36be2e7d869e7b..f47e492f2fcc2b 100644
--- a/llvm/include/llvm/ProfileData/InstrProf.h
+++ b/llvm/include/llvm/ProfileData/InstrProf.h
@@ -1035,7 +1035,9 @@ const HashT HashType = HashT::MD5;
 inline uint64_t ComputeHash(StringRef K) { return ComputeHash(HashType, K); }
 
 // This structure defines the file header of the LLVM profile
-// data file in indexed-format.
+// data file in indexed-format. Please update
+// https://llvm.org/docs/InstrProfileFormat.html as appropriate when updating
+// the indexed profile format.
 struct Header {
   uint64_t Magic;
   uint64_t Version;
diff --git a/llvm/include/llvm/ProfileData/InstrProfData.inc b/llvm/include/llvm/ProfileData/InstrProfData.inc
index f5de23ff4b94d9..8ca077b72b844b 100644
--- a/llvm/include/llvm/ProfileData/InstrProfData.inc
+++ b/llvm/include/llvm/ProfileData/InstrProfData.inc
@@ -123,6 +123,8 @@ INSTR_PROF_VALUE_NODE(PtrToNodeT, llvm::PointerType::getUnqual(Ctx), Next, \
 
 /* INSTR_PROF_RAW_HEADER  start */
 /* Definition of member fields of the raw profile header data structure. */
+/* Please update https://llvm.org/docs/InstrProfileFormat.html as appropriate
+   when updating raw profile format. */
 #ifndef INSTR_PROF_RAW_HEADER
 #define INSTR_PROF_RAW_HEADER(Type, Name, Initializer)
 #else

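Editor's sketch, not part of the patch: the counter-offset walk-through in the "Profile Counters" section of this patch can be checked with a few lines of arithmetic. The Python below follows the description literally: each record's offset is `CounterPtrN - CounterDelta`, and `CounterDelta` shrinks by one `sizeof(ProfileData)` per record. The function name, the addresses and the 64-byte record size are made up for illustration and are not taken from the LLVM sources.

```python
def counter_offsets(counter_ptrs, counters_delta, sizeof_profile_data):
    """Byte offset of each record's first counter, relative to the counter section."""
    offsets = []
    delta = counters_delta  # CounterDeltaInitVal from the raw profile header
    for counter_ptr in counter_ptrs:
        # Offset relative to start(__llvm_prf_cnts) for the current record.
        offsets.append(counter_ptr - delta)
        # Advancing to the next data record shrinks the delta by one record.
        delta -= sizeof_profile_data
    return offsets


# Illustrative in-memory layout (all values are assumptions).
DATA_START = 0x1000        # start(__llvm_prf_data)
COUNTERS_START = 0x2000    # start(__llvm_prf_cnts)
SIZEOF_PROFILE_DATA = 64   # assumed size of one ProfileData record

record_addrs = [DATA_START, DATA_START + SIZEOF_PROFILE_DATA]
counter_addrs = [COUNTERS_START, COUNTERS_START + 16]  # first function has 2 counters

# CounterPtrN = start(CounterN) - start(ProfileDataN), as recorded in each record.
counter_ptrs = [c - r for c, r in zip(counter_addrs, record_addrs)]

offsets = counter_offsets(counter_ptrs, COUNTERS_START - DATA_START, SIZEOF_PROFILE_DATA)
```

With this layout the second record's counters start 16 bytes into the counter section, directly after the two 8-byte counters of the first record, which matches the contiguous layout the document describes.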
>From b761ebc3ae48da7838177702fc1d24cda768ddc8 Mon Sep 17 00:00:00 2001
From: mingmingl <mingmingl at google.com>
Date: Mon, 1 Jan 2024 18:29:22 -0800
Subject: [PATCH 6/9] rename file to InstrProfileFormat.rst

---
 llvm/docs/{PGOProfileFormat.rst => InstrProfileFormat.rst} | 0
 llvm/docs/UserGuides.rst                                   | 4 ++--
 2 files changed, 2 insertions(+), 2 deletions(-)
 rename llvm/docs/{PGOProfileFormat.rst => InstrProfileFormat.rst} (100%)

diff --git a/llvm/docs/PGOProfileFormat.rst b/llvm/docs/InstrProfileFormat.rst
similarity index 100%
rename from llvm/docs/PGOProfileFormat.rst
rename to llvm/docs/InstrProfileFormat.rst
diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst
index 14a2e161ea54cb..4bbb564d8385b0 100644
--- a/llvm/docs/UserGuides.rst
+++ b/llvm/docs/UserGuides.rst
@@ -58,7 +58,7 @@ intermediate LLVM representation.
    NVPTXUsage
    Phabricator
    Passes
-   PGOProfileFormat
+   InstrProfileFormat
    ReportingGuide
    ResponseGuide
    Remarks
@@ -178,7 +178,7 @@ Optimizations
    referencing, to determine variable locations for debug info in the final
    stages of compilation.
 
-:doc:`PGOProfileFormat`
+:doc:`InstrProfileFormat`
    This document explains two binary formats of IRPGO profiles.
 
 Code Generation

>From 36bc6372f31986d8b4437c5310b3678ca33668cd Mon Sep 17 00:00:00 2001
From: mingmingl <mingmingl at google.com>
Date: Mon, 1 Jan 2024 20:41:04 -0800
Subject: [PATCH 7/9] correct footnote numbering

---
 llvm/docs/InstrProfileFormat.rst | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/llvm/docs/InstrProfileFormat.rst b/llvm/docs/InstrProfileFormat.rst
index 97808fb95b4948..e1700e471dd02c 100644
--- a/llvm/docs/InstrProfileFormat.rst
+++ b/llvm/docs/InstrProfileFormat.rst
@@ -9,8 +9,8 @@ Instrumentation Profile Format
 Overview
 =========
 
-Clang supports two types of profiling via instrumentation [0]_: frontend-based
-and IR-based, and both could support a variety of use cases [1]_ .
+Clang supports two types of profiling via instrumentation [1]_: frontend-based
+and IR-based, and both could support a variety of use cases [2]_ .
 This document describes two binary serialization formats (raw and indexed) to
 store instrumented profiles with a specific emphasis on the IRPGO use case, in the
 sense that when specific header fields and payload sections have different ways
@@ -28,7 +28,7 @@ Raw Profile Format
 ===================
 
 The raw profile is generated by running the instrumented binary. The raw profile
-data from an executable or a shared library [2]_ consists of a header and
+data from an executable or a shared library [3]_ consists of a header and
 multiple sections, with each section as a memory dump. The raw profile data needs
 to be reasonably compact and fast to generate.
 
@@ -119,7 +119,7 @@ Header
   The number of bytes in the name section.
 
 ``CountersDelta``
-  In the IRPGO case [3]_, this field records the in-memory address difference
+  In the IRPGO case [4]_, this field records the in-memory address difference
   between the metadata and counter section in the instrumented binary,
   i.e., `start(__llvm_prf_cnts) - start(__llvm_prf_data)`.
 
@@ -129,7 +129,7 @@ Header
   for a visualized explanation.
 
 ``BitmapDelta``
-  In the IRPGO case [3]_, this field records the in-memory address difference
+  In the IRPGO case [4]_, this field records the in-memory address difference
   between the metadata and bitmap section in the instrumented binary,
   i.e., `start(__llvm_prf_bits) - start(__llvm_prf_data)`.
 
@@ -231,7 +231,7 @@ The fields are documented as follows:
 Profile Counters
 ^^^^^^^^^^^^^^^^^
 
-For PGO [4]_, the counters within an instrumented function of a specific `FuncHash`_
+For PGO [5]_, the counters within an instrumented function of a specific `FuncHash`_
 are stored contiguously and in an order that is consistent with instrumentation points selection.
 
 .. _calculation-of-counter-offset:
@@ -316,7 +316,7 @@ This section contains the profile data for value profiling.
 The value profiles corresponding to a profile metadata are serialized contiguously
 as one record, and value profile records are stored in the same order as the
 respective profile data, such that a raw profile reader `advances`_ the pointer to
-profile data and the pointer to value profile records simutaneously [5]_ to find
+profile data and the pointer to value profile records simultaneously _ to find
 value profiles for a per function, per `FuncHash`_ profile data.
 
 .. _`advances`: https://github.com/llvm/llvm-project/blob/7e15fa9161eda7497a5d6abf0d951a1d12d86550/llvm/include/llvm/ProfileData/InstrProfReader.h#L456-L457
@@ -434,20 +434,20 @@ Profile Data Usage
 `llvm-profdata` is the command line tool to display and process
 instrumentation-based profile data. For supported usages, check out the `llvm-profdata documentation <https://llvm.org/docs/CommandGuide/llvm-profdata.html>`_.
 
-.. [0] For usage, see https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation
-.. [1] For example, IR-based instrumentation supports `lightweight instrumentation`_
+.. [1] For usage, see https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation
+.. [2] For example, IR-based instrumentation supports `lightweight instrumentation`_
    and `temporal profiling`_. Frontend instrumentation could support `single-byte counters`_.
-.. [2] A raw profile file could contain the concatenation of multiple raw
+.. [3] A raw profile file could contain the concatenation of multiple raw
    profiles, for example, from an executable and its shared libraries. Raw
    profile reader could parse all raw profiles from the file correctly.
-.. [3] Instrumentations might not load the `__llvm_prf_data` object file section
+.. [4] Instrumentations might not load the `__llvm_prf_data` object file section
    in memory or might not generate the profile metadata section in raw profiles.
    In those cases, `CountersDelta` is not used and other mechanisms are used to
    match counters with instrumented code. See `lightweight instrumentation`_ and
    `binary profile correlation`_ for examples.
-.. [4] The counter section is used by a few variant types (like temporal
+.. [5] The counter section is used by a few variant types (like temporal
    profiling) and might have different semantics there.
-.. [5] The step size of data pointer is the `sizeof(ProfileData)`, and the step
+.. [6] The step size of data pointer is the `sizeof(ProfileData)`, and the step
    size of value profile pointer is calcuated based on the number of collected
    values.
 

>From 8932e9bfd78269d3e03fb39c085d8cd4a7a4e3c4 Mon Sep 17 00:00:00 2001
From: mingmingl <mingmingl at google.com>
Date: Tue, 2 Jan 2024 09:26:12 -0800
Subject: [PATCH 8/9] one-liner fix for one footnote

---
 llvm/docs/InstrProfileFormat.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/docs/InstrProfileFormat.rst b/llvm/docs/InstrProfileFormat.rst
index e1700e471dd02c..b842d35183e5ee 100644
--- a/llvm/docs/InstrProfileFormat.rst
+++ b/llvm/docs/InstrProfileFormat.rst
@@ -316,7 +316,7 @@ This section contains the profile data for value profiling.
 The value profiles corresponding to a profile metadata are serialized contiguously
 as one record, and value profile records are stored in the same order as the
 respective profile data, such that a raw profile reader `advances`_ the pointer to
-profile data and the pointer to value profile records simutaneously _ to find
+profile data and the pointer to value profile records simutaneously [6]_ to find
 value profiles for a per function, per `FuncHash`_ profile data.
 
 .. _`advances`: https://github.com/llvm/llvm-project/blob/7e15fa9161eda7497a5d6abf0d951a1d12d86550/llvm/include/llvm/ProfileData/InstrProfReader.h#L456-L457

>From c1b2e195887f8c72f5d8450ab6c80f9fb6295ae0 Mon Sep 17 00:00:00 2001
From: mingmingl <mingmingl at google.com>
Date: Tue, 2 Jan 2024 13:01:30 -0800
Subject: [PATCH 9/9] incorporate review feedback

---
 llvm/docs/InstrProfileFormat.rst | 64 ++++++++++++++++++++------------
 1 file changed, 40 insertions(+), 24 deletions(-)

diff --git a/llvm/docs/InstrProfileFormat.rst b/llvm/docs/InstrProfileFormat.rst
index b842d35183e5ee..7794a1c42b007e 100644
--- a/llvm/docs/InstrProfileFormat.rst
+++ b/llvm/docs/InstrProfileFormat.rst
@@ -105,39 +105,47 @@ Header
   The byte size of binary id section.
 
 ``NumData``
-  The number of profile metadata. The byte size of profile metadata section
+  The number of profile metadata. The byte size of `profile metadata`_ section
   could be computed with this field.
 
 ``NumCounter``
-  The number of entries in the profile counter section. The byte size of counter
+  The number of entries in the profile counter section. The byte size of `counter`_
   section could be computed with this field.
 
 ``NumBitmapBytes``
-  The number of bytes in the profile bitmap section.
+  The number of bytes in the profile `bitmap`_ section.
 
 ``NamesSize``
   The number of bytes in the name section.
 
+.. _`CountersDelta`:
+
 ``CountersDelta``
-  In the IRPGO case [4]_, this field records the in-memory address difference
-  between the metadata and counter section in the instrumented binary,
-  i.e., `start(__llvm_prf_cnts) - start(__llvm_prf_data)`.
+  This field records the in-memory address difference between the `profile metadata`_
+  and counter section in the instrumented binary, i.e., `start(__llvm_prf_cnts) - start(__llvm_prf_data)`.
 
-  It's used jointly with the in-memory address difference of profile metadata
-  record and its counter in the instrumented binary to compute the counter offset
+  It's used jointly with the `CounterPtr`_ field to compute the counter offset
   relative to `start(__llvm_prf_cnts)`. Check out calculation-of-counter-offset_
   for a visualized explanation.
 
+  .. note::
+    Instrumentations might not load the `__llvm_prf_data` object file section
+    in memory or might not generate the profile metadata section in raw profiles.
+    In those cases, `CountersDelta` is not used and other mechanisms are used to
+    match counters with instrumented code. See `lightweight instrumentation`_ and
+    `binary profile correlation`_ for examples.
+
 ``BitmapDelta``
-  In the IRPGO case [4]_, this field records the in-memory address difference
-  between the metadata and bitmap section in the instrumented binary,
-  i.e., `start(__llvm_prf_bits) - start(__llvm_prf_data)`.
+  This field records the in-memory address difference between the `profile metadata`_
+  and bitmap section in the instrumented binary, i.e., `start(__llvm_prf_bits) - start(__llvm_prf_data)`.
 
-  It's used jointly with the in-memory address difference of a profile data record
-  and its bitmap in the instrumented binary to find the bitmap of a profile data
+  It's used jointly with the `BitmapPtr`_ to find the bitmap of a profile data
   record, in a similar way to how counters are referenced as explained by
   calculation-of-counter-offset_ .
 
+  Similar to the `CountersDelta`_ field, this field may not be used in non-PGO variants
+  of profiles.
+
 ``NamesDelta``
   Records the in-memory address of name section. Not used except for raw profile
   reader error checking.
@@ -158,6 +166,8 @@ profiles for source code coverage. See `Binary Id RFC`_ for the design.
 
 .. _`Binary Id RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html
 
+.. _`profile metadata`:
+
 Profile Metadata
 ^^^^^^^^^^^^^^^^^^
 
@@ -184,6 +194,8 @@ The fields are documented as follows:
 
 .. _`computeCFGHash`: https://github.com/llvm/llvm-project/blob/7c3b67d2038cfb48a80299089f6a1308eee1df7f/llvm/lib/Transforms/Instrumentation/PGOInstrumentation.cpp#L616-L685
 
+.. _`CounterPtr`:
+
 ``CounterPtr``
   The in-memory address difference between profile data and the start of corresponding
   counters. Counter position is stored this way (as a link-time constant) to reduce
@@ -194,13 +206,18 @@ The fields are documented as follows:
 
   .. note::
     `CounterPtr` might represent a different value for non-IRPGO use case. For
-    example, for `binary profile correlation`_, it represents the counter address.
+    example, for `binary profile correlation`_, it represents the absolute address of the counter.
     When in doubt, check source code.
 
+.. _`BitmapPtr`:
+
 ``BitmapPtr``
   The in-memory address difference between profile data and the start address of
   corresponding bitmap.
 
+  .. note::
+    Similar to `CounterPtr`_, this field may represent a different value for non-IRPGO use cases.
+
 ``FunctionPointer``
   Records the function address when instrumented binary runs. This is used to
   map the profiled callee address of indirect calls to the `NameRef` during
@@ -228,10 +245,12 @@ The fields are documented as follows:
 ``NumBitmapBytes``
   The number of bitmap bytes for the function.
 
+.. _`counter`:
+
 Profile Counters
 ^^^^^^^^^^^^^^^^^
 
-For PGO [5]_, the counters within an instrumented function of a specific `FuncHash`_
+For PGO [4]_, the counters within an instrumented function of a specific `FuncHash`_
 are stored contiguously and in an order that is consistent with instrumentation points selection.
 
 .. _calculation-of-counter-offset:
@@ -239,7 +258,7 @@ are stored contiguously and in an order that is consistent with instrumentation
 As mentioned above, the recorded counter offset is relative to the profile metadata.
 So how are function counters associated with the profiled function?
 
-Basically, the profile reader iterates profile metadata (from the profile metadata
+Basically, the profile reader iterates profile metadata (from the `profile metadata`_
 section) and makes use of the recorded relative distances, as illustrated below.
 
 ::
@@ -290,6 +309,8 @@ is updated to `CounterDeltaInitVal - sizeof(ProfileData)`.
 Thus the byte offset relative to the start of the counter section is calculated
 as `CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData))`.
 
+.. _`bitmap`:
+
 Bitmap
 ^^^^^^^
 This section is used for source-based `Modified Condition/Decision Coverage`_ code coverage. Check out `Bitmap RFC`_
@@ -316,7 +337,7 @@ This section contains the profile data for value profiling.
 The value profiles corresponding to a profile metadata are serialized contiguously
 as one record, and value profile records are stored in the same order as the
 respective profile data, such that a raw profile reader `advances`_ the pointer to
-profile data and the pointer to value profile records simutaneously [6]_ to find
+profile data and the pointer to value profile records simultaneously [5]_ to find
 value profiles for a per function, per `FuncHash`_ profile data.
 
 .. _`advances`: https://github.com/llvm/llvm-project/blob/7e15fa9161eda7497a5d6abf0d951a1d12d86550/llvm/include/llvm/ProfileData/InstrProfReader.h#L456-L457
@@ -440,14 +461,9 @@ based profile data. For supported usages, check out `llvm-profdata documentation
 .. [3] A raw profile file could contain the concatenation of multiple raw
    profiles, for example, from an executable and its shared libraries. Raw
    profile reader could parse all raw profiles from the file correctly.
-.. [4] Instrumentations might not load the `__llvm_prf_data` object file section
-   in memory or does not generate the profile metadata section in raw profiles.
-   In those cases, `CountersDelta` is not used and other mechanism are used to
-   match counters with instrumented code. See `lightweight instrumentation`_ and
-   `binary profile correlation`_ for examples.
-.. [5] The counter section is used by a few variant types (like temporal
+.. [4] The counter section is used by a few variant types (like temporal
    profiling) and might have different semantics there.
-.. [6] The step size of data pointer is the `sizeof(ProfileData)`, and the step
+.. [5] The step size of data pointer is the `sizeof(ProfileData)`, and the step
    size of value profile pointer is calcuated based on the number of collected
    values.
 
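The counter-offset calculation that the documented `CountersDelta` and `CounterPtr` fields take part in can be sketched as below. This is only an illustration of the arithmetic described in the doc (offset = `CounterPtr - CountersDelta`, with the delta reduced by `sizeof(ProfileData)` after each record), not the reader's actual code. The function name `counter_offsets`, the section addresses, and the record size of 64 bytes are all assumptions made up for the example.

```python
def counter_offsets(counter_ptrs, counters_delta, sizeof_profile_data):
    """Recover each record's counter byte offset from the start of the
    counter section, following the arithmetic described in the doc.

    counter_ptrs: per-record CounterPtr values (counter address minus the
                  address of the record itself), in profile metadata order.
    counters_delta: start(__llvm_prf_cnts) - start(__llvm_prf_data).
    sizeof_profile_data: size of one profile metadata record (assumed here).
    """
    offsets = []
    delta = counters_delta
    for counter_ptr in counter_ptrs:
        offsets.append(counter_ptr - delta)
        # The next record sits sizeof(ProfileData) bytes further into the
        # metadata section, so the delta shrinks by that amount.
        delta -= sizeof_profile_data
    return offsets


# Hypothetical in-memory layout of an instrumented binary.
DATA_START = 0x1000          # assumed start(__llvm_prf_data)
CNTS_START = 0x5000          # assumed start(__llvm_prf_cnts)
SIZEOF_PROFILE_DATA = 64     # assumed record size, for illustration only
true_offsets = [0, 16, 40]   # where each function's counters really begin

# CounterPtr as the doc defines it: counter address relative to its record.
counter_ptrs = [
    (CNTS_START + off) - (DATA_START + i * SIZEOF_PROFILE_DATA)
    for i, off in enumerate(true_offsets)
]

recovered = counter_offsets(
    counter_ptrs, CNTS_START - DATA_START, SIZEOF_PROFILE_DATA
)
```

For the second record this reproduces the expression in the doc, `CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData))`.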


