[llvm] [BOLT][NFC] Expose YAMLProfileWriter::convert function (PR #76909)

Amir Ayupov via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 20 14:35:25 PDT 2024


https://github.com/aaupov updated https://github.com/llvm/llvm-project/pull/76909

From 7c21d3013c986bb8117ae286526d7ca24d44669f Mon Sep 17 00:00:00 2001
From: Amir Ayupov <aaupov at fb.com>
Date: Wed, 3 Jan 2024 21:25:47 -0800
Subject: [PATCH 1/2] [𝘀𝗽𝗿] changes to main this commit is based on
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Created using spr 1.3.4

[skip ci]
---
 bolt/docs/BAT.md                              | 102 ++++++
 .../bolt/Profile/BoltAddressTranslation.h     |  34 +-
 bolt/include/bolt/Profile/DataAggregator.h    |  14 +-
 bolt/lib/Profile/BoltAddressTranslation.cpp   | 311 +++++++++++++-----
 bolt/lib/Profile/DataAggregator.cpp           |  34 +-
 bolt/lib/Rewrite/RewriteInstance.cpp          |   5 +
 bolt/test/X86/bolt-address-translation.test   |  11 +-
 7 files changed, 412 insertions(+), 99 deletions(-)
 create mode 100644 bolt/docs/BAT.md
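
The encoding documented in bolt/docs/BAT.md below combines delta encoding with
ULEB128 and keeps the BRANCHENTRY flag in the least significant bit of the input
offset. A minimal standalone sketch of both ideas follows. All helper names
(appendULEB128, readULEB128, packInputOffset, encodeAddresses, decodeAddresses,
kBranchEntry) are hypothetical and exist only for illustration; the patch itself
relies on encodeULEB128 from llvm/Support/LEB128.h and DataExtractor::getULEB128.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for BRANCHENTRY: after this patch the flag is the
// least significant bit of the stored input offset.
constexpr uint32_t kBranchEntry = 0x1;

// Append Value as ULEB128: 7 payload bits per byte, with the high bit set on
// every byte except the last one.
void appendULEB128(uint64_t Value, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (Value != 0);
}

// Decode one ULEB128 value starting at Pos and advance Pos past it.
uint64_t readULEB128(const std::vector<uint8_t> &In, size_t &Pos) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  while (true) {
    assert(Pos < In.size() && "truncated input");
    const uint8_t Byte = In[Pos++];
    Value |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0)
      break;
    Shift += 7;
  }
  return Value;
}

// Pack an input offset and the branch flag into a single value.
uint32_t packInputOffset(uint32_t InputOffset, bool IsBranch) {
  return (InputOffset << 1) | (IsBranch ? kBranchEntry : 0);
}

// Delta-encode ascending output addresses, implicitly starting at zero.
std::vector<uint8_t> encodeAddresses(const std::vector<uint64_t> &Addresses) {
  std::vector<uint8_t> Out;
  uint64_t Prev = 0;
  for (uint64_t Address : Addresses) {
    appendULEB128(Address - Prev, Out);
    Prev = Address;
  }
  return Out;
}

// Inverse of encodeAddresses: accumulate Count deltas back into addresses.
std::vector<uint64_t> decodeAddresses(const std::vector<uint8_t> &In,
                                      size_t Count) {
  std::vector<uint64_t> Addresses;
  size_t Pos = 0;
  uint64_t Prev = 0;
  for (size_t I = 0; I < Count; ++I) {
    Prev += readULEB128(In, Pos);
    Addresses.push_back(Prev);
  }
  return Addresses;
}
```

For example, encodeAddresses({0x401000, 0x401010, 0x401234}) produces 7 bytes
(4 + 1 + 2), compared with 24 bytes for three raw 64-bit values, and
packInputOffset(0x10, true) yields 0x21, from which `>> 1` recovers the offset.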

diff --git a/bolt/docs/BAT.md b/bolt/docs/BAT.md
new file mode 100644
index 00000000000000..fcb3e70c9e369c
--- /dev/null
+++ b/bolt/docs/BAT.md
@@ -0,0 +1,102 @@
+# BOLT Address Translation (BAT)
+# Purpose
+Regular profile collection for BOLT involves collecting samples from an
+unoptimized binary. BOLT Address Translation allows collecting a profile
+from a BOLT-optimized binary and using it to optimize the input (pre-BOLT)
+binary.
+
+# Overview
+BOLT Address Translation is an extra section (`.note.bolt_bat`) inserted by BOLT
+into the output binary. It contains translation tables and split-function
+linkage information, which enable mapping the profile from the optimized binary
+back onto the original binary.
+
+# Usage
+The `--enable-bat` flag controls the generation of the BAT section. The sampled
+profile needs to be passed, along with the optimized binary containing the BAT
+section, to `perf2bolt`, which reads the BAT section and produces an fdata
+profile for the original binary. Note that YAML profile generation is not
+supported since BAT doesn't contain the metadata for input functions.
+
+# Internals
+## Section contents
+The section is organized as follows:
+- Hot functions table
+  - Address translation tables
+- Cold functions table
+
+## Construction and parsing
+The BAT section is created from the `BoltAddressTranslation` class, which
+captures address translation information provided by the BOLT linker. It is then
+encoded as a note section in the output binary.
+
+During profile conversion, when a BAT-enabled binary is passed to `perf2bolt`,
+the `BoltAddressTranslation` class is populated from the BAT section. The class
+is then queried by `DataAggregator` during sample processing to reconstruct
+addresses/offsets in the input binary.
+
+## Encoding format
+The encoding is specified in bolt/include/bolt/Profile/BoltAddressTranslation.h
+and bolt/lib/Profile/BoltAddressTranslation.cpp.
+
+### Layout
+The general layout is as follows:
+```
+Hot functions table header
+|------------------|
+|  Function entry  |
+| |--------------| |
+| | OutOff InOff | |
+| |--------------| |
+~~~~~~~~~~~~~~~~~~~~
+
+Cold functions table header
+|------------------|
+|  Function entry  |
+| |--------------| |
+| | OutOff InOff | |
+| |--------------| |
+~~~~~~~~~~~~~~~~~~~~
+```
+
+### Functions table
+Hot and cold functions tables share the encoding, except for the differences marked below.
+Header:
+| Entry  | Encoding | Description |
+| ------ | ----- | ----------- |
+| `NumFuncs` | ULEB128 | Number of functions in the functions table |
+
+The header is followed by the functions table with `NumFuncs` entries.
+Output binary addresses are delta encoded, meaning that only the difference from
+the previous output address is stored. Addresses implicitly start at zero.
+Output addresses are continuous through function start addresses and function
+internal offsets, and between hot and cold fragments, to better spread deltas
+and save space.
+
+Hot indices are delta encoded, implicitly starting at zero.
+| Entry  | Encoding | Description |
+| ------ | ------| ----------- |
+| `Address` | Continuous, Delta, ULEB128 | Function address in the output binary |
+| `HotIndex` | Delta, ULEB128 | Cold functions only: index of corresponding hot function in hot functions table |
+| `FuncHash` | 8 bytes | Hot functions only: function hash for input function |
+| `NumEntries` | ULEB128 | Number of address translation entries for a function |
+| `EqualElems` | ULEB128 | Hot functions only: number of equal offsets in the beginning of a function |
+| `BranchEntries` | Bitmask, `alignTo(EqualElems, 8)` bits | Hot functions only: if `EqualElems` is non-zero, bitmask denoting entries with `BRANCHENTRY` bit |
+The function header is followed by `EqualElems` offsets (hot functions only) and
+`NumEntries-EqualElems` (`NumEntries` for cold functions) pairs of offsets for
+the current function.
+
+### Address translation table
+Delta encoding means that only the difference with the previous corresponding
+entry is encoded. Input offsets implicitly start at zero.
+| Entry  | Encoding | Description |
+| ------ | ------| ----------- |
+| `OutputAddr` | Continuous, Delta, ULEB128 | Function offset in output binary |
+| `InputAddr` | Optional, Delta, SLEB128 | Function offset in input binary with `BRANCHENTRY` as the least significant bit |
+| `BBHash` | Optional, 8 bytes | Basic block entries only: basic block hash in input binary |
+
+The `BRANCHENTRY` bit denotes whether a given offset pair is a control flow
+source (branch or call instruction). If not set, it signifies a control flow
+target (basic block offset).
+`InputAddr` is omitted when input and output function offsets are equal. In this
+case, `BRANCHENTRY` bits are encoded separately in a `BranchEntries` bitvector.
diff --git a/bolt/include/bolt/Profile/BoltAddressTranslation.h b/bolt/include/bolt/Profile/BoltAddressTranslation.h
index 07e4b283211c69..71d3d0f66097d3 100644
--- a/bolt/include/bolt/Profile/BoltAddressTranslation.h
+++ b/bolt/include/bolt/Profile/BoltAddressTranslation.h
@@ -11,6 +11,7 @@
 
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/Support/DataExtractor.h"
 #include <cstdint>
 #include <map>
 #include <optional>
@@ -78,10 +79,21 @@ class BoltAddressTranslation {
 
   BoltAddressTranslation() {}
 
+  /// Write the serialized address translation table for a function.
+  template <bool Cold>
+  void writeMaps(std::map<uint64_t, MapTy> &Maps, uint64_t &PrevAddress,
+                 raw_ostream &OS);
+
   /// Write the serialized address translation tables for each reordered
   /// function
   void write(const BinaryContext &BC, raw_ostream &OS);
 
+  /// Read the serialized address translation table for a function.
+  /// Report a parse error through \p Err if failed.
+  template <bool Cold>
+  void parseMaps(std::vector<uint64_t> &HotFuncs, uint64_t &PrevAddress,
+                 DataExtractor &DE, uint64_t &Offset, Error &Err);
+
   /// Read the serialized address translation tables and load them internally
   /// in memory. Return a parse error if failed.
   std::error_code parse(StringRef Buf);
@@ -110,22 +122,40 @@ class BoltAddressTranslation {
   /// addresses when aggregating profile
   bool enabledFor(llvm::object::ELFObjectFileBase *InputFile) const;
 
+  /// Save function and basic block hashes used for metadata dump.
+  void saveMetadata(BinaryContext &BC);
+
 private:
   /// Helper to update \p Map by inserting one or more BAT entries reflecting
   /// \p BB for function located at \p FuncAddress. At least one entry will be
   /// emitted for the start of the BB. More entries may be emitted to cover
   /// the location of calls or any instruction that may change control flow.
   void writeEntriesForBB(MapTy &Map, const BinaryBasicBlock &BB,
-                         uint64_t FuncAddress);
+                         uint64_t FuncAddress, uint64_t FuncInputAddress);
+
+  /// Returns the bitmask with set bits corresponding to indices of BRANCHENTRY
+  /// entries in function address translation map.
+  APInt calculateBranchEntriesBitMask(MapTy &Map, size_t EqualElems);
+
+  /// Calculate the number of equal offsets (output = input) in the beginning
+  /// of the function.
+  size_t getNumEqualOffsets(const MapTy &Map) const;
 
   std::map<uint64_t, MapTy> Maps;
+  std::map<uint64_t, MapTy> ColdMaps;
+
+  using BBHashMap = std::unordered_map<uint32_t, size_t>;
+  std::unordered_map<uint64_t, std::pair<size_t, BBHashMap>> FuncHashes;
 
   /// Links outlined cold bocks to their original function
   std::map<uint64_t, uint64_t> ColdPartSource;
 
+  /// Links output address of a main fragment back to input address.
+  std::unordered_map<uint64_t, uint64_t> ReverseMap;
+
   /// Identifies the address of a control-flow changing instructions in a
   /// translation map entry
-  const static uint32_t BRANCHENTRY = 0x80000000;
+  const static uint32_t BRANCHENTRY = 0x1;
 };
 } // namespace bolt
 
diff --git a/bolt/include/bolt/Profile/DataAggregator.h b/bolt/include/bolt/Profile/DataAggregator.h
index 5bb4d00024c503..e52e0dbb354a68 100644
--- a/bolt/include/bolt/Profile/DataAggregator.h
+++ b/bolt/include/bolt/Profile/DataAggregator.h
@@ -225,6 +225,10 @@ class DataAggregator : public DataReader {
   /// Aggregation statistics
   uint64_t NumInvalidTraces{0};
   uint64_t NumLongRangeTraces{0};
+  /// Specifies how many samples were recorded in cold areas if we are dealing
+  /// with profiling data collected in a bolted binary. For LBRs, incremented
+  /// for the source of the branch to avoid counting cold activity twice (one
+  /// for source and another for destination).
   uint64_t NumColdSamples{0};
 
   /// Looks into system PATH for Linux Perf and set up the aggregator to use it
@@ -246,13 +250,9 @@ class DataAggregator : public DataReader {
   BinaryFunction *getBinaryFunctionContainingAddress(uint64_t Address) const;
 
   /// Retrieve the location name to be used for samples recorded in \p Func.
-  /// If doing BAT translation, link cold parts to the hot part  names (used by
-  /// the original binary).  \p Count specifies how many samples were recorded
-  /// at that location, so we can tally total activity in cold areas if we are
-  /// dealing with profiling data collected in a bolted binary. For LBRs,
-  /// \p Count should only be used for the source of the branch to avoid
-  /// counting cold activity twice (one for source and another for destination).
-  StringRef getLocationName(BinaryFunction &Func, uint64_t Count);
+  /// If doing BAT translation, link cold parts to the hot part names (used by
+  /// the original binary) and return true as second member.
+  std::pair<StringRef, bool> getLocationName(const BinaryFunction &Func) const;
 
   /// Semantic actions - parser hooks to interpret parsed perf samples
   /// Register a sample (non-LBR mode), i.e. a new hit at \p Address
diff --git a/bolt/lib/Profile/BoltAddressTranslation.cpp b/bolt/lib/Profile/BoltAddressTranslation.cpp
index e004309e0e2136..42031e1098e203 100644
--- a/bolt/lib/Profile/BoltAddressTranslation.cpp
+++ b/bolt/lib/Profile/BoltAddressTranslation.cpp
@@ -8,8 +8,10 @@
 
 #include "bolt/Profile/BoltAddressTranslation.h"
 #include "bolt/Core/BinaryFunction.h"
-#include "llvm/Support/DataExtractor.h"
+#include "llvm/ADT/APInt.h"
 #include "llvm/Support/Errc.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/LEB128.h"
 
 #define DEBUG_TYPE "bolt-bat"
 
@@ -20,7 +22,8 @@ const char *BoltAddressTranslation::SECTION_NAME = ".note.bolt_bat";
 
 void BoltAddressTranslation::writeEntriesForBB(MapTy &Map,
                                                const BinaryBasicBlock &BB,
-                                               uint64_t FuncAddress) {
+                                               uint64_t FuncAddress,
+                                               uint64_t FuncInputAddress) {
   const uint64_t BBOutputOffset =
       BB.getOutputAddressRange().first - FuncAddress;
   const uint32_t BBInputOffset = BB.getInputOffset();
@@ -34,9 +37,12 @@ void BoltAddressTranslation::writeEntriesForBB(MapTy &Map,
   if (BBInputOffset == BinaryBasicBlock::INVALID_OFFSET)
     return;
 
-  LLVM_DEBUG(dbgs() << "BB " << BB.getName() << "\n");
-  LLVM_DEBUG(dbgs() << "  Key: " << Twine::utohexstr(BBOutputOffset)
-                    << " Val: " << Twine::utohexstr(BBInputOffset) << "\n");
+  LLVM_DEBUG(dbgs() << "BB " << BB.getName() << "\n"
+                    << "  Key: " << Twine::utohexstr(BBOutputOffset)
+                    << " Val: " << Twine::utohexstr(BBInputOffset) << " Hash: "
+                    << Twine::utohexstr(
+                           FuncHashes[FuncInputAddress].second[BBInputOffset])
+                    << '\n';);
   // In case of conflicts (same Key mapping to different Vals), the last
   // update takes precedence. Of course it is not ideal to have conflicts and
   // those happen when we have an empty BB that either contained only
@@ -44,7 +50,7 @@ void BoltAddressTranslation::writeEntriesForBB(MapTy &Map,
   // and this deleted block will both share the same output address (the same
   // key), and we need to map back. We choose here to privilege the successor by
   // allowing it to overwrite the previously inserted key in the map.
-  Map[BBOutputOffset] = BBInputOffset;
+  Map[BBOutputOffset] = BBInputOffset << 1;
 
   const auto &IOAddressMap =
       BB.getFunction()->getBinaryContext().getIOAddressMap();
@@ -61,8 +67,8 @@ void BoltAddressTranslation::writeEntriesForBB(MapTy &Map,
 
     LLVM_DEBUG(dbgs() << "  Key: " << Twine::utohexstr(OutputOffset) << " Val: "
                       << Twine::utohexstr(InputOffset) << " (branch)\n");
-    Map.insert(
-        std::pair<uint32_t, uint32_t>(OutputOffset, InputOffset | BRANCHENTRY));
+    Map.insert(std::pair<uint32_t, uint32_t>(OutputOffset,
+                                             (InputOffset << 1) | BRANCHENTRY));
   }
 }
 
@@ -75,15 +81,21 @@ void BoltAddressTranslation::write(const BinaryContext &BC, raw_ostream &OS) {
     if (Function.isIgnored() || (!BC.HasRelocations && !Function.isSimple()))
       continue;
 
-    LLVM_DEBUG(dbgs() << "Function name: " << Function.getPrintName() << "\n");
-    LLVM_DEBUG(dbgs() << " Address reference: 0x"
-                      << Twine::utohexstr(Function.getOutputAddress()) << "\n");
+    LLVM_DEBUG(
+        dbgs() << "Function name: " << Function.getPrintName() << "\n"
+               << " Address reference: 0x"
+               << Twine::utohexstr(Function.getOutputAddress()) << "\n"
+               << " Hash: 0x"
+               << Twine::utohexstr(FuncHashes[Function.getAddress()].first)
+               << '\n');
 
     MapTy Map;
     for (const BinaryBasicBlock *const BB :
          Function.getLayout().getMainFragment())
-      writeEntriesForBB(Map, *BB, Function.getOutputAddress());
+      writeEntriesForBB(Map, *BB, Function.getOutputAddress(),
+                        Function.getAddress());
     Maps.emplace(Function.getOutputAddress(), std::move(Map));
+    ReverseMap.emplace(Function.getOutputAddress(), Function.getAddress());
 
     if (!Function.isSplit())
       continue;
@@ -94,43 +106,125 @@ void BoltAddressTranslation::write(const BinaryContext &BC, raw_ostream &OS) {
          Function.getLayout().getSplitFragments()) {
       Map.clear();
       for (const BinaryBasicBlock *const BB : FF)
-        writeEntriesForBB(Map, *BB, FF.getAddress());
+        writeEntriesForBB(Map, *BB, FF.getAddress(), Function.getAddress());
 
-      Maps.emplace(FF.getAddress(), std::move(Map));
+      ColdMaps.emplace(FF.getAddress(), std::move(Map));
       ColdPartSource.emplace(FF.getAddress(), Function.getOutputAddress());
     }
   }
 
+  // Output addresses are delta-encoded
+  uint64_t PrevAddress = 0;
+  writeMaps</*Cold=*/false>(Maps, PrevAddress, OS);
+  writeMaps</*Cold=*/true>(ColdMaps, PrevAddress, OS);
+
+  outs() << "BOLT-INFO: Wrote " << Maps.size() + ColdMaps.size()
+         << " BAT maps\n";
+  outs() << "BOLT-INFO: Wrote " << FuncHashes.size() << " function and "
+         << std::accumulate(FuncHashes.begin(), FuncHashes.end(), 0ull,
+                            [](size_t Acc, const auto &B) {
+                              return Acc + B.second.second.size();
+                            })
+         << " basic block hashes\n";
+}
+
+APInt BoltAddressTranslation::calculateBranchEntriesBitMask(MapTy &Map,
+                                                            size_t EqualElems) {
+  APInt BitMask(alignTo(EqualElems, 8), 0);
+  size_t Index = 0;
+  for (std::pair<const uint32_t, uint32_t> &KeyVal : Map) {
+    if (Index == EqualElems)
+      break;
+    const uint32_t InputOffset = KeyVal.second;
+    if (InputOffset & BRANCHENTRY)
+      BitMask.setBit(Index);
+    ++Index;
+  }
+  return BitMask;
+}
+
+size_t BoltAddressTranslation::getNumEqualOffsets(const MapTy &Map) const {
+  size_t EqualOffsets = 0;
+  for (const std::pair<const uint32_t, uint32_t> &KeyVal : Map) {
+    const uint32_t OutputOffset = KeyVal.first;
+    const uint32_t InputOffset = KeyVal.second >> 1;
+    if (OutputOffset == InputOffset)
+      ++EqualOffsets;
+    else
+      break;
+  }
+  return EqualOffsets;
+}
+
+template <bool Cold>
+void BoltAddressTranslation::writeMaps(std::map<uint64_t, MapTy> &Maps,
+                                       uint64_t &PrevAddress, raw_ostream &OS) {
   const uint32_t NumFuncs = Maps.size();
-  OS.write(reinterpret_cast<const char *>(&NumFuncs), 4);
-  LLVM_DEBUG(dbgs() << "Writing " << NumFuncs << " functions for BAT.\n");
+  encodeULEB128(NumFuncs, OS);
+  LLVM_DEBUG(dbgs() << "Writing " << NumFuncs << (Cold ? " cold" : "")
+                    << " functions for BAT.\n");
+  size_t PrevIndex = 0;
   for (auto &MapEntry : Maps) {
     const uint64_t Address = MapEntry.first;
+    const uint64_t HotInputAddress =
+        ReverseMap[Cold ? ColdPartSource[Address] : Address];
     MapTy &Map = MapEntry.second;
     const uint32_t NumEntries = Map.size();
     LLVM_DEBUG(dbgs() << "Writing " << NumEntries << " entries for 0x"
                       << Twine::utohexstr(Address) << ".\n");
-    OS.write(reinterpret_cast<const char *>(&Address), 8);
-    OS.write(reinterpret_cast<const char *>(&NumEntries), 4);
+    std::pair<size_t, BBHashMap> &FuncHashPair = FuncHashes[HotInputAddress];
+    if (!Cold)
+      LLVM_DEBUG(dbgs() << "Hash: " << formatv("{0:x}\n", FuncHashPair.first));
+    encodeULEB128(Address - PrevAddress, OS);
+    PrevAddress = Address;
+    if (Cold) {
+      size_t HotIndex =
+          std::distance(ColdPartSource.begin(), ColdPartSource.find(Address));
+      encodeULEB128(HotIndex - PrevIndex, OS);
+      PrevIndex = HotIndex;
+    } else {
+      // Function hash
+      OS.write(reinterpret_cast<char *>(&FuncHashPair.first), 8);
+    }
+    encodeULEB128(NumEntries, OS);
+    // For hot fragments only: encode the number of equal offsets
+    // (output = input) in the beginning of the function. Only encode one offset
+    // in these cases.
+    const size_t EqualElems = Cold ? 0 : getNumEqualOffsets(Map);
+    if (!Cold) {
+      encodeULEB128(EqualElems, OS);
+      if (EqualElems) {
+        const size_t BranchEntriesBytes = alignTo(EqualElems, 8) / 8;
+        APInt BranchEntries = calculateBranchEntriesBitMask(Map, EqualElems);
+        OS.write(reinterpret_cast<const char *>(BranchEntries.getRawData()),
+                 BranchEntriesBytes);
+        LLVM_DEBUG({
+          dbgs() << "BranchEntries: ";
+          SmallString<8> BitMaskStr;
+          BranchEntries.toString(BitMaskStr, 2, false);
+          dbgs() << BitMaskStr << '\n';
+        });
+      }
+    }
+    size_t Index = 0;
+    uint64_t InOffset = 0;
+    // Output and Input addresses are delta-encoded
     for (std::pair<const uint32_t, uint32_t> &KeyVal : Map) {
-      OS.write(reinterpret_cast<const char *>(&KeyVal.first), 4);
-      OS.write(reinterpret_cast<const char *>(&KeyVal.second), 4);
+      const uint64_t OutputAddress = KeyVal.first + Address;
+      encodeULEB128(OutputAddress - PrevAddress, OS);
+      PrevAddress = OutputAddress;
+      if (Index++ >= EqualElems)
+        encodeSLEB128(KeyVal.second - InOffset, OS);
+      InOffset = KeyVal.second; // Keeping InOffset as if BRANCHENTRY is encoded
+      if ((InOffset & BRANCHENTRY) == 0) {
+        // Basic block hash
+        size_t BBHash = FuncHashPair.second[InOffset >> 1];
+        OS.write(reinterpret_cast<char *>(&BBHash), 8);
+        LLVM_DEBUG(dbgs() << formatv("{0:x} -> {1:x} {2:x}\n", KeyVal.first,
+                                     InOffset >> 1, BBHash));
+      }
     }
   }
-  const uint32_t NumColdEntries = ColdPartSource.size();
-  LLVM_DEBUG(dbgs() << "Writing " << NumColdEntries
-                    << " cold part mappings.\n");
-  OS.write(reinterpret_cast<const char *>(&NumColdEntries), 4);
-  for (std::pair<const uint64_t, uint64_t> &ColdEntry : ColdPartSource) {
-    OS.write(reinterpret_cast<const char *>(&ColdEntry.first), 8);
-    OS.write(reinterpret_cast<const char *>(&ColdEntry.second), 8);
-    LLVM_DEBUG(dbgs() << " " << Twine::utohexstr(ColdEntry.first) << " -> "
-                      << Twine::utohexstr(ColdEntry.second) << "\n");
-  }
-
-  outs() << "BOLT-INFO: Wrote " << Maps.size() << " BAT maps\n";
-  outs() << "BOLT-INFO: Wrote " << NumColdEntries
-         << " BAT cold-to-hot entries\n";
 }
 
 std::error_code BoltAddressTranslation::parse(StringRef Buf) {
@@ -152,68 +246,123 @@ std::error_code BoltAddressTranslation::parse(StringRef Buf) {
   if (Name.substr(0, 4) != "BOLT")
     return make_error_code(llvm::errc::io_error);
 
-  if (Buf.size() - Offset < 4)
-    return make_error_code(llvm::errc::io_error);
+  Error Err(Error::success());
+  std::vector<uint64_t> HotFuncs;
+  uint64_t PrevAddress = 0;
+  parseMaps</*Cold=*/false>(HotFuncs, PrevAddress, DE, Offset, Err);
+  parseMaps</*Cold=*/true>(HotFuncs, PrevAddress, DE, Offset, Err);
+  outs() << "BOLT-INFO: Parsed " << Maps.size() << " BAT entries\n";
+  return errorToErrorCode(std::move(Err));
+}
 
-  const uint32_t NumFunctions = DE.getU32(&Offset);
-  LLVM_DEBUG(dbgs() << "Parsing " << NumFunctions << " functions\n");
+template <bool Cold>
+void BoltAddressTranslation::parseMaps(std::vector<uint64_t> &HotFuncs,
+                                       uint64_t &PrevAddress, DataExtractor &DE,
+                                       uint64_t &Offset, Error &Err) {
+  const uint32_t NumFunctions = DE.getULEB128(&Offset, &Err);
+  LLVM_DEBUG(dbgs() << "Parsing " << NumFunctions << (Cold ? " cold" : "")
+                    << " functions\n");
+  size_t HotIndex = 0;
   for (uint32_t I = 0; I < NumFunctions; ++I) {
-    if (Buf.size() - Offset < 12)
-      return make_error_code(llvm::errc::io_error);
-
-    const uint64_t Address = DE.getU64(&Offset);
-    const uint32_t NumEntries = DE.getU32(&Offset);
+    const uint64_t Address = PrevAddress + DE.getULEB128(&Offset, &Err);
+    uint64_t HotAddress = Cold ? 0 : Address;
+    PrevAddress = Address;
+    if (Cold) {
+      HotIndex += DE.getULEB128(&Offset, &Err);
+      HotAddress = HotFuncs[HotIndex];
+      ColdPartSource.emplace(Address, HotAddress);
+    } else {
+      HotFuncs.push_back(Address);
+      // Function hash
+      const size_t FuncHash = DE.getU64(&Offset, &Err);
+      FuncHashes[Address].first = FuncHash;
+      LLVM_DEBUG(dbgs() << formatv("{0:x}: hash {1:x}\n", Address, FuncHash));
+    }
+    const uint32_t NumEntries = DE.getULEB128(&Offset, &Err);
+    // Equal offsets, hot fragments only.
+    size_t EqualElems = 0;
+    APInt *BEBitMask(nullptr);
+    if (!Cold) {
+      EqualElems = DE.getULEB128(&Offset, &Err);
+      LLVM_DEBUG(dbgs() << formatv("Equal offsets: {0}, {1} bytes\n",
+                                   EqualElems, getULEB128Size(EqualElems)));
+      if (EqualElems) {
+        const size_t BranchEntriesBytes = alignTo(EqualElems, 8) / 8;
+        BEBitMask = new APInt(alignTo(EqualElems, 8), 0);
+        LoadIntFromMemory(
+            *BEBitMask,
+            reinterpret_cast<const uint8_t *>(
+                DE.getBytes(&Offset, BranchEntriesBytes, &Err).data()),
+            BranchEntriesBytes);
+        LLVM_DEBUG({
+          dbgs() << "BEBitMask: ";
+          SmallString<8> BitMaskStr;
+          BEBitMask->toString(BitMaskStr, 2, false);
+          dbgs() << BitMaskStr << ", " << BranchEntriesBytes << " bytes\n";
+        });
+      }
+    }
     MapTy Map;
 
     LLVM_DEBUG(dbgs() << "Parsing " << NumEntries << " entries for 0x"
                       << Twine::utohexstr(Address) << "\n");
-    if (Buf.size() - Offset < 8 * NumEntries)
-      return make_error_code(llvm::errc::io_error);
+    uint64_t InputOffset = 0;
     for (uint32_t J = 0; J < NumEntries; ++J) {
-      const uint32_t OutputAddr = DE.getU32(&Offset);
-      const uint32_t InputAddr = DE.getU32(&Offset);
-      Map.insert(std::pair<uint32_t, uint32_t>(OutputAddr, InputAddr));
-      LLVM_DEBUG(dbgs() << Twine::utohexstr(OutputAddr) << " -> "
-                        << Twine::utohexstr(InputAddr) << "\n");
+      const uint64_t OutputDelta = DE.getULEB128(&Offset, &Err);
+      const uint64_t OutputAddress = PrevAddress + OutputDelta;
+      const uint64_t OutputOffset = OutputAddress - Address;
+      PrevAddress = OutputAddress;
+      int64_t InputDelta = 0;
+      if (J < EqualElems) {
+        InputOffset = (OutputOffset << 1) | (*BEBitMask)[J];
+      } else {
+        InputDelta = DE.getSLEB128(&Offset, &Err);
+        InputOffset += InputDelta;
+      }
+      Map.insert(std::pair<uint32_t, uint32_t>(OutputOffset, InputOffset));
+      size_t BBHash = 0;
+      if ((InputOffset & BRANCHENTRY) == 0)
+        // Map basic block hash to hot fragment by input offset
+        BBHash = FuncHashes[HotAddress].second[InputOffset >> 1] =
+            DE.getU64(&Offset, &Err);
+      LLVM_DEBUG({
+        dbgs() << formatv(
+            "{0:x} -> {1:x} ({2}/{3}b -> {4}/{5}b), {6:x}", OutputOffset,
+            InputOffset, OutputDelta, getULEB128Size(OutputDelta), InputDelta,
+            (J < EqualElems) ? 0 : getSLEB128Size(InputDelta), OutputAddress);
+        if (BBHash)
+          dbgs() << formatv(" {0:x}", BBHash);
+        dbgs() << '\n';
+      });
     }
+    if (BEBitMask)
+      delete BEBitMask;
     Maps.insert(std::pair<uint64_t, MapTy>(Address, Map));
   }
-
-  if (Buf.size() - Offset < 4)
-    return make_error_code(llvm::errc::io_error);
-
-  const uint32_t NumColdEntries = DE.getU32(&Offset);
-  LLVM_DEBUG(dbgs() << "Parsing " << NumColdEntries << " cold part mappings\n");
-  for (uint32_t I = 0; I < NumColdEntries; ++I) {
-    if (Buf.size() - Offset < 16)
-      return make_error_code(llvm::errc::io_error);
-    const uint32_t ColdAddress = DE.getU64(&Offset);
-    const uint32_t HotAddress = DE.getU64(&Offset);
-    ColdPartSource.insert(
-        std::pair<uint64_t, uint64_t>(ColdAddress, HotAddress));
-    LLVM_DEBUG(dbgs() << Twine::utohexstr(ColdAddress) << " -> "
-                      << Twine::utohexstr(HotAddress) << "\n");
-  }
-  outs() << "BOLT-INFO: Parsed " << Maps.size() << " BAT entries\n";
-  outs() << "BOLT-INFO: Parsed " << NumColdEntries
-         << " BAT cold-to-hot entries\n";
-
-  return std::error_code();
 }
 
 void BoltAddressTranslation::dump(raw_ostream &OS) {
   const size_t NumTables = Maps.size();
   OS << "BAT tables for " << NumTables << " functions:\n";
   for (const auto &MapEntry : Maps) {
-    OS << "Function Address: 0x" << Twine::utohexstr(MapEntry.first) << "\n";
+    const uint64_t Address = MapEntry.first;
+    const uint64_t HotAddress = fetchParentAddress(Address);
+    OS << "Function Address: 0x" << Twine::utohexstr(Address);
+    if (HotAddress == 0)
+      OS << ", hash: " << Twine::utohexstr(FuncHashes[Address].first);
+    OS << "\n";
     OS << "BB mappings:\n";
     for (const auto &Entry : MapEntry.second) {
       const bool IsBranch = Entry.second & BRANCHENTRY;
-      const uint32_t Val = Entry.second & ~BRANCHENTRY;
+      const uint32_t Val = Entry.second >> 1; // dropping BRANCHENTRY bit
       OS << "0x" << Twine::utohexstr(Entry.first) << " -> "
          << "0x" << Twine::utohexstr(Val);
       if (IsBranch)
         OS << " (branch)";
+      else
+        OS << " hash: "
+           << Twine::utohexstr(
+                  FuncHashes[HotAddress ? HotAddress : Address].second[Val]);
       OS << "\n";
     }
     OS << "\n";
@@ -244,7 +393,7 @@ uint64_t BoltAddressTranslation::translate(uint64_t FuncAddress,
 
   --KeyVal;
 
-  const uint32_t Val = KeyVal->second & ~BRANCHENTRY;
+  const uint32_t Val = KeyVal->second >> 1; // dropping BRANCHENTRY bit
   // Branch source addresses are translated to the first instruction of the
   // source BB to avoid accounting for modifications BOLT may have made in the
   // BB regarding deletion/addition of instructions.
@@ -326,5 +475,19 @@ bool BoltAddressTranslation::enabledFor(
   }
   return false;
 }
+
+void BoltAddressTranslation::saveMetadata(BinaryContext &BC) {
+  for (BinaryFunction &BF : llvm::make_second_range(BC.getBinaryFunctions())) {
+    // We don't need a translation table if the body of the function hasn't
+    // changed
+    if (BF.isIgnored() || (!BC.HasRelocations && !BF.isSimple()))
+      continue;
+    // Prepare function and block hashes
+    FuncHashes[BF.getAddress()].first = BF.computeHash();
+    BF.computeBlockHashes();
+    for (const BinaryBasicBlock &BB : BF)
+      FuncHashes[BF.getAddress()].second.emplace(BB.getInputOffset(), BB.getHash());
+  }
+}
 } // namespace bolt
 } // namespace llvm
diff --git a/bolt/lib/Profile/DataAggregator.cpp b/bolt/lib/Profile/DataAggregator.cpp
index be1e348b338f0f..94e33150b1bc3b 100644
--- a/bolt/lib/Profile/DataAggregator.cpp
+++ b/bolt/lib/Profile/DataAggregator.cpp
@@ -649,17 +649,19 @@ DataAggregator::getBinaryFunctionContainingAddress(uint64_t Address) const {
                                                 /*UseMaxSize=*/true);
 }
 
-StringRef DataAggregator::getLocationName(BinaryFunction &Func,
-                                          uint64_t Count) {
+std::pair<StringRef, bool>
+DataAggregator::getLocationName(const BinaryFunction &Func) const {
   if (!BAT)
-    return Func.getOneName();
+    return {Func.getOneName(), false};
 
+  bool LinkedToHot = false;
   const BinaryFunction *OrigFunc = &Func;
   if (const uint64_t HotAddr = BAT->fetchParentAddress(Func.getAddress())) {
-    NumColdSamples += Count;
     BinaryFunction *HotFunc = getBinaryFunctionContainingAddress(HotAddr);
-    if (HotFunc)
+    if (HotFunc) {
       OrigFunc = HotFunc;
+      LinkedToHot = true;
+    }
   }
   // If it is a local function, prefer the name containing the file name where
   // the local function was declared
@@ -670,9 +672,9 @@ StringRef DataAggregator::getLocationName(BinaryFunction &Func,
     if (FileNameIdx == StringRef::npos ||
         AlternativeName.find('/', FileNameIdx + 1) == StringRef::npos)
       continue;
-    return AlternativeName;
+    return {AlternativeName, LinkedToHot};
   }
-  return OrigFunc->getOneName();
+  return {OrigFunc->getOneName(), LinkedToHot};
 }
 
 bool DataAggregator::doSample(BinaryFunction &Func, uint64_t Address,
@@ -680,7 +682,11 @@ bool DataAggregator::doSample(BinaryFunction &Func, uint64_t Address,
   auto I = NamesToSamples.find(Func.getOneName());
   if (I == NamesToSamples.end()) {
     bool Success;
-    StringRef LocName = getLocationName(Func, Count);
+    bool LinkedToHot;
+    StringRef LocName;
+    std::tie(LocName, LinkedToHot) = getLocationName(Func);
+    if (LinkedToHot)
+      NumColdSamples += Count;
     std::tie(I, Success) = NamesToSamples.insert(
         std::make_pair(Func.getOneName(),
                        FuncSampleData(LocName, FuncSampleData::ContainerTy())));
@@ -700,7 +706,10 @@ bool DataAggregator::doIntraBranch(BinaryFunction &Func, uint64_t From,
   FuncBranchData *AggrData = getBranchData(Func);
   if (!AggrData) {
     AggrData = &NamesToBranches[Func.getOneName()];
-    AggrData->Name = getLocationName(Func, Count);
+    bool LinkedToHot;
+    std::tie(AggrData->Name, LinkedToHot) = getLocationName(Func);
+    if (LinkedToHot)
+      NumColdSamples += Count;
     setBranchData(Func, AggrData);
   }
 
@@ -729,7 +738,10 @@ bool DataAggregator::doInterBranch(BinaryFunction *FromFunc,
   StringRef SrcFunc;
   StringRef DstFunc;
   if (FromFunc) {
-    SrcFunc = getLocationName(*FromFunc, Count);
+    bool LinkedToHot;
+    std::tie(SrcFunc, LinkedToHot) = getLocationName(*FromFunc);
+    if (LinkedToHot)
+      NumColdSamples += Count;
     FromAggrData = getBranchData(*FromFunc);
     if (!FromAggrData) {
       FromAggrData = &NamesToBranches[FromFunc->getOneName()];
@@ -743,7 +755,7 @@ bool DataAggregator::doInterBranch(BinaryFunction *FromFunc,
     recordExit(*FromFunc, From, Mispreds, Count);
   }
   if (ToFunc) {
-    DstFunc = getLocationName(*ToFunc, 0);
+    std::tie(DstFunc, std::ignore) = getLocationName(*ToFunc);
     ToAggrData = getBranchData(*ToFunc);
     if (!ToAggrData) {
       ToAggrData = &NamesToBranches[ToFunc->getOneName()];
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index a95b1650753cfd..3b0a8daca7cf7f 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -731,6 +731,10 @@ Error RewriteInstance::run() {
 
   processProfileData();
 
+  // Save input binary metadata if BAT section needs to be emitted
+  if (opts::EnableBAT)
+    BAT->saveMetadata(*BC);
+
   postProcessFunctions();
 
   processMetadataPostCFG();
@@ -4112,6 +4116,7 @@ void RewriteInstance::encodeBATSection() {
                                   copyByteArray(BoltInfo), BoltInfo.size(),
                                   /*Alignment=*/1,
                                   /*IsReadOnly=*/true, ELF::SHT_NOTE);
+  outs() << "BOLT-INFO: BAT section size (bytes): " << BoltInfo.size() << '\n';
 }
 
 template <typename ELFShdrTy>
diff --git a/bolt/test/X86/bolt-address-translation.test b/bolt/test/X86/bolt-address-translation.test
index f68a8f7e9bcb7f..3c9e7bd9d37c20 100644
--- a/bolt/test/X86/bolt-address-translation.test
+++ b/bolt/test/X86/bolt-address-translation.test
@@ -36,7 +36,8 @@
 #
 # CHECK:      BOLT: 3 out of 7 functions were overwritten.
 # CHECK:      BOLT-INFO: Wrote 6 BAT maps
-# CHECK:      BOLT-INFO: Wrote 3 BAT cold-to-hot entries
+# CHECK:      BOLT-INFO: Wrote 3 function and 58 basic block hashes
+# CHECK:      BOLT-INFO: BAT section size (bytes): 816
 #
 # usqrt mappings (hot part). We match against any key (left side containing
 # the bolted binary offsets) because BOLT may change where it puts instructions
@@ -44,13 +45,13 @@
 # binary offsets (right side) should be the same because these addresses are
 # hardcoded in the blarge.yaml file.
 #
-# CHECK-BAT-DUMP:      Function Address: 0x401170
+# CHECK-BAT-DUMP:      Function Address: 0x401170, hash: ace6cbc638b31983
 # CHECK-BAT-DUMP-NEXT: BB mappings:
-# CHECK-BAT-DUMP-NEXT: 0x0 -> 0x0
+# CHECK-BAT-DUMP-NEXT: 0x0 -> 0x0 hash: 36007ba1d80c0000
 # CHECK-BAT-DUMP-NEXT: 0x8 -> 0x8 (branch)
-# CHECK-BAT-DUMP-NEXT: 0x{{.*}} -> 0x39
+# CHECK-BAT-DUMP-NEXT: 0x{{.*}} -> 0x39 hash: 5c06705524800039
 # CHECK-BAT-DUMP-NEXT: 0x{{.*}} -> 0x3d (branch)
-# CHECK-BAT-DUMP-NEXT: 0x{{.*}} -> 0x10
+# CHECK-BAT-DUMP-NEXT: 0x{{.*}} -> 0x10 hash: d70d7a64320e0010
 # CHECK-BAT-DUMP-NEXT: 0x{{.*}} -> 0x30 (branch)
 #
 # CHECK-BAT-DUMP: 3 cold mappings
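
Note on the DataAggregator.cpp hunks above: getLocationName becomes a const query that returns the name together with a LinkedToHot flag, and each caller now updates NumColdSamples itself. A minimal standalone sketch of that pattern follows. It is not the BOLT code: `Func`, `Aggregator`, `HotParentName`, and `Samples` are hypothetical stand-ins, `std::string_view` replaces `StringRef`, and the cold count is updated on every call here, whereas the patch only does so when a new entry is created.

```cpp
#include <cstdint>
#include <string_view>
#include <tuple>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for BinaryFunction: a name plus the name of the hot
// fragment it belongs to (empty when the function is not a cold fragment).
struct Func {
  std::string_view Name;
  std::string_view HotParentName;
};

class Aggregator {
public:
  // Pure query: returns the name samples should be attributed to, and whether
  // the lookup was redirected to a hot parent. No counters are touched.
  std::pair<std::string_view, bool> getLocationName(const Func &F) const {
    if (F.HotParentName.empty())
      return {F.Name, false};
    return {F.HotParentName, true};
  }

  // Caller-side accounting: unpack the pair with std::tie and update the
  // cold-sample counter only when the name came from a hot parent.
  void doSample(const Func &F, std::uint64_t Count) {
    std::string_view LocName;
    bool LinkedToHot;
    std::tie(LocName, LinkedToHot) = getLocationName(F);
    if (LinkedToHot)
      NumColdSamples += Count;
    Samples[LocName] += Count;
  }

  std::uint64_t NumColdSamples = 0;
  std::unordered_map<std::string_view, std::uint64_t> Samples;
};
```

In this sketch, sampling a hot function with count 5 and its cold fragment with count 3 leaves NumColdSamples at 3 and attributes all 8 samples to the hot name. Reading the hunk, the flag is set only when the hot function is actually found, while the removed line incremented the counter as soon as fetchParentAddress returned a non-zero address.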

>From 914b746d7e1db42b7aa59e7b47c13c6717e86b21 Mon Sep 17 00:00:00 2001
From: Amir Ayupov <aaupov at fb.com>
Date: Thu, 4 Jan 2024 07:24:12 -0800
Subject: [PATCH 2/2] [𝘀𝗽𝗿] changes introduced through rebase
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Created using spr 1.3.4

[skip ci]
---
 bolt/docs/BAT.md                            | 10 ++++++----
 bolt/lib/Profile/BoltAddressTranslation.cpp |  2 +-
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/bolt/docs/BAT.md b/bolt/docs/BAT.md
index fcb3e70c9e369c..b64bebd4819927 100644
--- a/bolt/docs/BAT.md
+++ b/bolt/docs/BAT.md
@@ -12,7 +12,7 @@ information. This information enables mapping the profile back from optimized
 binary onto the original binary.
 
 # Usage
-`--enable-bat` flag controls the generation of BAT section. Sampled profile 
+`--enable-bat` flag controls the generation of BAT section. Sampled profile
 needs to be passed along with the optimized binary containing BAT section to
 `perf2bolt` which reads BAT section and produces fdata profile for the original
 binary. Note that YAML profile generation is not supported since BAT doesn't
@@ -30,14 +30,15 @@ BAT section is created from `BoltAddressTranslation` class which captures
 address translation information provided by BOLT linker. It is then encoded as a
 note section in the output binary.
 
-During profile conversion when BAT-enabled binary is passed to perf2bolt, 
+During profile conversion when BAT-enabled binary is passed to perf2bolt,
 `BoltAddressTranslation` class is populated from BAT section. The class is then
 queried by `DataAggregator` during sample processing to reconstruct addresses/
 offsets in the input binary.
 
 ## Encoding format
-The encoding is specified in bolt/include/bolt/Profile/BoltAddressTranslation.h
-and bolt/lib/Profile/BoltAddressTranslation.cpp.
+The encoding is specified in
+[BoltAddressTranslation.h](/bolt/include/bolt/Profile/BoltAddressTranslation.h)
+and [BoltAddressTranslation.cpp](/bolt/lib/Profile/BoltAddressTranslation.cpp).
 
 ### Layout
 The general layout is as follows:
@@ -82,6 +83,7 @@ Hot indices are delta encoded, implicitly starting at zero.
 | `NumEntries` | ULEB128 | Number of address translation entries for a function |
 | `EqualElems` | ULEB128 | Hot functions only: number of equal offsets in the beginning of a function |
 | `BranchEntries` | Bitmask, `alignTo(EqualElems, 8)` bits | Hot functions only: if `EqualElems` is non-zero, bitmask denoting entries with `BRANCHENTRY` bit |
+
 Function header is followed by `EqualElems` offsets (hot functions only) and
 `NumEntries-EqualElems` (`NumEntries` for cold functions) pairs of offsets for
 current function.
diff --git a/bolt/lib/Profile/BoltAddressTranslation.cpp b/bolt/lib/Profile/BoltAddressTranslation.cpp
index 42031e1098e203..bd9ee8dce73fe1 100644
--- a/bolt/lib/Profile/BoltAddressTranslation.cpp
+++ b/bolt/lib/Profile/BoltAddressTranslation.cpp
@@ -143,7 +143,7 @@ APInt BoltAddressTranslation::calculateBranchEntriesBitMask(MapTy &Map,
   return BitMask;
 }
 
-size_t BoltAddressTranslation::getNumEqualOffsets(const MapTy& Map) const {
+size_t BoltAddressTranslation::getNumEqualOffsets(const MapTy &Map) const {
   size_t EqualOffsets = 0;
   for (const std::pair<const uint32_t, uint32_t> &KeyVal : Map) {
     const uint32_t OutputOffset = KeyVal.first;
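
Note on the BAT.md table touched above: the `BranchEntries` field is described as a bitmask of `alignTo(EqualElems, 8)` bits. A small sketch of that size arithmetic is shown below, assuming `alignTo` rounds up to the next multiple of its second argument. The helper names `alignToMultiple`, `branchEntriesBits`, and `branchEntriesBytes` are hypothetical and are not part of BOLT.

```cpp
#include <cstddef>

// Round Value up to the next multiple of Align (Align must be non-zero).
// Hypothetical stand-in for the alignTo helper named in the table.
constexpr std::size_t alignToMultiple(std::size_t Value, std::size_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Bits occupied by the BranchEntries bitmask for a hot function, per the
// table: alignTo(EqualElems, 8).
constexpr std::size_t branchEntriesBits(std::size_t EqualElems) {
  return alignToMultiple(EqualElems, 8);
}

// The same size expressed in whole bytes.
constexpr std::size_t branchEntriesBytes(std::size_t EqualElems) {
  return branchEntriesBits(EqualElems) / 8;
}
```

Under that assumption, 5 equal offsets need an 8-bit (1-byte) mask, 9 need 16 bits (2 bytes), and 0 yields no mask at all, which matches the table's "if `EqualElems` is non-zero" condition.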


