[llvm] [BOLT][NFC] Pass BF/BB hashes to BAT (PR #76906)
Amir Ayupov via llvm-commits
llvm-commits at lists.llvm.org
Thu Feb 15 12:49:22 PST 2024
https://github.com/aaupov updated https://github.com/llvm/llvm-project/pull/76906
From 8cf8215c0173bea7908d0641ed3b4360e2756be1 Mon Sep 17 00:00:00 2001
From: Amir Ayupov <aaupov at fb.com>
Date: Wed, 3 Jan 2024 21:25:35 -0800
Subject: [PATCH 1/2] [𝘀𝗽𝗿] changes to main this commit is based on
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Created using spr 1.3.4
[skip ci]
---
bolt/docs/BAT.md | 100 ++++++++
.../bolt/Profile/BoltAddressTranslation.h | 23 +-
bolt/lib/Profile/BoltAddressTranslation.cpp | 223 +++++++++++++-----
bolt/lib/Rewrite/RewriteInstance.cpp | 1 +
bolt/test/X86/bolt-address-translation.test | 2 +-
5 files changed, 283 insertions(+), 66 deletions(-)
create mode 100644 bolt/docs/BAT.md
diff --git a/bolt/docs/BAT.md b/bolt/docs/BAT.md
new file mode 100644
index 00000000000000..d114865a612430
--- /dev/null
+++ b/bolt/docs/BAT.md
@@ -0,0 +1,100 @@
+# BOLT Address Translation (BAT)
+# Purpose
+A regular profile collection for BOLT involves collecting samples from
+unoptimized binary. BOLT Address Translation allows collecting profile
+from BOLT-optimized binary and using it for optimizing the input (pre-BOLT)
+binary.
+
+# Overview
+BOLT Address Translation is an extra section (`.note.bolt_bat`) inserted by BOLT
+into the output binary containing translation tables and split functions linkage
+information. This information enables mapping the profile back from optimized
+binary onto the original binary.
+
+# Usage
+`--enable-bat` flag controls the generation of BAT section. Sampled profile
+needs to be passed along with the optimized binary containing BAT section to
+`perf2bolt` which reads BAT section and produces fdata profile for the original
+binary. Note that YAML profile generation is not supported since BAT doesn't
+contain the metadata for input functions.
+
+# Internals
+## Section contents
+The section is organized as follows:
+- Hot functions table
+ - Address translation tables
+- Cold functions table
+
+## Construction and parsing
+BAT section is created from `BoltAddressTranslation` class which captures
+address translation information provided by BOLT linker. It is then encoded as a
+note section in the output binary.
+
+During profile conversion when BAT-enabled binary is passed to perf2bolt,
+`BoltAddressTranslation` class is populated from BAT section. The class is then
+queried by `DataAggregator` during sample processing to reconstruct addresses/
+offsets in the input binary.
+
+## Encoding format
+The encoding is specified in bolt/include/bolt/Profile/BoltAddressTranslation.h
+and bolt/lib/Profile/BoltAddressTranslation.cpp.
+
+### Layout
+The general layout is as follows:
+```
+Hot functions table header
+|------------------|
+| Function entry |
+| |--------------| |
+| | OutOff InOff | |
+| |--------------| |
+~~~~~~~~~~~~~~~~~~~~
+
+Cold functions table header
+|------------------|
+| Function entry |
+| |--------------| |
+| | OutOff InOff | |
+| |--------------| |
+~~~~~~~~~~~~~~~~~~~~
+```
+
+### Functions table
+Hot and cold functions tables share the encoding except differences marked below.
+Header:
+| Entry | Encoding | Description |
+| ------ | ----- | ----------- |
+| `NumFuncs` | ULEB128 | Number of functions in the functions table |
+
+The header is followed by Functions table with `NumFuncs` entries.
+Output binary addresses are delta encoded, meaning that only the difference from
+the previous output address is stored. Addresses implicitly start at zero.
+Output addresses are continuous through function start addresses and function
+internal offsets, and between hot and cold fragments, to better spread deltas
+and save space.
+
+Hot indices are delta encoded, implicitly starting at zero.
+| Entry | Encoding | Description |
+| ------ | ------| ----------- |
+| `Address` | Continuous, Delta, ULEB128 | Function address in the output binary |
+| `HotIndex` | Delta, ULEB128 | Cold functions only: index of corresponding hot function in hot functions table |
+| `NumEntries` | ULEB128 | Number of address translation entries for a function |
+| `EqualElems` | ULEB128 | Hot functions only: number of equal offsets in the beginning of a function |
+| `BranchEntries` | Bitmask, `alignTo(EqualElems, 8)` bits | Hot functions only: if `EqualElems` is non-zero, bitmask denoting entries with `BRANCHENTRY` bit |
+Function header is followed by `EqualElems` offsets (hot functions only) and
+`NumEntries-EqualElems` (`NumEntries` for cold functions) pairs of offsets for
+current function.
+
+### Address translation table
+Delta encoding means that only the difference with the previous corresponding
+entry is encoded. Input offsets implicitly start at zero.
+| Entry | Encoding | Description |
+| ------ | ------| ----------- |
+| `OutputAddr` | Continuous, Delta, ULEB128 | Function offset in output binary |
+| `InputAddr` | Optional, Delta, SLEB128 | Function offset in input binary with `BRANCHENTRY` LSB bit |
+
+`BRANCHENTRY` bit denotes whether a given offset pair is a control flow source
+(branch or call instruction). If not set, it signifies a control flow target
+(basic block offset).
+`InputAddr` is omitted for equal offsets in input and output function. In this
+case, `BRANCHENTRY` bits are encoded separately in a `BranchEntries` bitvector.
diff --git a/bolt/include/bolt/Profile/BoltAddressTranslation.h b/bolt/include/bolt/Profile/BoltAddressTranslation.h
index 07e4b283211c69..18d273762351ef 100644
--- a/bolt/include/bolt/Profile/BoltAddressTranslation.h
+++ b/bolt/include/bolt/Profile/BoltAddressTranslation.h
@@ -11,6 +11,7 @@
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/DataExtractor.h"
#include <cstdint>
#include <map>
#include <optional>
@@ -78,10 +79,21 @@ class BoltAddressTranslation {
BoltAddressTranslation() {}
+ /// Write the serialized address translation table for a function.
+ template <bool Cold>
+ void writeMaps(std::map<uint64_t, MapTy> &Maps, uint64_t &PrevAddress,
+ raw_ostream &OS);
+
/// Write the serialized address translation tables for each reordered
/// function
void write(const BinaryContext &BC, raw_ostream &OS);
+ /// Read the serialized address translation table for a function.
+ /// Return a parse error if failed.
+ template <bool Cold>
+ void parseMaps(std::vector<uint64_t> &HotFuncs, uint64_t &PrevAddress,
+ DataExtractor &DE, uint64_t &Offset, Error &Err);
+
/// Read the serialized address translation tables and load them internally
/// in memory. Return a parse error if failed.
std::error_code parse(StringRef Buf);
@@ -118,14 +130,23 @@ class BoltAddressTranslation {
void writeEntriesForBB(MapTy &Map, const BinaryBasicBlock &BB,
uint64_t FuncAddress);
+ /// Returns the bitmask with set bits corresponding to indices of BRANCHENTRY
+ /// entries in function address translation map.
+ APInt calculateBranchEntriesBitMask(MapTy &Map, size_t EqualElems);
+
+ /// Calculate the number of equal offsets (output = input) in the beginning
+ /// of the function.
+ size_t getNumEqualOffsets(const MapTy &Map) const;
+
std::map<uint64_t, MapTy> Maps;
+ std::map<uint64_t, MapTy> ColdMaps;
/// Links outlined cold bocks to their original function
std::map<uint64_t, uint64_t> ColdPartSource;
/// Identifies the address of a control-flow changing instructions in a
/// translation map entry
- const static uint32_t BRANCHENTRY = 0x80000000;
+ const static uint32_t BRANCHENTRY = 0x1;
};
} // namespace bolt
diff --git a/bolt/lib/Profile/BoltAddressTranslation.cpp b/bolt/lib/Profile/BoltAddressTranslation.cpp
index e004309e0e2136..f28077afbc5959 100644
--- a/bolt/lib/Profile/BoltAddressTranslation.cpp
+++ b/bolt/lib/Profile/BoltAddressTranslation.cpp
@@ -8,8 +8,10 @@
#include "bolt/Profile/BoltAddressTranslation.h"
#include "bolt/Core/BinaryFunction.h"
-#include "llvm/Support/DataExtractor.h"
+#include "llvm/ADT/APInt.h"
#include "llvm/Support/Errc.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/LEB128.h"
#define DEBUG_TYPE "bolt-bat"
@@ -44,7 +46,7 @@ void BoltAddressTranslation::writeEntriesForBB(MapTy &Map,
// and this deleted block will both share the same output address (the same
// key), and we need to map back. We choose here to privilege the successor by
// allowing it to overwrite the previously inserted key in the map.
- Map[BBOutputOffset] = BBInputOffset;
+ Map[BBOutputOffset] = BBInputOffset << 1;
const auto &IOAddressMap =
BB.getFunction()->getBinaryContext().getIOAddressMap();
@@ -61,8 +63,8 @@ void BoltAddressTranslation::writeEntriesForBB(MapTy &Map,
LLVM_DEBUG(dbgs() << " Key: " << Twine::utohexstr(OutputOffset) << " Val: "
<< Twine::utohexstr(InputOffset) << " (branch)\n");
- Map.insert(
- std::pair<uint32_t, uint32_t>(OutputOffset, InputOffset | BRANCHENTRY));
+ Map.insert(std::pair<uint32_t, uint32_t>(OutputOffset,
+ (InputOffset << 1) | BRANCHENTRY));
}
}
@@ -96,41 +98,102 @@ void BoltAddressTranslation::write(const BinaryContext &BC, raw_ostream &OS) {
for (const BinaryBasicBlock *const BB : FF)
writeEntriesForBB(Map, *BB, FF.getAddress());
- Maps.emplace(FF.getAddress(), std::move(Map));
+ ColdMaps.emplace(FF.getAddress(), std::move(Map));
ColdPartSource.emplace(FF.getAddress(), Function.getOutputAddress());
}
}
+ // Output addresses are delta-encoded
+ uint64_t PrevAddress = 0;
+ writeMaps</*Cold=*/false>(Maps, PrevAddress, OS);
+ writeMaps</*Cold=*/true>(ColdMaps, PrevAddress, OS);
+
+ outs() << "BOLT-INFO: Wrote " << Maps.size() + ColdMaps.size()
+ << " BAT maps\n";
+}
+
+APInt BoltAddressTranslation::calculateBranchEntriesBitMask(MapTy &Map,
+ size_t EqualElems) {
+ APInt BitMask(alignTo(EqualElems, 8), 0);
+ size_t Index = 0;
+ for (std::pair<const uint32_t, uint32_t> &KeyVal : Map) {
+ if (Index == EqualElems)
+ break;
+ const uint32_t OutputOffset = KeyVal.second;
+ if (OutputOffset & BRANCHENTRY)
+ BitMask.setBit(Index);
+ ++Index;
+ }
+ return BitMask;
+}
+
+size_t BoltAddressTranslation::getNumEqualOffsets(const MapTy& Map) const {
+ size_t EqualOffsets = 0;
+ for (const std::pair<const uint32_t, uint32_t> &KeyVal : Map) {
+ const uint32_t OutputOffset = KeyVal.first;
+ const uint32_t InputOffset = KeyVal.second >> 1;
+ if (OutputOffset == InputOffset)
+ ++EqualOffsets;
+ else
+ break;
+ }
+ return EqualOffsets;
+}
+
+template <bool Cold>
+void BoltAddressTranslation::writeMaps(std::map<uint64_t, MapTy> &Maps,
+ uint64_t &PrevAddress, raw_ostream &OS) {
const uint32_t NumFuncs = Maps.size();
- OS.write(reinterpret_cast<const char *>(&NumFuncs), 4);
- LLVM_DEBUG(dbgs() << "Writing " << NumFuncs << " functions for BAT.\n");
+ encodeULEB128(NumFuncs, OS);
+ LLVM_DEBUG(dbgs() << "Writing " << NumFuncs << (Cold ? " cold" : "")
+ << " functions for BAT.\n");
+ size_t PrevIndex = 0;
for (auto &MapEntry : Maps) {
const uint64_t Address = MapEntry.first;
MapTy &Map = MapEntry.second;
const uint32_t NumEntries = Map.size();
LLVM_DEBUG(dbgs() << "Writing " << NumEntries << " entries for 0x"
<< Twine::utohexstr(Address) << ".\n");
- OS.write(reinterpret_cast<const char *>(&Address), 8);
- OS.write(reinterpret_cast<const char *>(&NumEntries), 4);
+ encodeULEB128(Address - PrevAddress, OS);
+ PrevAddress = Address;
+ if (Cold) {
+ size_t HotIndex =
+ std::distance(ColdPartSource.begin(), ColdPartSource.find(Address));
+ encodeULEB128(HotIndex - PrevIndex, OS);
+ PrevIndex = HotIndex;
+ }
+ encodeULEB128(NumEntries, OS);
+ // For hot fragments only: encode the number of equal offsets
+ // (output = input) in the beginning of the function. Only encode one offset
+ // in these cases.
+ const size_t EqualElems = Cold ? 0 : getNumEqualOffsets(Map);
+ if (!Cold) {
+ encodeULEB128(EqualElems, OS);
+ if (EqualElems) {
+ const size_t BranchEntriesBytes = alignTo(EqualElems, 8) / 8;
+ APInt BranchEntries = calculateBranchEntriesBitMask(Map, EqualElems);
+ OS.write(reinterpret_cast<const char *>(BranchEntries.getRawData()),
+ BranchEntriesBytes);
+ LLVM_DEBUG({
+ dbgs() << "BranchEntries: ";
+ SmallString<8> BitMaskStr;
+ BranchEntries.toString(BitMaskStr, 2, false);
+ dbgs() << BitMaskStr << '\n';
+ });
+ }
+ }
+ size_t Index = 0;
+ uint64_t InOffset = 0;
+ // Output and Input addresses are delta-encoded
for (std::pair<const uint32_t, uint32_t> &KeyVal : Map) {
- OS.write(reinterpret_cast<const char *>(&KeyVal.first), 4);
- OS.write(reinterpret_cast<const char *>(&KeyVal.second), 4);
+ const uint64_t OutputAddress = KeyVal.first + Address;
+ encodeULEB128(OutputAddress - PrevAddress, OS);
+ PrevAddress = OutputAddress;
+ if (Index++ >= EqualElems)
+ encodeSLEB128(KeyVal.second - InOffset, OS);
+ InOffset = KeyVal.second; // Keeping InOffset as if BRANCHENTRY is encoded
}
}
- const uint32_t NumColdEntries = ColdPartSource.size();
- LLVM_DEBUG(dbgs() << "Writing " << NumColdEntries
- << " cold part mappings.\n");
- OS.write(reinterpret_cast<const char *>(&NumColdEntries), 4);
- for (std::pair<const uint64_t, uint64_t> &ColdEntry : ColdPartSource) {
- OS.write(reinterpret_cast<const char *>(&ColdEntry.first), 8);
- OS.write(reinterpret_cast<const char *>(&ColdEntry.second), 8);
- LLVM_DEBUG(dbgs() << " " << Twine::utohexstr(ColdEntry.first) << " -> "
- << Twine::utohexstr(ColdEntry.second) << "\n");
- }
-
- outs() << "BOLT-INFO: Wrote " << Maps.size() << " BAT maps\n";
- outs() << "BOLT-INFO: Wrote " << NumColdEntries
- << " BAT cold-to-hot entries\n";
}
std::error_code BoltAddressTranslation::parse(StringRef Buf) {
@@ -152,53 +215,85 @@ std::error_code BoltAddressTranslation::parse(StringRef Buf) {
if (Name.substr(0, 4) != "BOLT")
return make_error_code(llvm::errc::io_error);
- if (Buf.size() - Offset < 4)
- return make_error_code(llvm::errc::io_error);
+ Error Err(Error::success());
+ std::vector<uint64_t> HotFuncs;
+ uint64_t PrevAddress = 0;
+ parseMaps</*Cold=*/false>(HotFuncs, PrevAddress, DE, Offset, Err);
+ parseMaps</*Cold=*/true>(HotFuncs, PrevAddress, DE, Offset, Err);
+ outs() << "BOLT-INFO: Parsed " << Maps.size() << " BAT entries\n";
+ return errorToErrorCode(std::move(Err));
+}
- const uint32_t NumFunctions = DE.getU32(&Offset);
- LLVM_DEBUG(dbgs() << "Parsing " << NumFunctions << " functions\n");
+template <bool Cold>
+void BoltAddressTranslation::parseMaps(std::vector<uint64_t> &HotFuncs,
+ uint64_t &PrevAddress, DataExtractor &DE,
+ uint64_t &Offset, Error &Err) {
+ const uint32_t NumFunctions = DE.getULEB128(&Offset, &Err);
+ LLVM_DEBUG(dbgs() << "Parsing " << NumFunctions << (Cold ? " cold" : "")
+ << " functions\n");
+ size_t HotIndex = 0;
for (uint32_t I = 0; I < NumFunctions; ++I) {
- if (Buf.size() - Offset < 12)
- return make_error_code(llvm::errc::io_error);
-
- const uint64_t Address = DE.getU64(&Offset);
- const uint32_t NumEntries = DE.getU32(&Offset);
+ const uint64_t Address = PrevAddress + DE.getULEB128(&Offset, &Err);
+ PrevAddress = Address;
+ if (Cold) {
+ HotIndex += DE.getULEB128(&Offset, &Err);
+ ColdPartSource.emplace(Address, HotFuncs[HotIndex]);
+ } else {
+ HotFuncs.push_back(Address);
+ }
+ const uint32_t NumEntries = DE.getULEB128(&Offset, &Err);
+ // Equal offsets, hot fragments only.
+ size_t EqualElems = 0;
+ APInt *BEBitMask(nullptr);
+ if (!Cold) {
+ EqualElems = DE.getULEB128(&Offset, &Err);
+ LLVM_DEBUG(dbgs() << formatv("Equal offsets: {0}, {1} bytes\n",
+ EqualElems, getULEB128Size(EqualElems)));
+ if (EqualElems) {
+ const size_t BranchEntriesBytes = alignTo(EqualElems, 8) / 8;
+ BEBitMask = new APInt(alignTo(EqualElems, 8), 0);
+ LoadIntFromMemory(
+ *BEBitMask,
+ reinterpret_cast<const uint8_t *>(
+ DE.getBytes(&Offset, BranchEntriesBytes, &Err).data()),
+ BranchEntriesBytes);
+ LLVM_DEBUG({
+ dbgs() << "BEBitMask: ";
+ SmallString<8> BitMaskStr;
+ BEBitMask->toString(BitMaskStr, 2, false);
+ dbgs() << BitMaskStr << ", " << BranchEntriesBytes << " bytes\n";
+ });
+ }
+ }
MapTy Map;
LLVM_DEBUG(dbgs() << "Parsing " << NumEntries << " entries for 0x"
<< Twine::utohexstr(Address) << "\n");
- if (Buf.size() - Offset < 8 * NumEntries)
- return make_error_code(llvm::errc::io_error);
+ uint64_t InputOffset = 0;
for (uint32_t J = 0; J < NumEntries; ++J) {
- const uint32_t OutputAddr = DE.getU32(&Offset);
- const uint32_t InputAddr = DE.getU32(&Offset);
- Map.insert(std::pair<uint32_t, uint32_t>(OutputAddr, InputAddr));
- LLVM_DEBUG(dbgs() << Twine::utohexstr(OutputAddr) << " -> "
- << Twine::utohexstr(InputAddr) << "\n");
+ const uint64_t OutputDelta = DE.getULEB128(&Offset, &Err);
+ const uint64_t OutputAddress = PrevAddress + OutputDelta;
+ const uint64_t OutputOffset = OutputAddress - Address;
+ PrevAddress = OutputAddress;
+ int64_t InputDelta = 0;
+ if (J < EqualElems) {
+ InputOffset = (OutputOffset << 1) | (*BEBitMask)[J];
+ } else {
+ InputDelta = DE.getSLEB128(&Offset, &Err);
+ InputOffset += InputDelta;
+ }
+ Map.insert(std::pair<uint32_t, uint32_t>(OutputOffset, InputOffset));
+ LLVM_DEBUG(
+ dbgs() << formatv("{0:x} -> {1:x} ({2}/{3}b -> {4}/{5}b), {6:x}\n",
+ OutputOffset, InputOffset, OutputDelta,
+ getULEB128Size(OutputDelta), InputDelta,
+ (J < EqualElems) ? 0 : getSLEB128Size(InputDelta),
+ OutputAddress));
}
+ if (BEBitMask)
+ delete BEBitMask;
Maps.insert(std::pair<uint64_t, MapTy>(Address, Map));
}
-
- if (Buf.size() - Offset < 4)
- return make_error_code(llvm::errc::io_error);
-
- const uint32_t NumColdEntries = DE.getU32(&Offset);
- LLVM_DEBUG(dbgs() << "Parsing " << NumColdEntries << " cold part mappings\n");
- for (uint32_t I = 0; I < NumColdEntries; ++I) {
- if (Buf.size() - Offset < 16)
- return make_error_code(llvm::errc::io_error);
- const uint32_t ColdAddress = DE.getU64(&Offset);
- const uint32_t HotAddress = DE.getU64(&Offset);
- ColdPartSource.insert(
- std::pair<uint64_t, uint64_t>(ColdAddress, HotAddress));
- LLVM_DEBUG(dbgs() << Twine::utohexstr(ColdAddress) << " -> "
- << Twine::utohexstr(HotAddress) << "\n");
- }
- outs() << "BOLT-INFO: Parsed " << Maps.size() << " BAT entries\n";
- outs() << "BOLT-INFO: Parsed " << NumColdEntries
- << " BAT cold-to-hot entries\n";
-
- return std::error_code();
}
void BoltAddressTranslation::dump(raw_ostream &OS) {
@@ -209,7 +304,7 @@ void BoltAddressTranslation::dump(raw_ostream &OS) {
OS << "BB mappings:\n";
for (const auto &Entry : MapEntry.second) {
const bool IsBranch = Entry.second & BRANCHENTRY;
- const uint32_t Val = Entry.second & ~BRANCHENTRY;
+ const uint32_t Val = Entry.second >> 1; // dropping BRANCHENTRY bit
OS << "0x" << Twine::utohexstr(Entry.first) << " -> "
<< "0x" << Twine::utohexstr(Val);
if (IsBranch)
@@ -244,7 +339,7 @@ uint64_t BoltAddressTranslation::translate(uint64_t FuncAddress,
--KeyVal;
- const uint32_t Val = KeyVal->second & ~BRANCHENTRY;
+ const uint32_t Val = KeyVal->second >> 1; // dropping BRANCHENTRY bit
// Branch source addresses are translated to the first instruction of the
// source BB to avoid accounting for modifications BOLT may have made in the
// BB regarding deletion/addition of instructions.
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index a95b1650753cfd..f5a8a5b7168745 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -4112,6 +4112,7 @@ void RewriteInstance::encodeBATSection() {
copyByteArray(BoltInfo), BoltInfo.size(),
/*Alignment=*/1,
/*IsReadOnly=*/true, ELF::SHT_NOTE);
+ outs() << "BOLT-INFO: BAT section size (bytes): " << BoltInfo.size() << '\n';
}
template <typename ELFShdrTy>
diff --git a/bolt/test/X86/bolt-address-translation.test b/bolt/test/X86/bolt-address-translation.test
index f68a8f7e9bcb7f..f2020af2edebde 100644
--- a/bolt/test/X86/bolt-address-translation.test
+++ b/bolt/test/X86/bolt-address-translation.test
@@ -36,7 +36,7 @@
#
# CHECK: BOLT: 3 out of 7 functions were overwritten.
# CHECK: BOLT-INFO: Wrote 6 BAT maps
-# CHECK: BOLT-INFO: Wrote 3 BAT cold-to-hot entries
+# CHECK: BOLT-INFO: BAT section size (bytes): 336
#
# usqrt mappings (hot part). We match against any key (left side containing
# the bolted binary offsets) because BOLT may change where it puts instructions
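
Note on the encoding used in this patch: the BAT.md text above describes output addresses as delta encoded and stored as ULEB128. The following is a minimal standalone sketch of that idea, not the code from the patch. It assumes no LLVM headers, and the helper names (`encodeULEB128Sketch`, `decodeULEB128Sketch`, `encodeDeltas`, `decodeDeltas`) are hypothetical; the patch itself calls `encodeULEB128` from llvm/Support/LEB128.h and `DataExtractor::getULEB128`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append Value as ULEB128: 7 payload bits per byte, with the high bit set on
// every byte except the last one.
inline void encodeULEB128Sketch(uint64_t Value, std::vector<uint8_t> &Out) {
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // more bytes follow
    Out.push_back(Byte);
  } while (Value != 0);
}

// Decode one ULEB128 value starting at Offset and advance Offset past it.
inline uint64_t decodeULEB128Sketch(const std::vector<uint8_t> &In,
                                    size_t &Offset) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  while (true) {
    assert(Offset < In.size() && "truncated ULEB128");
    assert(Shift < 64 && "ULEB128 value does not fit in 64 bits");
    const uint8_t Byte = In[Offset++];
    Value |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    if ((Byte & 0x80) == 0)
      break;
    Shift += 7;
  }
  return Value;
}

// Store each ascending address as the difference from the previous one.
// The running value implicitly starts at zero, as in the BAT description.
inline std::vector<uint8_t>
encodeDeltas(const std::vector<uint64_t> &Addresses) {
  std::vector<uint8_t> Out;
  uint64_t Prev = 0;
  for (uint64_t Address : Addresses) {
    assert(Address >= Prev && "addresses must be sorted ascending");
    encodeULEB128Sketch(Address - Prev, Out);
    Prev = Address;
  }
  return Out;
}

// Rebuild Count absolute addresses by accumulating the decoded deltas.
inline std::vector<uint64_t> decodeDeltas(const std::vector<uint8_t> &In,
                                          size_t Count) {
  std::vector<uint64_t> Addresses;
  size_t Offset = 0;
  uint64_t Prev = 0;
  for (size_t I = 0; I < Count; ++I) {
    Prev += decodeULEB128Sketch(In, Offset);
    Addresses.push_back(Prev);
  }
  return Addresses;
}
```

With the three addresses 0x400000, 0x400010 and 0x400123, this sketch emits 7 bytes, where fixed-width 8-byte fields would take 24. Only the first address pays for its full magnitude; later entries cost as little as one byte each.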
From ea4e19019632ec17e509928dbfb155d82fc5e1ea Mon Sep 17 00:00:00 2001
From: Amir Ayupov <aaupov at fb.com>
Date: Thu, 4 Jan 2024 07:23:58 -0800
Subject: [PATCH 2/2] [𝘀𝗽𝗿] changes introduced through rebase
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Created using spr 1.3.4
[skip ci]
---
bolt/docs/BAT.md | 10 ++++++----
bolt/lib/Profile/BoltAddressTranslation.cpp | 2 +-
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/bolt/docs/BAT.md b/bolt/docs/BAT.md
index d114865a612430..83c3d97bd44f11 100644
--- a/bolt/docs/BAT.md
+++ b/bolt/docs/BAT.md
@@ -12,7 +12,7 @@ information. This information enables mapping the profile back from optimized
binary onto the original binary.
# Usage
-`--enable-bat` flag controls the generation of BAT section. Sampled profile
+`--enable-bat` flag controls the generation of BAT section. Sampled profile
needs to be passed along with the optimized binary containing BAT section to
`perf2bolt` which reads BAT section and produces fdata profile for the original
binary. Note that YAML profile generation is not supported since BAT doesn't
@@ -30,14 +30,15 @@ BAT section is created from `BoltAddressTranslation` class which captures
address translation information provided by BOLT linker. It is then encoded as a
note section in the output binary.
-During profile conversion when BAT-enabled binary is passed to perf2bolt,
+During profile conversion when BAT-enabled binary is passed to perf2bolt,
`BoltAddressTranslation` class is populated from BAT section. The class is then
queried by `DataAggregator` during sample processing to reconstruct addresses/
offsets in the input binary.
## Encoding format
-The encoding is specified in bolt/include/bolt/Profile/BoltAddressTranslation.h
-and bolt/lib/Profile/BoltAddressTranslation.cpp.
+The encoding is specified in
+[BoltAddressTranslation.h](/bolt/include/bolt/Profile/BoltAddressTranslation.h)
+and [BoltAddressTranslation.cpp](/bolt/lib/Profile/BoltAddressTranslation.cpp).
### Layout
The general layout is as follows:
@@ -81,6 +82,7 @@ Hot indices are delta encoded, implicitly starting at zero.
| `NumEntries` | ULEB128 | Number of address translation entries for a function |
| `EqualElems` | ULEB128 | Hot functions only: number of equal offsets in the beginning of a function |
| `BranchEntries` | Bitmask, `alignTo(EqualElems, 8)` bits | Hot functions only: if `EqualElems` is non-zero, bitmask denoting entries with `BRANCHENTRY` bit |
+
Function header is followed by `EqualElems` offsets (hot functions only) and
`NumEntries-EqualElems` (`NumEntries` for cold functions) pairs of offsets for
current function.
diff --git a/bolt/lib/Profile/BoltAddressTranslation.cpp b/bolt/lib/Profile/BoltAddressTranslation.cpp
index f28077afbc5959..a3c8f318f845e5 100644
--- a/bolt/lib/Profile/BoltAddressTranslation.cpp
+++ b/bolt/lib/Profile/BoltAddressTranslation.cpp
@@ -127,7 +127,7 @@ APInt BoltAddressTranslation::calculateBranchEntriesBitMask(MapTy &Map,
return BitMask;
}
-size_t BoltAddressTranslation::getNumEqualOffsets(const MapTy& Map) const {
+size_t BoltAddressTranslation::getNumEqualOffsets(const MapTy &Map) const {
size_t EqualOffsets = 0;
for (const std::pair<const uint32_t, uint32_t> &KeyVal : Map) {
const uint32_t OutputOffset = KeyVal.first;
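
Note on `BRANCHENTRY`: the series moves the flag from the top bit (`0x80000000`) to the least significant bit, so stored input offsets are shifted left by one and recovered with `>> 1`. A minimal standalone sketch of that packing, and of counting leading equal offsets in the spirit of `getNumEqualOffsets`, is given here. The names `BranchEntryBit`, `packEntry`, `inputOffsetOf`, `isBranchEntry` and `countLeadingEqualOffsets` are hypothetical, and `std::map<uint32_t, uint32_t>` is only assumed as a stand-in for `MapTy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Flag stored in the least significant bit of each map value.
constexpr uint32_t BranchEntryBit = 0x1;

// Shift the input offset left by one and keep the branch flag in bit 0.
// The offset must fit in 31 bits so the shift does not drop its top bit.
inline uint32_t packEntry(uint32_t InputOffset, bool IsBranch) {
  assert(InputOffset < (1u << 31) && "offset must fit in 31 bits");
  return (InputOffset << 1) | (IsBranch ? BranchEntryBit : 0u);
}

// Drop the flag bit to recover the input offset.
inline uint32_t inputOffsetOf(uint32_t Packed) { return Packed >> 1; }

// True if the entry marks a control flow source (branch or call).
inline bool isBranchEntry(uint32_t Packed) {
  return (Packed & BranchEntryBit) != 0;
}

// Count entries at the start of the map whose output offset (the key) equals
// the unpacked input offset, stopping at the first mismatch.
inline size_t
countLeadingEqualOffsets(const std::map<uint32_t, uint32_t> &Map) {
  size_t EqualOffsets = 0;
  for (const auto &KeyVal : Map) {
    if (KeyVal.first != inputOffsetOf(KeyVal.second))
      break;
    ++EqualOffsets;
  }
  return EqualOffsets;
}
```

For a map with keys 0, 4 and 9 holding input offsets 0, 4 and 12, the count is 2. In the format described by BAT.md, the input offsets of those first two entries would be omitted and their branch flags carried in the `BranchEntries` bitmask instead.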