[llvm] [BOLT] Validate that direct branch/call targets are valid instructions (PR #165406)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 28 07:39:37 PDT 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-bolt
Author: Jinjie Huang (Jinjie-Huang)
<details>
<summary>Changes</summary>
In some edge cases, a binary may contain direct `branch` or `call` instructions whose targets do not point to the beginning of a valid, executable instruction. This can occur due to compiler bugs, hand-written assembly, binary corruption, or when control flow targets a data section by mistake.
We also encountered the problems where "data in code" within OpenSSL's hand-written assembly was misidentified as instructions, the same as described in this [issue](https://github.com/llvm/llvm-project/issues/149382).
The problem occurred because a data sequence was incorrectly disassembled as a "jb" instruction. So, this patch tries to address this by validating the instruction at the target address for direct branches and calls. If the target instruction is invalid(implies a corrupted control flow), and the function will be set ignored.
---
Full diff: https://github.com/llvm/llvm-project/pull/165406.diff
2 Files Affected:
- (modified) bolt/include/bolt/Core/BinaryFunction.h (+7)
- (modified) bolt/lib/Core/BinaryFunction.cpp (+40)
``````````diff
diff --git a/bolt/include/bolt/Core/BinaryFunction.h b/bolt/include/bolt/Core/BinaryFunction.h
index b215a1558cbb4..6fdc8336bf1b9 100644
--- a/bolt/include/bolt/Core/BinaryFunction.h
+++ b/bolt/include/bolt/Core/BinaryFunction.h
@@ -2236,6 +2236,13 @@ class BinaryFunction {
/// it is probably another function.
bool isSymbolValidInScope(const SymbolRef &Symbol, uint64_t SymbolSize) const;
+ /// Validates if the target of a direct branch/call is a valid
+ /// executable instruction.
+ /// Return true if the target is valid, false otherwise.
+ bool validateBranchTarget(uint64_t TargetAddress,
+ uint64_t AbsoluteInstrAddr,
+ const ArrayRef<uint8_t> &CurrentFunctionData);
+
/// Disassemble function from raw data.
/// If successful, this function will populate the list of instructions
/// for this function together with offsets from the function start
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index 84023efe1084e..821a302d14ba1 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -1283,6 +1283,41 @@ BinaryFunction::disassembleInstructionAtOffset(uint64_t Offset) const {
return std::nullopt;
}
+bool BinaryFunction::validateBranchTarget(
+ uint64_t TargetAddress, uint64_t AbsoluteInstrAddr,
+ const ArrayRef<uint8_t> &CurrentFunctionData) {
+ if (auto *TargetFunc =
+ BC.getBinaryFunctionContainingAddress(TargetAddress)) {
+ const uint64_t TargetOffset = TargetAddress - TargetFunc->getAddress();
+ ArrayRef<uint8_t> TargetFunctionData;
+ // Check if the target address is within the current function.
+ if (TargetFunc == this) {
+ TargetFunctionData = CurrentFunctionData;
+ } else {
+ // external call/branch, fetch the binary data for target
+ ErrorOr<ArrayRef<uint8_t>> TargetDataOrErr = TargetFunc->getData();
+ assert(TargetDataOrErr && "function data is not available");
+ TargetFunctionData = *TargetDataOrErr;
+ }
+
+ MCInst TargetInst;
+ uint64_t TargetInstSize;
+ if (!BC.SymbolicDisAsm->getInstruction(
+ TargetInst, TargetInstSize, TargetFunctionData.slice(TargetOffset),
+ TargetAddress, nulls())) {
+ // If the target address cannot be disassembled well,
+ // it implies a corrupted control flow.
+ BC.errs() << "BOLT-WARNING: direct branch/call at 0x"
+ << Twine::utohexstr(AbsoluteInstrAddr) << " in function "
+ << *this << " targets an invalid instruction at 0x"
+ << Twine::utohexstr(TargetAddress) << "\n";
+ return false;
+ }
+ }
+
+ return true;
+}
+
Error BinaryFunction::disassemble() {
NamedRegionTimer T("disassemble", "Disassemble function", "buildfuncs",
"Build Binary Functions", opts::TimeBuild);
@@ -1396,6 +1431,11 @@ Error BinaryFunction::disassemble() {
uint64_t TargetAddress = 0;
if (MIB->evaluateBranch(Instruction, AbsoluteInstrAddr, Size,
TargetAddress)) {
+ if (!validateBranchTarget(TargetAddress, AbsoluteInstrAddr,
+ FunctionData)) {
+ setIgnored();
+ break;
+ }
// Check if the target is within the same function. Otherwise it's
// a call, possibly a tail call.
//
``````````
</details>
https://github.com/llvm/llvm-project/pull/165406
More information about the llvm-commits
mailing list