[llvm] Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353) (PR #162435)
via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 8 00:46:03 PDT 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-bolt
Author: Gergely Bálint (bgergely0)
<details>
<summary>Changes</summary>
Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening (#<!-- -->120064)" (#<!-- -->162353)
This reverts commit c7d776b06897567e2d698e447d80279664b67d47.
#<!-- -->120064 was reverted for breaking builders.
Fix: changed the mismatched type in MarkRAStates.cpp to `auto`.
---
Patch is 58.57 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/162435.diff
25 Files Affected:
- (added) bolt/docs/PacRetDesign.md (+228)
- (modified) bolt/include/bolt/Core/BinaryFunction.h (+56)
- (modified) bolt/include/bolt/Core/MCPlus.h (+6-1)
- (modified) bolt/include/bolt/Core/MCPlusBuilder.h (+62)
- (added) bolt/include/bolt/Passes/InsertNegateRAStatePass.h (+46)
- (added) bolt/include/bolt/Passes/MarkRAStates.h (+33)
- (modified) bolt/include/bolt/Utils/CommandLineOpts.h (+1)
- (modified) bolt/lib/Core/BinaryBasicBlock.cpp (+5-1)
- (modified) bolt/lib/Core/BinaryContext.cpp (+3)
- (modified) bolt/lib/Core/BinaryFunction.cpp (+4-21)
- (modified) bolt/lib/Core/Exceptions.cpp (+32-4)
- (modified) bolt/lib/Core/MCPlusBuilder.cpp (+49)
- (modified) bolt/lib/Passes/CMakeLists.txt (+2)
- (added) bolt/lib/Passes/InsertNegateRAStatePass.cpp (+142)
- (added) bolt/lib/Passes/MarkRAStates.cpp (+152)
- (modified) bolt/lib/Rewrite/BinaryPassManager.cpp (+13)
- (modified) bolt/lib/Rewrite/RewriteInstance.cpp (+11)
- (modified) bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp (+22)
- (added) bolt/test/AArch64/negate-ra-state-disallow.s (+25)
- (added) bolt/test/AArch64/negate-ra-state-incorrect.s (+78)
- (added) bolt/test/AArch64/negate-ra-state-reorder.s (+73)
- (added) bolt/test/AArch64/negate-ra-state.s (+76)
- (added) bolt/test/AArch64/pacret-split-funcs.s (+54)
- (added) bolt/test/runtime/AArch64/negate-ra-state.cpp (+26)
- (added) bolt/test/runtime/AArch64/pacret-function-split.cpp (+42)
``````````diff
diff --git a/bolt/docs/PacRetDesign.md b/bolt/docs/PacRetDesign.md
new file mode 100644
index 0000000000000..f3fe5fbd522cb
--- /dev/null
+++ b/bolt/docs/PacRetDesign.md
@@ -0,0 +1,228 @@
+# Optimizing binaries with pac-ret hardening
+
+This is a design document about processing the `DW_CFA_AARCH64_negate_ra_state`
+DWARF instruction in BOLT. As it describes internal design decisions, the
+intended audience is BOLT developers. The document is an updated version of the
+[RFC posted on the LLVM Discourse](https://discourse.llvm.org/t/rfc-bolt-aarch64-handle-opnegaterastate-to-enable-optimizing-binaries-with-pac-ret-hardening/86594).
+
+
+`DW_CFA_AARCH64_negate_ra_state` is also referred to as `.cfi_negate_ra_state`
+in assembly, or `OpNegateRAState` in BOLT sources. In this document, I will use
+**negate-ra-state** as a shorthand.
+
+## Introduction
+
+### Pointer Authentication
+
+For more information, see the [pac-ret section of the BOLT-binary-analysis document](BinaryAnalysis.md#pac-ret-analysis).
+
+### DW_CFA_AARCH64_negate_ra_state
+
+The negate-ra-state CFI is a vendor-specific Call Frame Instruction defined in
+the [Arm ABI](https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id1).
+
+```
+The DW_CFA_AARCH64_negate_ra_state operation negates bit[0] of the RA_SIGN_STATE pseudo-register.
+```
+
+This bit indicates to the unwinder whether the current return address is signed
+or not (hence the name). The unwinder uses this information to authenticate the
+pointer, and remove the Pointer Authentication Code (PAC) bits.
+Incorrect placement of negate-ra-state CFIs causes the unwinder to either attempt
+to authenticate an unsigned pointer (resulting in a segmentation fault), or skip
+authentication on a signed pointer, which can also cause a fault.
+
+Note: some unwinders use the `xpac` instruction to strip the PAC bits without
+authenticating the pointer. This is an incorrect (incomplete) implementation,
+as it allows control-flow modification in the case of unwinding.
+
+There are no DWARF instructions to directly set or clear the RA State. However,
+two other CFIs can also affect the RA state:
+- `DW_CFA_remember_state`: this CFI stores register rules onto an implicit stack.
+- `DW_CFA_restore_state`: this CFI pops rules from this stack.
+
+Example:
+
+| CFI | Effect on RA state |
+| ------------------------------ | ------------------------------ |
+| (default) | 0 |
+| DW_CFA_AARCH64_negate_ra_state | 0 -> 1 |
+| DW_CFA_remember_state | 1 pushed to the stack |
+| DW_CFA_AARCH64_negate_ra_state | 1 -> 0 |
+| DW_CFA_restore_state | 0 -> 1 (popped from the stack) |
+
+The Arm ABI also defines the DW_CFA_AARCH64_negate_ra_state_with_pc CFI, but it
+is not widely used, and is [likely to become deprecated](https://github.com/ARM-software/abi-aa/issues/327).
+
+### Where are these CFIs needed?
+
+Whenever two consecutive instructions have different RA states, the unwinder must
+be informed of the change. This typically occurs during pointer signing or
+authentication. If adjacent instructions differ in RA state but neither signs
+nor authenticates the return address, they must belong to different control flow
+paths. One is part of an execution path with signed RA, the other is part of a
+path with an unsigned RA.
+
+In the example below, the first BasicBlock ends in a conditional branch, and
+jumps to two different BasicBlocks, each with their own authentication, and
+return. The instructions on the border of the second and third BasicBlock have
+different RA states. The `ret` at the end of the second BasicBlock is in unsigned
+state. The start of the third BasicBlock is after the `paciasp` in the control
+flow, but before the authentication. In this case, a negate-ra-state is needed
+at the end of the second BasicBlock.
+
+```
+ +----------------+
+ | paciasp |
+ | |
+ | b.cc |
+ +--------+-------+
+ |
++----------------+
+| |
+| +--------v-------+
+| | |
+| | autiasp |
+| | ret | // RA: unsigned
+| +----------------+
++----------------+
+ |
+ +--------v-------+ // RA: signed
+ | |
+ | autiasp |
+ | ret |
+ +----------------+
+```
+
+> [!important]
+> The unwinder does not follow the control flow graph. It reads unwind
+> information in the layout order.
+
+Because these locations are dependent on how the function layout looks,
+negate-ra-state CFIs will become invalid during BasicBlock reordering.
+
+## Solution design
+
+The implementation introduces two new passes:
+1. `MarkRAStatesPass`: assigns the RA state to each instruction based on the CFIs
+ in the input binary
+2. `InsertNegateRAStatePass`: reads those assigned instruction RA states after
+ optimizations, and emits `DW_CFA_AARCH64_negate_ra_state` CFIs at the correct
+ places: wherever there is a state change between two consecutive instructions
+ in the layout order.
+
+To track metadata on individual instructions, the `MCAnnotation` class was
+extended. These also have helper functions in `MCPlusBuilder`.
+
+### Saving annotations at CFI reading
+
+CFIs are read and added to BinaryFunctions in `CFIReaderWriter::FillCFIInfoFor`.
+At this point, we add MCAnnotations about negate-ra-state, remember-state and
+restore-state CFIs to the instructions they refer to. This is to not interfere
+with the CFI processing that already happens in BOLT (e.g. remember-state and
+restore-state CFIs are removed in `normalizeCFIState` for reasons unrelated to PAC).
+
+As we add the MCAnnotations *to instructions*, we have to account for the case
+where the function starts with a CFI altering the RA state. As CFIs modify the RA
+state of the instructions before them, we cannot add the annotation to the first
+instruction.
+This special case is handled by adding an `initialRAState` bool to each BinaryFunction.
+If the `Offset` the CFI refers to is zero, we don't store an annotation, but set
+the `initialRAState` in `FillCFIInfoFor`. This information is then used in
+`MarkRAStates`.
+
+### Binaries without DWARF info
+
+In some cases, the DWARF tables are stripped from the binary. These programs
+usually have some other unwind-mechanism.
+These passes only run on functions that include at least one negate-ra-state CFI.
+This avoids processing functions that do not use Pointer Authentication, or on
+functions that use Pointer Authentication, but do not have DWARF info.
+
+In summary:
+- pointer auth is not used: no change, the new passes do not run.
+- pointer auth is used, but DWARF info is stripped: no change, the new passes
+ do not run.
+- pointer auth is used, and we have DWARF CFIs: passes run, and rewrite the
+ negate-ra-state CFI.
+
+### MarkRAStates pass
+
+This pass runs before optimizations reorder anything.
+
+It processes MCAnnotations generated during the CFI reading stage to check if
+instructions have either of the three CFIs that can modify RA state:
+- negate-ra-state,
+- remember-state,
+- restore-state.
+
+Then it adds new MCAnnotations to each instruction, indicating their RA state.
+Those annotations are:
+- Signed,
+- Unsigned.
+
+Below is a simple example, that shows the two different type of annotations:
+what we have before the pass, and after it.
+
+| Instruction | Before | After |
+| ----------------------------- | --------------- | -------- |
+| paciasp | negate-ra-state | unsigned |
+| stp x29, x30, [sp, #-0x10]! | | signed |
+| mov x29, sp | | signed |
+| ldp x29, x30, [sp], #0x10 | | signed |
+| autiasp | negate-ra-state | signed |
+| ret | | unsigned |
+
+##### Error handling in MarkRAState Pass:
+
+Whenever the MarkRAStates pass finds inconsistencies in the current
+BinaryFunction, it marks the function as ignored using `BF.setIgnored()`. BOLT
+will not optimize this function but will emit it unchanged in the original section
+(`.bolt.org.text`).
+
+The inconsistencies are as follows:
+- finding a `pac*` instruction when already in signed state
+- finding an `aut*` instruction when already in unsigned state
+- finding `pac*` and `aut*` instructions without `.cfi_negate_ra_state`.
+
+Users will be informed about the number of ignored functions in the pass, the
+exact functions ignored, and the found inconsistency.
+
+### InsertNegateRAStatePass
+
+This pass runs after optimizations. It performns the _inverse_ of MarkRAState pa s:
+1. it reads the RA state annotations attached to the instructions, and
+2. whenever the state changes, it adds a PseudoInstruction that holds an
+ OpNegateRAState CFI.
+
+##### Covering newly generated instructions:
+
+Some BOLT passes can add new Instructions. In InsertNegateRAStatePass, we have
+to know what RA state these have.
+
+The current solution has the `inferUnknownStates` function to cover these, using
+a fairly simple strategy: unknown states inherit the last known state.
+
+This will be updated to a more robust solution.
+
+> [!important]
+> As issue #160989 describes, unwind info is incorrect in stubs with multiple callers.
+> For this same reason, we cannot generate correct pac-specific unwind info: the signess
+> of the _incorrect_ return address is meaningless.
+
+### Optimizations requiring special attention
+
+Marking states before optimizations ensure that instructions can be moved around
+freely. The only special case is function splitting. When a function is split,
+the split part becomes a new function in the emitted binary. For unwinding to
+work, it needs to "replay" all CFIs that lead up to the split point. BOLT does
+this for other CFIs. As negate-ra-state is not read (only stored as an Annotation),
+we have to do this manually in InsertNegateRAStatePass. Here, if the split part
+starts with an instruction that has Signed RA state, we add a negate-ra-state CFI
+to indicate this.
+
+## Option to disallow the feature
+
+The feature can be guarded with the `--update-branch-prediction` flag, which is
+on by default. If the flag is set to false, and a function
+`containedNegateRAState()` after `FillCFIInfoFor()`, BOLT exits with an error.
diff --git a/bolt/include/bolt/Core/BinaryFunction.h b/bolt/include/bolt/Core/BinaryFunction.h
index 7e0e3bff83259..f5e9887b56f70 100644
--- a/bolt/include/bolt/Core/BinaryFunction.h
+++ b/bolt/include/bolt/Core/BinaryFunction.h
@@ -148,6 +148,11 @@ class BinaryFunction {
PF_MEMEVENT = 4, /// Profile has mem events.
};
+ void setContainedNegateRAState() { HadNegateRAState = true; }
+ bool containedNegateRAState() const { return HadNegateRAState; }
+ void setInitialRAState(bool State) { InitialRAState = State; }
+ bool getInitialRAState() { return InitialRAState; }
+
/// Struct for tracking exception handling ranges.
struct CallSite {
const MCSymbol *Start;
@@ -218,6 +223,12 @@ class BinaryFunction {
/// Current state of the function.
State CurrentState{State::Empty};
+ /// Indicates if the Function contained .cfi-negate-ra-state. These are not
+ /// read from the binary. This boolean is used when deciding to run the
+ /// .cfi-negate-ra-state rewriting passes on a function or not.
+ bool HadNegateRAState{false};
+ bool InitialRAState{false};
+
/// A list of symbols associated with the function entry point.
///
/// Multiple symbols would typically result from identical code-folding
@@ -1640,6 +1651,51 @@ class BinaryFunction {
void setHasInferredProfile(bool Inferred) { HasInferredProfile = Inferred; }
+ /// Find corrected offset the same way addCFIInstruction does it to skip NOPs.
+ std::optional<uint64_t> getCorrectedCFIOffset(uint64_t Offset) {
+ assert(!Instructions.empty());
+ auto I = Instructions.lower_bound(Offset);
+ if (Offset == getSize()) {
+ assert(I == Instructions.end() && "unexpected iterator value");
+ // Sometimes compiler issues restore_state after all instructions
+ // in the function (even after nop).
+ --I;
+ Offset = I->first;
+ }
+ assert(I->first == Offset && "CFI pointing to unknown instruction");
+ if (I == Instructions.begin())
+ return {};
+
+ --I;
+ while (I != Instructions.begin() && BC.MIB->isNoop(I->second)) {
+ Offset = I->first;
+ --I;
+ }
+ return Offset;
+ }
+
+ void setInstModifiesRAState(uint8_t CFIOpcode, uint64_t Offset) {
+ std::optional<uint64_t> CorrectedOffset = getCorrectedCFIOffset(Offset);
+ if (CorrectedOffset) {
+ auto I = Instructions.lower_bound(*CorrectedOffset);
+ I--;
+
+ switch (CFIOpcode) {
+ case dwarf::DW_CFA_AARCH64_negate_ra_state:
+ BC.MIB->setNegateRAState(I->second);
+ break;
+ case dwarf::DW_CFA_remember_state:
+ BC.MIB->setRememberState(I->second);
+ break;
+ case dwarf::DW_CFA_restore_state:
+ BC.MIB->setRestoreState(I->second);
+ break;
+ default:
+ assert(0 && "CFI Opcode not covered by function");
+ }
+ }
+ }
+
void addCFIInstruction(uint64_t Offset, MCCFIInstruction &&Inst) {
assert(!Instructions.empty());
diff --git a/bolt/include/bolt/Core/MCPlus.h b/bolt/include/bolt/Core/MCPlus.h
index 601d709712864..ead6ba1470da6 100644
--- a/bolt/include/bolt/Core/MCPlus.h
+++ b/bolt/include/bolt/Core/MCPlus.h
@@ -72,7 +72,12 @@ class MCAnnotation {
kLabel, /// MCSymbol pointing to this instruction.
kSize, /// Size of the instruction.
kDynamicBranch, /// Jit instruction patched at runtime.
- kGeneric /// First generic annotation.
+ kRASigned, /// Inst is in a range where RA is signed.
+ kRAUnsigned, /// Inst is in a range where RA is unsigned.
+ kRememberState, /// Inst has rememberState CFI.
+ kRestoreState, /// Inst has restoreState CFI.
+ kNegateState, /// Inst has OpNegateRAState CFI.
+ kGeneric, /// First generic annotation.
};
virtual void print(raw_ostream &OS) const = 0;
diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h
index 5b711b0e27bab..2772de73081d1 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -70,6 +70,20 @@ class MCPlusBuilder {
public:
using AllocatorIdTy = uint16_t;
+ std::optional<int64_t> getAnnotationAtOpIndex(const MCInst &Inst,
+ unsigned OpIndex) const {
+ std::optional<unsigned> FirstAnnotationOp = getFirstAnnotationOpIndex(Inst);
+ if (!FirstAnnotationOp)
+ return std::nullopt;
+
+ if (*FirstAnnotationOp > OpIndex || Inst.getNumOperands() < OpIndex)
+ return std::nullopt;
+
+ const auto *Op = Inst.begin() + OpIndex;
+ const int64_t ImmValue = Op->getImm();
+ return extractAnnotationIndex(ImmValue);
+ }
+
private:
/// A struct that represents a single annotation allocator
struct AnnotationAllocator {
@@ -603,6 +617,21 @@ class MCPlusBuilder {
return std::nullopt;
}
+ virtual bool isPSignOnLR(const MCInst &Inst) const {
+ llvm_unreachable("not implemented");
+ return false;
+ }
+
+ virtual bool isPAuthOnLR(const MCInst &Inst) const {
+ llvm_unreachable("not implemented");
+ return false;
+ }
+
+ virtual bool isPAuthAndRet(const MCInst &Inst) const {
+ llvm_unreachable("not implemented");
+ return false;
+ }
+
/// Returns the register used as a return address. Returns std::nullopt if
/// not applicable, such as reading the return address from a system register
/// or from the stack.
@@ -1314,6 +1343,39 @@ class MCPlusBuilder {
/// Return true if the instruction is a tail call.
bool isTailCall(const MCInst &Inst) const;
+ /// Stores NegateRAState annotation on \p Inst.
+ void setNegateRAState(MCInst &Inst) const;
+
+ /// Return true if \p Inst has NegateRAState annotation.
+ bool hasNegateRAState(const MCInst &Inst) const;
+
+ /// Sets RememberState annotation on \p Inst.
+ void setRememberState(MCInst &Inst) const;
+
+ /// Return true if \p Inst has RememberState annotation.
+ bool hasRememberState(const MCInst &Inst) const;
+
+ /// Stores RestoreState annotation on \p Inst.
+ void setRestoreState(MCInst &Inst) const;
+
+ /// Return true if \p Inst has RestoreState annotation.
+ bool hasRestoreState(const MCInst &Inst) const;
+
+ /// Stores RA Signed annotation on \p Inst.
+ void setRASigned(MCInst &Inst) const;
+
+ /// Return true if \p Inst has Signed RA annotation.
+ bool isRASigned(const MCInst &Inst) const;
+
+ /// Stores RA Unsigned annotation on \p Inst.
+ void setRAUnsigned(MCInst &Inst) const;
+
+ /// Return true if \p Inst has Unsigned RA annotation.
+ bool isRAUnsigned(const MCInst &Inst) const;
+
+ /// Return true if \p Inst doesn't have any annotation related to RA state.
+ bool isRAStateUnknown(const MCInst &Inst) const;
+
/// Return true if the instruction is a call with an exception handling info.
virtual bool isInvoke(const MCInst &Inst) const {
return isCall(Inst) && getEHInfo(Inst);
diff --git a/bolt/include/bolt/Passes/InsertNegateRAStatePass.h b/bolt/include/bolt/Passes/InsertNegateRAStatePass.h
new file mode 100644
index 0000000000000..836948bf5e9c0
--- /dev/null
+++ b/bolt/include/bolt/Passes/InsertNegateRAStatePass.h
@@ -0,0 +1,46 @@
+//===- bolt/Passes/InsertNegateRAStatePass.cpp ----------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the InsertNegateRAStatePass class.
+//
+//===----------------------------------------------------------------------===//
+#ifndef BOLT_PASSES_INSERT_NEGATE_RA_STATE_PASS
+#define BOLT_PASSES_INSERT_NEGATE_RA_STATE_PASS
+
+#include "bolt/Passes/BinaryPasses.h"
+
+namespace llvm {
+namespace bolt {
+
+class InsertNegateRAState : public BinaryFunctionPass {
+public:
+ explicit InsertNegateRAState() : BinaryFunctionPass(false) {}
+
+ const char *getName() const override { return "insert-negate-ra-state-pass"; }
+
+ /// Pass entry point
+ Error runOnFunctions(BinaryContext &BC) override;
+ void runOnFunction(BinaryFunction &BF);
+
+private:
+ /// Because states are tracked as MCAnnotations on individual instructions,
+ /// newly inserted instructions do not have a state associated with them.
+ /// New states are "inherited" from the last known state.
+ void inferUnknownStates(BinaryFunction &BF);
+
+ /// Support for function splitting:
+ /// if two consecutive BBs with Signed state are going to end up in different
+ /// functions (so are held by different FunctionFragments), we have to add a
+ /// OpNegateRAState to the beginning of the newly split function, so it starts
+ /// with a Signed state.
+ void coverFunctionFragmentStart(BinaryFunction &BF, FunctionFragment &FF);
+};
+
+} // namespace bolt
+} // namespace llvm
+#endif
diff --git a/bolt/include/bolt/Passes/MarkRAStates.h b/bolt/include/bolt/Passes/MarkRAStates.h
new file mode 100644
index 0000000000000..675ab9727142b
--- /dev/null
+++ b/bolt/include/bolt/Passes/MarkRAStates.h
@@ -0,0 +1,33 @@
+//===- bolt/Passes/MarkRAStates.cpp ---------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the MarkRAStates class.
+//
+//===----------------------------------------------------------------------===//
+#ifndef BOLT_PASSES_MARK_RA_STATES
+#define BOLT_PASSES_MARK_RA_STATES
+
+#include "bolt/Passes/BinaryPasses.h"
+
+namespace llvm {
+namespace bolt {
+
+class MarkRAStates ...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/162435
More information about the llvm-commits
mailing list