[llvm] Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353) (PR #162435)

Gergely Bálint via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 8 00:42:37 PDT 2025


https://github.com/bgergely0 created https://github.com/llvm/llvm-project/pull/162435

Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening" (#162353)

This reverts commit c7d776b06897567e2d698e447d80279664b67d47.

[BOLT] Fix build failure

Changed the mismatched type to `auto`.
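
For readers skimming the patch, the bookkeeping described in `bolt/docs/PacRetDesign.md` can be illustrated with a small standalone sketch. It is not code from this PR: `Cfi`, `replayRAState` and `negatePoints` are hypothetical names, and the model assumes a function starts in the unsigned state. The first function replays the three CFIs that affect the RA state; the second finds the positions, in layout order, where consecutive instructions differ in state and a negate-ra-state CFI would be needed.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <vector>

// Hypothetical names for illustration only; none of these exist in BOLT.
enum class Cfi { None, NegateRAState, RememberState, RestoreState };

// Replays CFIs in order and records the RA state (true = signed) in effect
// after each one. The default state is unsigned (0).
std::vector<bool> replayRAState(const std::vector<Cfi> &Cfis) {
  bool Signed = false;
  std::stack<bool> Saved; // models the implicit remember/restore stack
  std::vector<bool> States;
  for (Cfi C : Cfis) {
    switch (C) {
    case Cfi::NegateRAState:
      Signed = !Signed;
      break;
    case Cfi::RememberState:
      Saved.push(Signed);
      break;
    case Cfi::RestoreState:
      if (!Saved.empty()) {
        Signed = Saved.top();
        Saved.pop();
      }
      break;
    case Cfi::None:
      break;
    }
    States.push_back(Signed);
  }
  return States;
}

// Given per-instruction RA states in final layout order, returns the indices
// of instructions that need a negate-ra-state CFI placed before them.
// Assumption: every function (or split fragment) starts in the unsigned state.
std::vector<std::size_t> negatePoints(const std::vector<bool> &LayoutStates) {
  std::vector<std::size_t> Points;
  bool Prev = false;
  for (std::size_t I = 0; I < LayoutStates.size(); ++I) {
    if (LayoutStates[I] != Prev)
      Points.push_back(I);
    Prev = LayoutStates[I];
  }
  return Points;
}
```

Because the second function only looks at neighbouring states in layout order, it gives the same kind of answer before and after block reordering. A fragment whose first instruction is signed gets a point at index 0, which corresponds to the function-splitting case in the design document.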

>From 3be96da4defd18dd463c75dcb4c812a18e1ee62d Mon Sep 17 00:00:00 2001
From: Gergely Balint <gergely.balint at arm.com>
Date: Wed, 8 Oct 2025 07:25:19 +0000
Subject: [PATCH 1/2] Reapply "[BOLT][AArch64] Handle OpNegateRAState to enable
 optimizing binaries with pac-ret hardening" (#162353)

This reverts commit c7d776b06897567e2d698e447d80279664b67d47.
---
 bolt/docs/PacRetDesign.md                     | 228 ++++++++++++++++++
 bolt/include/bolt/Core/BinaryFunction.h       |  56 +++++
 bolt/include/bolt/Core/MCPlus.h               |   7 +-
 bolt/include/bolt/Core/MCPlusBuilder.h        |  62 +++++
 .../bolt/Passes/InsertNegateRAStatePass.h     |  46 ++++
 bolt/include/bolt/Passes/MarkRAStates.h       |  33 +++
 bolt/include/bolt/Utils/CommandLineOpts.h     |   1 +
 bolt/lib/Core/BinaryBasicBlock.cpp            |   6 +-
 bolt/lib/Core/BinaryContext.cpp               |   3 +
 bolt/lib/Core/BinaryFunction.cpp              |  25 +-
 bolt/lib/Core/Exceptions.cpp                  |  36 ++-
 bolt/lib/Core/MCPlusBuilder.cpp               |  49 ++++
 bolt/lib/Passes/CMakeLists.txt                |   2 +
 bolt/lib/Passes/InsertNegateRAStatePass.cpp   | 142 +++++++++++
 bolt/lib/Passes/MarkRAStates.cpp              | 152 ++++++++++++
 bolt/lib/Rewrite/BinaryPassManager.cpp        |  13 +
 bolt/lib/Rewrite/RewriteInstance.cpp          |  11 +
 .../Target/AArch64/AArch64MCPlusBuilder.cpp   |  22 ++
 bolt/test/AArch64/negate-ra-state-disallow.s  |  25 ++
 bolt/test/AArch64/negate-ra-state-incorrect.s |  78 ++++++
 bolt/test/AArch64/negate-ra-state-reorder.s   |  73 ++++++
 bolt/test/AArch64/negate-ra-state.s           |  76 ++++++
 bolt/test/AArch64/pacret-split-funcs.s        |  54 +++++
 bolt/test/runtime/AArch64/negate-ra-state.cpp |  26 ++
 .../runtime/AArch64/pacret-function-split.cpp |  42 ++++
 25 files changed, 1241 insertions(+), 27 deletions(-)
 create mode 100644 bolt/docs/PacRetDesign.md
 create mode 100644 bolt/include/bolt/Passes/InsertNegateRAStatePass.h
 create mode 100644 bolt/include/bolt/Passes/MarkRAStates.h
 create mode 100644 bolt/lib/Passes/InsertNegateRAStatePass.cpp
 create mode 100644 bolt/lib/Passes/MarkRAStates.cpp
 create mode 100644 bolt/test/AArch64/negate-ra-state-disallow.s
 create mode 100644 bolt/test/AArch64/negate-ra-state-incorrect.s
 create mode 100644 bolt/test/AArch64/negate-ra-state-reorder.s
 create mode 100644 bolt/test/AArch64/negate-ra-state.s
 create mode 100644 bolt/test/AArch64/pacret-split-funcs.s
 create mode 100644 bolt/test/runtime/AArch64/negate-ra-state.cpp
 create mode 100644 bolt/test/runtime/AArch64/pacret-function-split.cpp

diff --git a/bolt/docs/PacRetDesign.md b/bolt/docs/PacRetDesign.md
new file mode 100644
index 0000000000000..f3fe5fbd522cb
--- /dev/null
+++ b/bolt/docs/PacRetDesign.md
@@ -0,0 +1,228 @@
+# Optimizing binaries with pac-ret hardening
+
+This is a design document about processing the `DW_CFA_AARCH64_negate_ra_state`
+DWARF instruction in BOLT. As it describes internal design decisions, the
+intended audience is BOLT developers. The document is an updated version of the
+[RFC posted on the LLVM Discourse](https://discourse.llvm.org/t/rfc-bolt-aarch64-handle-opnegaterastate-to-enable-optimizing-binaries-with-pac-ret-hardening/86594).
+
+
+`DW_CFA_AARCH64_negate_ra_state` is also referred to as  `.cfi_negate_ra_state`
+in assembly, or `OpNegateRAState` in BOLT sources. In this document, I will use
+**negate-ra-state** as a shorthand.
+
+## Introduction
+
+### Pointer Authentication
+
+For more information, see the [pac-ret section of the BOLT-binary-analysis document](BinaryAnalysis.md#pac-ret-analysis).
+
+### DW_CFA_AARCH64_negate_ra_state
+
+The negate-ra-state CFI is a vendor-specific Call Frame Instruction defined in
+the [Arm ABI](https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id1).
+
+```
+The DW_CFA_AARCH64_negate_ra_state operation negates bit[0] of the RA_SIGN_STATE pseudo-register.
+```
+
+This bit indicates to the unwinder whether the current return address is signed
+or not (hence the name). The unwinder uses this information to authenticate the
+pointer, and remove the Pointer Authentication Code (PAC) bits.
+Incorrect placement of negate-ra-state CFIs causes the unwinder to either attempt
+to authenticate an unsigned pointer (resulting in a segmentation fault), or skip
+authentication on a signed pointer, which can also cause a fault.
+
+Note: some unwinders use the `xpac` instruction to strip the PAC bits without
+authenticating the pointer. This is an incorrect (incomplete) implementation,
+as it allows control-flow modification in the case of unwinding.
+
+There are no DWARF instructions to directly set or clear the RA State. However,
+two other CFIs can also affect the RA state:
+- `DW_CFA_remember_state`: this CFI stores register rules onto an implicit stack.
+- `DW_CFA_restore_state`:  this CFI pops rules from this stack.
+
+Example:
+
+| CFI                            | Effect on RA state             |
+| ------------------------------ | ------------------------------ |
+| (default)                      | 0                              |
+| DW_CFA_AARCH64_negate_ra_state | 0 -> 1                         |
+| DW_CFA_remember_state          | 1 pushed to the stack          |
+| DW_CFA_AARCH64_negate_ra_state | 1 -> 0                         |
+| DW_CFA_restore_state           | 0 -> 1 (popped from the stack) |
+
+The Arm ABI also defines the DW_CFA_AARCH64_negate_ra_state_with_pc CFI, but it
+is not widely used, and is [likely to become deprecated](https://github.com/ARM-software/abi-aa/issues/327).
+
+### Where are these CFIs needed?
+
+Whenever two consecutive instructions have different RA states, the unwinder must
+be informed of the change. This typically occurs during pointer signing or
+authentication. If adjacent instructions differ in RA state but neither signs
+nor authenticates the return address, they must belong to different control flow
+paths. One is part of an execution path with signed RA, the other is part of a
+path with an unsigned RA.
+
+In the example below, the first BasicBlock ends in a conditional branch, and
+jumps to two different BasicBlocks, each with their own authentication, and
+return. The instructions on the border of the second and third BasicBlock have
+different RA states. The `ret` at the end of the second BasicBlock is in unsigned
+state. The start of the third BasicBlock is after the `paciasp` in the control
+flow, but before the authentication. In this case, a negate-ra-state is needed
+at the end of the second BasicBlock.
+
+```
+        +----------------+
+        |     paciasp    |
+        |                |
+        |      b.cc      |
+        +--------+-------+
+                 |
++----------------+
+|                |
+|       +--------v-------+
+|       |                |
+|       |    autiasp     |
+|       |      ret       |   // RA: unsigned
+|       +----------------+
++----------------+
+                 |
+        +--------v-------+  // RA: signed
+        |                |
+        |     autiasp    |
+        |      ret       |
+        +----------------+
+```
+
+> [!important]
+> The unwinder does not follow the control flow graph. It reads unwind
+> information in the layout order.
+
+Because these locations are dependent on how the function layout looks,
+negate-ra-state CFIs will become invalid during BasicBlock reordering.
+
+## Solution design
+
+The implementation introduces two new passes:
+1. `MarkRAStatesPass`: assigns the RA state to each instruction based on the CFIs
+    in the input binary
+2. `InsertNegateRAStatePass`: reads those assigned instruction RA states after
+    optimizations, and emits `DW_CFA_AARCH64_negate_ra_state` CFIs at the correct
+    places: wherever there is a state change between two consecutive instructions
+    in the layout order.
+
+To track metadata on individual instructions, the `MCAnnotation` class was
+extended. These also have helper functions in `MCPlusBuilder`.
+
+### Saving annotations at CFI reading
+
+CFIs are read and added to BinaryFunctions in `CFIReaderWriter::fillCFIInfoFor`.
+At this point, we add MCAnnotations about negate-ra-state, remember-state and
+restore-state CFIs to the instructions they refer to. This is to not interfere
+with the CFI processing that already happens in BOLT (e.g. remember-state and
+restore-state CFIs are removed in `normalizeCFIState` for reasons unrelated to PAC).
+
+As we add the MCAnnotations *to instructions*, we have to account for the case
+where the function starts with a CFI altering the RA state. An annotation is
+attached to the instruction preceding the CFI, and a CFI at offset zero has no
+preceding instruction, so there is nothing to annotate.
+This special case is handled by adding an `InitialRAState` bool to each BinaryFunction.
+If the `Offset` the CFI refers to is zero, we don't store an annotation, but set
+the `InitialRAState` in `fillCFIInfoFor`. This information is then used in
+`MarkRAStates`.
+
+### Binaries without DWARF info
+
+In some cases, the DWARF tables are stripped from the binary. These programs
+usually have some other unwind-mechanism.
+These passes only run on functions that include at least one negate-ra-state CFI.
+This avoids processing functions that do not use Pointer Authentication, and
+functions that use Pointer Authentication but do not have DWARF info.
+
+In summary:
+- pointer auth is not used: no change, the new passes do not run.
+- pointer auth is used, but DWARF info is stripped: no change, the new passes
+  do not run.
+- pointer auth is used, and we have DWARF CFIs: passes run, and rewrite the
+  negate-ra-state CFI.
+
+### MarkRAStates pass
+
+This pass runs before optimizations reorder anything.
+
+It processes MCAnnotations generated during the CFI reading stage to check if
+instructions have any of the three CFIs that can modify RA state:
+- negate-ra-state,
+- remember-state,
+- restore-state.
+
+Then it adds new MCAnnotations to each instruction, indicating their RA state.
+Those annotations are:
+- Signed,
+- Unsigned.
+
+Below is a simple example that shows the two different types of annotations:
+what we have before the pass, and after it.
+
+| Instruction                   | Before          |  After   |
+| ----------------------------- | --------------- | -------- |
+| paciasp                       | negate-ra-state | unsigned |
+| stp	x29, x30, [sp, #-0x10]! |                 | signed   |
+| mov	x29, sp                 |                 | signed   |
+| ldp	x29, x30, [sp], #0x10   |                 | signed   |
+| autiasp                       | negate-ra-state | signed   |
+| ret                           |                 | unsigned |
+
+##### Error handling in MarkRAStates pass:
+
+Whenever the MarkRAStates pass finds inconsistencies in the current
+BinaryFunction, it marks the function as ignored using `BF.setIgnored()`. BOLT
+will not optimize this function but will emit it unchanged in the original section
+(`.bolt.org.text`).
+
+The inconsistencies are as follows:
+- finding a `pac*` instruction when already in signed state
+- finding an `aut*` instruction when already in unsigned state
+- finding `pac*` and `aut*` instructions without `.cfi_negate_ra_state`.
+
+Users will be informed about the number of ignored functions in the pass, the
+exact functions ignored, and the found inconsistency.
+
+### InsertNegateRAStatePass
+
+This pass runs after optimizations. It performs the _inverse_ of the MarkRAStates pass:
+1. it reads the RA state annotations attached to the instructions, and
+2. whenever the state changes, it adds a PseudoInstruction that holds an
+   OpNegateRAState CFI.
+
+##### Covering newly generated instructions:
+
+Some BOLT passes can add new Instructions. In InsertNegateRAStatePass, we have
+to know what RA state these have.
+
+The current solution has the `inferUnknownStates` function to cover these, using
+a fairly simple strategy: unknown states inherit the last known state.
+
+This will be updated to a more robust solution.
+
+> [!important]
+> As issue #160989 describes, unwind info is incorrect in stubs with multiple callers.
+> For this same reason, we cannot generate correct pac-specific unwind info: the signedness
+> of the _incorrect_ return address is meaningless.
+
+### Optimizations requiring special attention
+
+Marking states before optimizations ensures that instructions can be moved around
+freely. The only special case is function splitting. When a function is split,
+the split part becomes a new function in the emitted binary. For unwinding to
+work, it needs to "replay" all CFIs that lead up to the split point. BOLT does
+this for other CFIs. As negate-ra-state is not read (only stored as an Annotation),
+we have to do this manually in InsertNegateRAStatePass. Here, if the split part
+starts with an instruction that has Signed RA state, we add a negate-ra-state CFI
+to indicate this.
+
+## Option to disallow the feature
+
+The feature can be guarded with the `--update-branch-protection` flag, which is
+on by default. If the flag is set to false, and `containedNegateRAState()` is
+true for a function after `fillCFIInfoFor()`, BOLT exits with an error.
diff --git a/bolt/include/bolt/Core/BinaryFunction.h b/bolt/include/bolt/Core/BinaryFunction.h
index 7e0e3bff83259..f5e9887b56f70 100644
--- a/bolt/include/bolt/Core/BinaryFunction.h
+++ b/bolt/include/bolt/Core/BinaryFunction.h
@@ -148,6 +148,11 @@ class BinaryFunction {
     PF_MEMEVENT = 4, /// Profile has mem events.
   };
 
+  void setContainedNegateRAState() { HadNegateRAState = true; }
+  bool containedNegateRAState() const { return HadNegateRAState; }
+  void setInitialRAState(bool State) { InitialRAState = State; }
+  bool getInitialRAState() { return InitialRAState; }
+
   /// Struct for tracking exception handling ranges.
   struct CallSite {
     const MCSymbol *Start;
@@ -218,6 +223,12 @@ class BinaryFunction {
   /// Current state of the function.
   State CurrentState{State::Empty};
 
+  /// Indicates if the Function contained .cfi-negate-ra-state. These are not
+  /// read from the binary. This boolean is used when deciding to run the
+  /// .cfi-negate-ra-state rewriting passes on a function or not.
+  bool HadNegateRAState{false};
+  bool InitialRAState{false};
+
   /// A list of symbols associated with the function entry point.
   ///
   /// Multiple symbols would typically result from identical code-folding
@@ -1640,6 +1651,51 @@ class BinaryFunction {
 
   void setHasInferredProfile(bool Inferred) { HasInferredProfile = Inferred; }
 
+  /// Find corrected offset the same way addCFIInstruction does it to skip NOPs.
+  std::optional<uint64_t> getCorrectedCFIOffset(uint64_t Offset) {
+    assert(!Instructions.empty());
+    auto I = Instructions.lower_bound(Offset);
+    if (Offset == getSize()) {
+      assert(I == Instructions.end() && "unexpected iterator value");
+      // Sometimes compiler issues restore_state after all instructions
+      // in the function (even after nop).
+      --I;
+      Offset = I->first;
+    }
+    assert(I->first == Offset && "CFI pointing to unknown instruction");
+    if (I == Instructions.begin())
+      return {};
+
+    --I;
+    while (I != Instructions.begin() && BC.MIB->isNoop(I->second)) {
+      Offset = I->first;
+      --I;
+    }
+    return Offset;
+  }
+
+  void setInstModifiesRAState(uint8_t CFIOpcode, uint64_t Offset) {
+    std::optional<uint64_t> CorrectedOffset = getCorrectedCFIOffset(Offset);
+    if (CorrectedOffset) {
+      auto I = Instructions.lower_bound(*CorrectedOffset);
+      I--;
+
+      switch (CFIOpcode) {
+      case dwarf::DW_CFA_AARCH64_negate_ra_state:
+        BC.MIB->setNegateRAState(I->second);
+        break;
+      case dwarf::DW_CFA_remember_state:
+        BC.MIB->setRememberState(I->second);
+        break;
+      case dwarf::DW_CFA_restore_state:
+        BC.MIB->setRestoreState(I->second);
+        break;
+      default:
+        assert(0 && "CFI Opcode not covered by function");
+      }
+    }
+  }
+
   void addCFIInstruction(uint64_t Offset, MCCFIInstruction &&Inst) {
     assert(!Instructions.empty());
 
diff --git a/bolt/include/bolt/Core/MCPlus.h b/bolt/include/bolt/Core/MCPlus.h
index 601d709712864..ead6ba1470da6 100644
--- a/bolt/include/bolt/Core/MCPlus.h
+++ b/bolt/include/bolt/Core/MCPlus.h
@@ -72,7 +72,12 @@ class MCAnnotation {
     kLabel,               /// MCSymbol pointing to this instruction.
     kSize,                /// Size of the instruction.
     kDynamicBranch,       /// Jit instruction patched at runtime.
-    kGeneric              /// First generic annotation.
+    kRASigned,            /// Inst is in a range where RA is signed.
+    kRAUnsigned,          /// Inst is in a range where RA is unsigned.
+    kRememberState,       /// Inst has rememberState CFI.
+    kRestoreState,        /// Inst has restoreState CFI.
+    kNegateState,         /// Inst has OpNegateRAState CFI.
+    kGeneric,             /// First generic annotation.
   };
 
   virtual void print(raw_ostream &OS) const = 0;
diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h
index 5b711b0e27bab..2772de73081d1 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -70,6 +70,20 @@ class MCPlusBuilder {
 public:
   using AllocatorIdTy = uint16_t;
 
+  std::optional<int64_t> getAnnotationAtOpIndex(const MCInst &Inst,
+                                                unsigned OpIndex) const {
+    std::optional<unsigned> FirstAnnotationOp = getFirstAnnotationOpIndex(Inst);
+    if (!FirstAnnotationOp)
+      return std::nullopt;
+
+    if (*FirstAnnotationOp > OpIndex || Inst.getNumOperands() < OpIndex)
+      return std::nullopt;
+
+    const auto *Op = Inst.begin() + OpIndex;
+    const int64_t ImmValue = Op->getImm();
+    return extractAnnotationIndex(ImmValue);
+  }
+
 private:
   /// A struct that represents a single annotation allocator
   struct AnnotationAllocator {
@@ -603,6 +617,21 @@ class MCPlusBuilder {
     return std::nullopt;
   }
 
+  virtual bool isPSignOnLR(const MCInst &Inst) const {
+    llvm_unreachable("not implemented");
+    return false;
+  }
+
+  virtual bool isPAuthOnLR(const MCInst &Inst) const {
+    llvm_unreachable("not implemented");
+    return false;
+  }
+
+  virtual bool isPAuthAndRet(const MCInst &Inst) const {
+    llvm_unreachable("not implemented");
+    return false;
+  }
+
   /// Returns the register used as a return address. Returns std::nullopt if
   /// not applicable, such as reading the return address from a system register
   /// or from the stack.
@@ -1314,6 +1343,39 @@ class MCPlusBuilder {
   /// Return true if the instruction is a tail call.
   bool isTailCall(const MCInst &Inst) const;
 
+  /// Stores NegateRAState annotation on \p Inst.
+  void setNegateRAState(MCInst &Inst) const;
+
+  /// Return true if \p Inst has NegateRAState annotation.
+  bool hasNegateRAState(const MCInst &Inst) const;
+
+  /// Sets RememberState annotation on \p Inst.
+  void setRememberState(MCInst &Inst) const;
+
+  /// Return true if \p Inst has RememberState annotation.
+  bool hasRememberState(const MCInst &Inst) const;
+
+  /// Stores RestoreState annotation on \p Inst.
+  void setRestoreState(MCInst &Inst) const;
+
+  /// Return true if \p Inst has RestoreState annotation.
+  bool hasRestoreState(const MCInst &Inst) const;
+
+  /// Stores RA Signed annotation on \p Inst.
+  void setRASigned(MCInst &Inst) const;
+
+  /// Return true if \p Inst has Signed RA annotation.
+  bool isRASigned(const MCInst &Inst) const;
+
+  /// Stores RA Unsigned annotation on \p Inst.
+  void setRAUnsigned(MCInst &Inst) const;
+
+  /// Return true if \p Inst has Unsigned RA annotation.
+  bool isRAUnsigned(const MCInst &Inst) const;
+
+  /// Return true if \p Inst doesn't have any annotation related to RA state.
+  bool isRAStateUnknown(const MCInst &Inst) const;
+
   /// Return true if the instruction is a call with an exception handling info.
   virtual bool isInvoke(const MCInst &Inst) const {
     return isCall(Inst) && getEHInfo(Inst);
diff --git a/bolt/include/bolt/Passes/InsertNegateRAStatePass.h b/bolt/include/bolt/Passes/InsertNegateRAStatePass.h
new file mode 100644
index 0000000000000..836948bf5e9c0
--- /dev/null
+++ b/bolt/include/bolt/Passes/InsertNegateRAStatePass.h
@@ -0,0 +1,46 @@
+//===- bolt/Passes/InsertNegateRAStatePass.cpp ----------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the InsertNegateRAStatePass class.
+//
+//===----------------------------------------------------------------------===//
+#ifndef BOLT_PASSES_INSERT_NEGATE_RA_STATE_PASS
+#define BOLT_PASSES_INSERT_NEGATE_RA_STATE_PASS
+
+#include "bolt/Passes/BinaryPasses.h"
+
+namespace llvm {
+namespace bolt {
+
+class InsertNegateRAState : public BinaryFunctionPass {
+public:
+  explicit InsertNegateRAState() : BinaryFunctionPass(false) {}
+
+  const char *getName() const override { return "insert-negate-ra-state-pass"; }
+
+  /// Pass entry point
+  Error runOnFunctions(BinaryContext &BC) override;
+  void runOnFunction(BinaryFunction &BF);
+
+private:
+  /// Because states are tracked as MCAnnotations on individual instructions,
+  /// newly inserted instructions do not have a state associated with them.
+  /// New states are "inherited" from the last known state.
+  void inferUnknownStates(BinaryFunction &BF);
+
+  /// Support for function splitting:
+  /// if two consecutive BBs with Signed state are going to end up in different
+  /// functions (so are held by different FunctionFragments), we have to add a
+  /// OpNegateRAState to the beginning of the newly split function, so it starts
+  /// with a Signed state.
+  void coverFunctionFragmentStart(BinaryFunction &BF, FunctionFragment &FF);
+};
+
+} // namespace bolt
+} // namespace llvm
+#endif
diff --git a/bolt/include/bolt/Passes/MarkRAStates.h b/bolt/include/bolt/Passes/MarkRAStates.h
new file mode 100644
index 0000000000000..675ab9727142b
--- /dev/null
+++ b/bolt/include/bolt/Passes/MarkRAStates.h
@@ -0,0 +1,33 @@
+//===- bolt/Passes/MarkRAStates.cpp ---------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the MarkRAStates class.
+//
+//===----------------------------------------------------------------------===//
+#ifndef BOLT_PASSES_MARK_RA_STATES
+#define BOLT_PASSES_MARK_RA_STATES
+
+#include "bolt/Passes/BinaryPasses.h"
+
+namespace llvm {
+namespace bolt {
+
+class MarkRAStates : public BinaryFunctionPass {
+public:
+  explicit MarkRAStates() : BinaryFunctionPass(false) {}
+
+  const char *getName() const override { return "mark-ra-states"; }
+
+  /// Pass entry point
+  Error runOnFunctions(BinaryContext &BC) override;
+  bool runOnFunction(BinaryFunction &BF);
+};
+
+} // namespace bolt
+} // namespace llvm
+#endif
diff --git a/bolt/include/bolt/Utils/CommandLineOpts.h b/bolt/include/bolt/Utils/CommandLineOpts.h
index 0964c2c9d8473..5c7f1b94315f0 100644
--- a/bolt/include/bolt/Utils/CommandLineOpts.h
+++ b/bolt/include/bolt/Utils/CommandLineOpts.h
@@ -97,6 +97,7 @@ extern llvm::cl::opt<std::string> OutputFilename;
 extern llvm::cl::opt<std::string> PerfData;
 extern llvm::cl::opt<bool> PrintCacheMetrics;
 extern llvm::cl::opt<bool> PrintSections;
+extern llvm::cl::opt<bool> UpdateBranchProtection;
 extern llvm::cl::opt<SplitFunctionsStrategy> SplitStrategy;
 
 // The format to use with -o in aggregation mode (perf2bolt)
diff --git a/bolt/lib/Core/BinaryBasicBlock.cpp b/bolt/lib/Core/BinaryBasicBlock.cpp
index eeab1ed4d7cff..d680850bf2ea9 100644
--- a/bolt/lib/Core/BinaryBasicBlock.cpp
+++ b/bolt/lib/Core/BinaryBasicBlock.cpp
@@ -210,7 +210,11 @@ int32_t BinaryBasicBlock::getCFIStateAtInstr(const MCInst *Instr) const {
       InstrSeen = (&Inst == Instr);
       continue;
     }
-    if (Function->getBinaryContext().MIB->isCFI(Inst)) {
+    // Ignoring OpNegateRAState CFIs here, as they don't have a "State"
+    // number associated with them.
+    if (Function->getBinaryContext().MIB->isCFI(Inst) &&
+        (Function->getCFIFor(Inst)->getOperation() !=
+         MCCFIInstruction::OpNegateRAState)) {
       LastCFI = &Inst;
       break;
     }
diff --git a/bolt/lib/Core/BinaryContext.cpp b/bolt/lib/Core/BinaryContext.cpp
index b7ded6b931a15..206d8eef40288 100644
--- a/bolt/lib/Core/BinaryContext.cpp
+++ b/bolt/lib/Core/BinaryContext.cpp
@@ -1905,6 +1905,9 @@ void BinaryContext::printCFI(raw_ostream &OS, const MCCFIInstruction &Inst) {
   case MCCFIInstruction::OpGnuArgsSize:
     OS << "OpGnuArgsSize";
     break;
+  case MCCFIInstruction::OpNegateRAState:
+    OS << "OpNegateRAState";
+    break;
   default:
     OS << "Op#" << Operation;
     break;
diff --git a/bolt/lib/Core/BinaryFunction.cpp b/bolt/lib/Core/BinaryFunction.cpp
index 07bc71ee538d6..96878925eccad 100644
--- a/bolt/lib/Core/BinaryFunction.cpp
+++ b/bolt/lib/Core/BinaryFunction.cpp
@@ -2814,14 +2814,8 @@ struct CFISnapshot {
     case MCCFIInstruction::OpLLVMDefAspaceCfa:
     case MCCFIInstruction::OpLabel:
     case MCCFIInstruction::OpValOffset:
-      llvm_unreachable("unsupported CFI opcode");
-      break;
     case MCCFIInstruction::OpNegateRAState:
-      if (!(opts::BinaryAnalysisMode || opts::HeatmapMode)) {
-        llvm_unreachable("BOLT-ERROR: binaries using pac-ret hardening (e.g. "
-                         "as produced by '-mbranch-protection=pac-ret') are "
-                         "currently not supported by BOLT.");
-      }
+      llvm_unreachable("unsupported CFI opcode");
       break;
     case MCCFIInstruction::OpRememberState:
     case MCCFIInstruction::OpRestoreState:
@@ -2836,6 +2830,7 @@ struct CFISnapshot {
   void advanceTo(int32_t State) {
     for (int32_t I = CurState, E = State; I != E; ++I) {
       const MCCFIInstruction &Instr = FDE[I];
+      assert(Instr.getOperation() != MCCFIInstruction::OpNegateRAState);
       if (Instr.getOperation() != MCCFIInstruction::OpRestoreState) {
         update(Instr, I);
         continue;
@@ -2960,15 +2955,9 @@ struct CFISnapshotDiff : public CFISnapshot {
     case MCCFIInstruction::OpLLVMDefAspaceCfa:
     case MCCFIInstruction::OpLabel:
     case MCCFIInstruction::OpValOffset:
+    case MCCFIInstruction::OpNegateRAState:
       llvm_unreachable("unsupported CFI opcode");
       return false;
-    case MCCFIInstruction::OpNegateRAState:
-      if (!(opts::BinaryAnalysisMode || opts::HeatmapMode)) {
-        llvm_unreachable("BOLT-ERROR: binaries using pac-ret hardening (e.g. "
-                         "as produced by '-mbranch-protection=pac-ret') are "
-                         "currently not supported by BOLT.");
-      }
-      break;
     case MCCFIInstruction::OpRememberState:
     case MCCFIInstruction::OpRestoreState:
     case MCCFIInstruction::OpGnuArgsSize:
@@ -3117,14 +3106,8 @@ BinaryFunction::unwindCFIState(int32_t FromState, int32_t ToState,
     case MCCFIInstruction::OpLLVMDefAspaceCfa:
     case MCCFIInstruction::OpLabel:
     case MCCFIInstruction::OpValOffset:
-      llvm_unreachable("unsupported CFI opcode");
-      break;
     case MCCFIInstruction::OpNegateRAState:
-      if (!(opts::BinaryAnalysisMode || opts::HeatmapMode)) {
-        llvm_unreachable("BOLT-ERROR: binaries using pac-ret hardening (e.g. "
-                         "as produced by '-mbranch-protection=pac-ret') are "
-                         "currently not supported by BOLT.");
-      }
+      llvm_unreachable("unsupported CFI opcode");
       break;
     case MCCFIInstruction::OpGnuArgsSize:
       // do not affect CFI state
diff --git a/bolt/lib/Core/Exceptions.cpp b/bolt/lib/Core/Exceptions.cpp
index 874419f592cc9..27656c7b3cadf 100644
--- a/bolt/lib/Core/Exceptions.cpp
+++ b/bolt/lib/Core/Exceptions.cpp
@@ -568,10 +568,25 @@ bool CFIReaderWriter::fillCFIInfoFor(BinaryFunction &Function) const {
     case DW_CFA_remember_state:
       Function.addCFIInstruction(
           Offset, MCCFIInstruction::createRememberState(nullptr));
+
+      if (Function.getBinaryContext().isAArch64()) {
+        // Support for pointer authentication:
+        // We need to annotate instructions that modify the RA State, to work
+        // out the state of each instruction in MarkRAStates Pass.
+        if (Offset != 0)
+          Function.setInstModifiesRAState(DW_CFA_remember_state, Offset);
+      }
       break;
     case DW_CFA_restore_state:
       Function.addCFIInstruction(Offset,
                                  MCCFIInstruction::createRestoreState(nullptr));
+      if (Function.getBinaryContext().isAArch64()) {
+        // Support for pointer authentication:
+        // We need to annotate instructions that modify the RA State, to work
+        // out the state of each instruction in MarkRAStates Pass.
+        if (Offset != 0)
+          Function.setInstModifiesRAState(DW_CFA_restore_state, Offset);
+      }
       break;
     case DW_CFA_def_cfa:
       Function.addCFIInstruction(
@@ -629,11 +644,24 @@ bool CFIReaderWriter::fillCFIInfoFor(BinaryFunction &Function) const {
         BC.errs() << "BOLT-WARNING: DW_CFA_MIPS_advance_loc unimplemented\n";
       return false;
     case DW_CFA_GNU_window_save:
-      // DW_CFA_GNU_window_save and DW_CFA_GNU_NegateRAState just use the same
-      // id but mean different things. The latter is used in AArch64.
+      // DW_CFA_GNU_window_save and DW_CFA_AARCH64_negate_ra_state just use the
+      // same id but mean different things. The latter is used in AArch64.
       if (Function.getBinaryContext().isAArch64()) {
-        Function.addCFIInstruction(
-            Offset, MCCFIInstruction::createNegateRAState(nullptr));
+        Function.setContainedNegateRAState();
+        // The location OpNegateRAState CFIs are needed depends on the order of
+        // BasicBlocks, which changes during optimizations. Instead of adding
+        // OpNegateRAState CFIs, an annotation is added to the instruction, to
+        // mark that the instruction modifies the RA State. The actual state for
+        // instructions are worked out in MarkRAStates based on these
+        // annotations.
+        if (Offset != 0)
+          Function.setInstModifiesRAState(DW_CFA_AARCH64_negate_ra_state,
+                                          Offset);
+        else
+          // We cannot Annotate an instruction at Offset == 0.
+          // Instead, we save the initial (Signed) state, and push it to
+          // MarkRAStates' RAStateStack.
+          Function.setInitialRAState(true);
         break;
       }
       if (opts::Verbosity >= 1)
diff --git a/bolt/lib/Core/MCPlusBuilder.cpp b/bolt/lib/Core/MCPlusBuilder.cpp
index 52475227eb32f..e96de80bfa701 100644
--- a/bolt/lib/Core/MCPlusBuilder.cpp
+++ b/bolt/lib/Core/MCPlusBuilder.cpp
@@ -159,6 +159,55 @@ bool MCPlusBuilder::isTailCall(const MCInst &Inst) const {
   return false;
 }
 
+void MCPlusBuilder::setNegateRAState(MCInst &Inst) const {
+  assert(!hasAnnotation(Inst, MCAnnotation::kNegateState));
+  setAnnotationOpValue(Inst, MCAnnotation::kNegateState, true);
+}
+
+bool MCPlusBuilder::hasNegateRAState(const MCInst &Inst) const {
+  return hasAnnotation(Inst, MCAnnotation::kNegateState);
+}
+
+void MCPlusBuilder::setRememberState(MCInst &Inst) const {
+  assert(!hasAnnotation(Inst, MCAnnotation::kRememberState));
+  setAnnotationOpValue(Inst, MCAnnotation::kRememberState, true);
+}
+
+bool MCPlusBuilder::hasRememberState(const MCInst &Inst) const {
+  return hasAnnotation(Inst, MCAnnotation::kRememberState);
+}
+
+void MCPlusBuilder::setRestoreState(MCInst &Inst) const {
+  assert(!hasAnnotation(Inst, MCAnnotation::kRestoreState));
+  setAnnotationOpValue(Inst, MCAnnotation::kRestoreState, true);
+}
+
+bool MCPlusBuilder::hasRestoreState(const MCInst &Inst) const {
+  return hasAnnotation(Inst, MCAnnotation::kRestoreState);
+}
+
+void MCPlusBuilder::setRASigned(MCInst &Inst) const {
+  assert(!hasAnnotation(Inst, MCAnnotation::kRASigned));
+  setAnnotationOpValue(Inst, MCAnnotation::kRASigned, true);
+}
+
+bool MCPlusBuilder::isRASigned(const MCInst &Inst) const {
+  return hasAnnotation(Inst, MCAnnotation::kRASigned);
+}
+
+void MCPlusBuilder::setRAUnsigned(MCInst &Inst) const {
+  assert(!hasAnnotation(Inst, MCAnnotation::kRAUnsigned));
+  setAnnotationOpValue(Inst, MCAnnotation::kRAUnsigned, true);
+}
+
+bool MCPlusBuilder::isRAUnsigned(const MCInst &Inst) const {
+  return hasAnnotation(Inst, MCAnnotation::kRAUnsigned);
+}
+
+bool MCPlusBuilder::isRAStateUnknown(const MCInst &Inst) const {
+  return !(isRAUnsigned(Inst) || isRASigned(Inst));
+}
+
 std::optional<MCLandingPad> MCPlusBuilder::getEHInfo(const MCInst &Inst) const {
   if (!isCall(Inst))
     return std::nullopt;
diff --git a/bolt/lib/Passes/CMakeLists.txt b/bolt/lib/Passes/CMakeLists.txt
index 77d2bb9c2bcb5..d7519518f186f 100644
--- a/bolt/lib/Passes/CMakeLists.txt
+++ b/bolt/lib/Passes/CMakeLists.txt
@@ -17,12 +17,14 @@ add_llvm_library(LLVMBOLTPasses
   IdenticalCodeFolding.cpp
   IndirectCallPromotion.cpp
   Inliner.cpp
+  InsertNegateRAStatePass.cpp
   Instrumentation.cpp
   JTFootprintReduction.cpp
   LongJmp.cpp
   LoopInversionPass.cpp
   LivenessAnalysis.cpp
   MCF.cpp
+  MarkRAStates.cpp
   PatchEntries.cpp
   PAuthGadgetScanner.cpp
   PettisAndHansen.cpp
diff --git a/bolt/lib/Passes/InsertNegateRAStatePass.cpp b/bolt/lib/Passes/InsertNegateRAStatePass.cpp
new file mode 100644
index 0000000000000..33664e1160a7b
--- /dev/null
+++ b/bolt/lib/Passes/InsertNegateRAStatePass.cpp
@@ -0,0 +1,142 @@
+//===- bolt/Passes/InsertNegateRAStatePass.cpp ----------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the InsertNegateRAStatePass class. It inserts
+// OpNegateRAState CFIs at places where the RA states of two consecutive
+// instructions differ.
+//
+//===----------------------------------------------------------------------===//
+#include "bolt/Passes/InsertNegateRAStatePass.h"
+#include "bolt/Core/BinaryFunction.h"
+#include "bolt/Core/ParallelUtilities.h"
+#include <cstdlib>
+
+using namespace llvm;
+
+namespace llvm {
+namespace bolt {
+
+void InsertNegateRAState::runOnFunction(BinaryFunction &BF) {
+  BinaryContext &BC = BF.getBinaryContext();
+
+  if (BF.getState() == BinaryFunction::State::Empty)
+    return;
+
+  if (BF.getState() != BinaryFunction::State::CFG &&
+      BF.getState() != BinaryFunction::State::CFG_Finalized) {
+    BC.outs() << "BOLT-INFO: no CFG for " << BF.getPrintName()
+              << " in InsertNegateRAStatePass\n";
+    return;
+  }
+
+  inferUnknownStates(BF);
+
+  for (FunctionFragment &FF : BF.getLayout().fragments()) {
+    coverFunctionFragmentStart(BF, FF);
+    bool FirstIter = true;
+    MCInst PrevInst;
+    // As this pass runs after function splitting, we should only check
+    // consecutive instructions inside FunctionFragments.
+    for (BinaryBasicBlock *BB : FF) {
+      for (auto It = BB->begin(); It != BB->end(); ++It) {
+        MCInst &Inst = *It;
+        if (BC.MIB->isCFI(Inst))
+          continue;
+        if (!FirstIter) {
+          // Consecutive instructions with different RA states mean we need to
+          // add an OpNegateRAState.
+          if ((BC.MIB->isRASigned(PrevInst) && BC.MIB->isRAUnsigned(Inst)) ||
+              (BC.MIB->isRAUnsigned(PrevInst) && BC.MIB->isRASigned(Inst))) {
+            It = BF.addCFIInstruction(
+                BB, It, MCCFIInstruction::createNegateRAState(nullptr));
+          }
+        } else {
+          FirstIter = false;
+        }
+        PrevInst = *It;
+      }
+    }
+  }
+}
+
+void InsertNegateRAState::coverFunctionFragmentStart(BinaryFunction &BF,
+                                                     FunctionFragment &FF) {
+  BinaryContext &BC = BF.getBinaryContext();
+  if (FF.empty())
+    return;
+  // Find the first BB in the FF which has Instructions.
+  // BOLT can generate empty BBs at function splitting which are only used as
+  // target labels. We should add the negate-ra-state CFI to the first
+  // non-empty BB.
+  auto *FirstNonEmpty =
+      std::find_if(FF.begin(), FF.end(), [](BinaryBasicBlock *BB) {
+        // getFirstNonPseudo returns BB.end() if it does not find any
+        // Instructions.
+        return BB->getFirstNonPseudo() != BB->end();
+      });
+  // A function already split in the input can have a first FF that starts in
+  // Signed state; this covers that case too. Skip FFs with no instructions.
+  if (FirstNonEmpty != FF.end() &&
+      BC.MIB->isRASigned(*((*FirstNonEmpty)->begin())))
+    BF.addCFIInstruction(*FirstNonEmpty, (*FirstNonEmpty)->begin(),
+                         MCCFIInstruction::createNegateRAState(nullptr));
+}
+
+void InsertNegateRAState::inferUnknownStates(BinaryFunction &BF) {
+  BinaryContext &BC = BF.getBinaryContext();
+  bool FirstIter = true;
+  MCInst PrevInst;
+  for (BinaryBasicBlock &BB : BF) {
+    for (MCInst &Inst : BB) {
+      if (BC.MIB->isCFI(Inst))
+        continue;
+
+      if (!FirstIter && BC.MIB->isRAStateUnknown(Inst)) {
+        if (BC.MIB->isRASigned(PrevInst) || BC.MIB->isPSignOnLR(PrevInst)) {
+          BC.MIB->setRASigned(Inst);
+        } else if (BC.MIB->isRAUnsigned(PrevInst) ||
+                   BC.MIB->isPAuthOnLR(PrevInst)) {
+          BC.MIB->setRAUnsigned(Inst);
+        }
+      } else {
+        FirstIter = false;
+      }
+      PrevInst = Inst;
+    }
+  }
+}
+
+Error InsertNegateRAState::runOnFunctions(BinaryContext &BC) {
+  std::atomic<uint64_t> FunctionsModified{0};
+  ParallelUtilities::WorkFuncTy WorkFun = [&](BinaryFunction &BF) {
+    FunctionsModified++;
+    runOnFunction(BF);
+  };
+
+  ParallelUtilities::PredicateTy SkipPredicate = [&](const BinaryFunction &BF) {
+    // We can skip functions which did not include negate-ra-state CFIs. This
+    // includes code using pac-ret hardening as well, if the binary is
+    // compiled with `-fno-exceptions -fno-unwind-tables
+    // -fno-asynchronous-unwind-tables`
+    return !BF.containedNegateRAState() || BF.isIgnored();
+  };
+
+  ParallelUtilities::runOnEachFunction(
+      BC, ParallelUtilities::SchedulingPolicy::SP_INST_LINEAR, WorkFun,
+      SkipPredicate, "InsertNegateRAStatePass");
+
+  BC.outs() << "BOLT-INFO: rewritten pac-ret DWARF info in "
+            << FunctionsModified << " out of " << BC.getBinaryFunctions().size()
+            << " functions "
+            << format("(%.2lf%%).\n", (100.0 * FunctionsModified) /
+                                          BC.getBinaryFunctions().size());
+  return Error::success();
+}
+
+} // end namespace bolt
+} // end namespace llvm
diff --git a/bolt/lib/Passes/MarkRAStates.cpp b/bolt/lib/Passes/MarkRAStates.cpp
new file mode 100644
index 0000000000000..2c5ce4aaa72c0
--- /dev/null
+++ b/bolt/lib/Passes/MarkRAStates.cpp
@@ -0,0 +1,152 @@
+//===- bolt/Passes/MarkRAStates.cpp ---------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the MarkRAStates class.
+// Three CFIs have an influence on the RA State of an instruction:
+// - NegateRAState flips the RA State,
+// - RememberState pushes the RA State to a stack,
+// - RestoreState pops the RA State from the stack.
+// These are saved as MCAnnotations on instructions they refer to at CFI
+// reading (in CFIReaderWriter::fillCFIInfoFor). In this pass, we can work out
+// the RA State of each instruction, and save it as new MCAnnotations. The new
+// annotations are RASigned and RAUnsigned. After optimizations,
+// .cfi_negate_ra_state CFIs are added to the places where the state changes
+// in InsertNegateRAStatePass.
+//
+//===----------------------------------------------------------------------===//
+#include "bolt/Passes/MarkRAStates.h"
+#include "bolt/Core/BinaryFunction.h"
+#include "bolt/Core/ParallelUtilities.h"
+#include <cstdlib>
+#include <optional>
+#include <stack>
+
+using namespace llvm;
+
+namespace llvm {
+namespace bolt {
+
+bool MarkRAStates::runOnFunction(BinaryFunction &BF) {
+
+  BinaryContext &BC = BF.getBinaryContext();
+
+  for (const BinaryBasicBlock &BB : BF) {
+    for (const MCInst &Inst : BB) {
+      if ((BC.MIB->isPSignOnLR(Inst) ||
+           (BC.MIB->isPAuthOnLR(Inst) && !BC.MIB->isPAuthAndRet(Inst))) &&
+          !BC.MIB->hasNegateRAState(Inst)) {
+        // Not all functions have .cfi_negate_ra_state in them. But if one does,
+        // we expect psign/pauth instructions to have the hasNegateRAState
+        // annotation.
+        BF.setIgnored();
+        BC.outs() << "BOLT-INFO: inconsistent RAStates in function "
+                  << BF.getPrintName()
+                  << ": ptr sign/auth inst without .cfi_negate_ra_state\n";
+        return false;
+      }
+    }
+  }
+
+  bool RAState = BF.getInitialRAState();
+  std::stack<bool> RAStateStack;
+  RAStateStack.push(RAState);
+
+  for (BinaryBasicBlock &BB : BF) {
+    for (MCInst &Inst : BB) {
+      if (BC.MIB->isCFI(Inst))
+        continue;
+
+      if (BC.MIB->isPSignOnLR(Inst)) {
+        if (RAState) {
+          // RA signing instructions should only follow unsigned RA state.
+          BC.outs() << "BOLT-INFO: inconsistent RAStates in function "
+                    << BF.getPrintName()
+                    << ": ptr signing inst encountered in Signed RA state\n";
+          BF.setIgnored();
+          return false;
+        }
+        // The signing instruction itself is unsigned, the next will be
+        // signed.
+        BC.MIB->setRAUnsigned(Inst);
+      } else if (BC.MIB->isPAuthOnLR(Inst)) {
+        if (!RAState) {
+          // RA authenticating instructions should only follow signed RA state.
+          BC.outs() << "BOLT-INFO: inconsistent RAStates in function "
+                    << BF.getPrintName()
+                    << ": ptr authenticating inst encountered in Unsigned RA "
+                       "state\n";
+          BF.setIgnored();
+          return false;
+        }
+        // The authenticating instruction itself is signed, but the next will be
+        // unsigned.
+        BC.MIB->setRASigned(Inst);
+      } else if (RAState) {
+        BC.MIB->setRASigned(Inst);
+      } else {
+        BC.MIB->setRAUnsigned(Inst);
+      }
+
+      // Updating RAState. All updates are valid from the next instruction.
+      // Because the same instruction can have remember and restore, the order
+      // here is relevant. This is the reason to loop over Annotations instead
+      // of just checking each in a predefined order.
+      for (unsigned int Idx = 0; Idx < Inst.getNumOperands(); Idx++) {
+        std::optional<int64_t> Annotation =
+            BC.MIB->getAnnotationAtOpIndex(Inst, Idx);
+        if (!Annotation)
+          continue;
+        if (Annotation == MCPlus::MCAnnotation::kNegateState)
+          RAState = !RAState;
+        else if (Annotation == MCPlus::MCAnnotation::kRememberState)
+          RAStateStack.push(RAState);
+        else if (Annotation == MCPlus::MCAnnotation::kRestoreState) {
+          RAState = RAStateStack.top();
+          RAStateStack.pop();
+        }
+      }
+    }
+  }
+  return true;
+}
+
+Error MarkRAStates::runOnFunctions(BinaryContext &BC) {
+  std::atomic<uint64_t> FunctionsIgnored{0};
+  ParallelUtilities::WorkFuncTy WorkFun = [&](BinaryFunction &BF) {
+    if (!runOnFunction(BF)) {
+      FunctionsIgnored++;
+    }
+  };
+
+  ParallelUtilities::PredicateTy SkipPredicate = [&](const BinaryFunction &BF) {
+    // We can skip functions which did not include negate-ra-state CFIs. This
+    // includes code using pac-ret hardening as well, if the binary is
+    // compiled with `-fno-exceptions -fno-unwind-tables
+    // -fno-asynchronous-unwind-tables`
+    return !BF.containedNegateRAState() || BF.isIgnored();
+  };
+
+  int Total = llvm::count_if(
+      BC.getBinaryFunctions(),
+      [&](std::pair<const unsigned long, BinaryFunction> &P) {
+        return P.second.containedNegateRAState() && !P.second.isIgnored();
+      });
+
+  ParallelUtilities::runOnEachFunction(
+      BC, ParallelUtilities::SchedulingPolicy::SP_INST_LINEAR, WorkFun,
+      SkipPredicate, "MarkRAStates");
+  BC.outs() << "BOLT-INFO: MarkRAStates ran on " << Total
+            << " functions. Ignored " << FunctionsIgnored << " functions "
+            << format("(%.2lf%%)", (100.0 * FunctionsIgnored) / Total)
+            << " because of CFI inconsistencies\n";
+
+  return Error::success();
+}
+
+} // end namespace bolt
+} // end namespace llvm
diff --git a/bolt/lib/Rewrite/BinaryPassManager.cpp b/bolt/lib/Rewrite/BinaryPassManager.cpp
index d9b7a2bd9a14c..782137e807662 100644
--- a/bolt/lib/Rewrite/BinaryPassManager.cpp
+++ b/bolt/lib/Rewrite/BinaryPassManager.cpp
@@ -19,11 +19,13 @@
 #include "bolt/Passes/IdenticalCodeFolding.h"
 #include "bolt/Passes/IndirectCallPromotion.h"
 #include "bolt/Passes/Inliner.h"
+#include "bolt/Passes/InsertNegateRAStatePass.h"
 #include "bolt/Passes/Instrumentation.h"
 #include "bolt/Passes/JTFootprintReduction.h"
 #include "bolt/Passes/LongJmp.h"
 #include "bolt/Passes/LoopInversionPass.h"
 #include "bolt/Passes/MCF.h"
+#include "bolt/Passes/MarkRAStates.h"
 #include "bolt/Passes/PLTCall.h"
 #include "bolt/Passes/PatchEntries.h"
 #include "bolt/Passes/ProfileQualityStats.h"
@@ -276,6 +278,12 @@ static cl::opt<bool> ShortenInstructions("shorten-instructions",
                                          cl::desc("shorten instructions"),
                                          cl::init(true),
                                          cl::cat(BoltOptCategory));
+
+cl::opt<bool>
+    UpdateBranchProtection("update-branch-protection",
+                           cl::desc("Rewrites pac-ret DWARF CFI instructions "
+                                    "(AArch64-only, on by default)"),
+                           cl::init(true), cl::Hidden, cl::cat(BoltCategory));
 } // namespace opts
 
 namespace llvm {
@@ -353,6 +361,9 @@ Error BinaryFunctionPassManager::runPasses() {
 Error BinaryFunctionPassManager::runAllPasses(BinaryContext &BC) {
   BinaryFunctionPassManager Manager(BC);
 
+  if (BC.isAArch64())
+    Manager.registerPass(std::make_unique<MarkRAStates>());
+
   Manager.registerPass(
       std::make_unique<EstimateEdgeCounts>(PrintEstimateEdgeCounts));
 
@@ -512,6 +523,8 @@ Error BinaryFunctionPassManager::runAllPasses(BinaryContext &BC) {
     // targets. No extra instructions after this pass, otherwise we may have
     // relocations out of range and crash during linking.
     Manager.registerPass(std::make_unique<LongJmpPass>(PrintLongJmp));
+
+    Manager.registerPass(std::make_unique<InsertNegateRAState>());
   }
 
   // This pass should always run last.*
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index ddf934796e92e..c428828956ca0 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -3524,6 +3524,17 @@ void RewriteInstance::disassembleFunctions() {
       }
     }
 
+    // Check if fillCFIInfoFor removed any OpNegateRAState CFIs from the
+    // function.
+    if (Function.containedNegateRAState()) {
+      if (!opts::UpdateBranchProtection) {
+        BC->errs()
+            << "BOLT-ERROR: --update-branch-protection is set to false, but "
+            << Function.getPrintName() << " contains .cfi-negate-ra-state\n";
+        exit(1);
+      }
+    }
+
     // Parse LSDA.
     if (Function.getLSDAAddress() != 0 &&
         !BC->getFragmentsToSkip().count(&Function)) {
diff --git a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
index f271867cb2004..df4f42128605e 100644
--- a/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
+++ b/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp
@@ -244,6 +244,28 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
     }
   }
 
+  bool isPSignOnLR(const MCInst &Inst) const override {
+    std::optional<MCPhysReg> SignReg = getSignedReg(Inst);
+    return SignReg && *SignReg == AArch64::LR;
+  }
+
+  bool isPAuthOnLR(const MCInst &Inst) const override {
+    // LDRA(A|B) should not be covered.
+    bool IsChecked;
+    std::optional<MCPhysReg> AuthReg =
+        getWrittenAuthenticatedReg(Inst, IsChecked);
+    return !IsChecked && AuthReg && *AuthReg == AArch64::LR;
+  }
+
+  bool isPAuthAndRet(const MCInst &Inst) const override {
+    return Inst.getOpcode() == AArch64::RETAA ||
+           Inst.getOpcode() == AArch64::RETAB ||
+           Inst.getOpcode() == AArch64::RETAASPPCi ||
+           Inst.getOpcode() == AArch64::RETABSPPCi ||
+           Inst.getOpcode() == AArch64::RETAASPPCr ||
+           Inst.getOpcode() == AArch64::RETABSPPCr;
+  }
+
   std::optional<MCPhysReg> getSignedReg(const MCInst &Inst) const override {
     switch (Inst.getOpcode()) {
     case AArch64::PACIA:
diff --git a/bolt/test/AArch64/negate-ra-state-disallow.s b/bolt/test/AArch64/negate-ra-state-disallow.s
new file mode 100644
index 0000000000000..95adb7146cfce
--- /dev/null
+++ b/bolt/test/AArch64/negate-ra-state-disallow.s
@@ -0,0 +1,25 @@
+# RUN: llvm-mc -filetype=obj -triple aarch64-unknown-unknown %s -o %t.o
+# RUN: %clang %cflags  %t.o -o %t.exe -Wl,-q
+# RUN: not llvm-bolt %t.exe -o %t.exe.bolt --update-branch-protection=false 2>&1 | FileCheck %s
+
+# CHECK: BOLT-ERROR: --update-branch-protection is set to false, but foo contains .cfi-negate-ra-state
+
+  .text
+  .globl  foo
+  .p2align        2
+  .type   foo,@function
+foo:
+  .cfi_startproc
+  hint    #25
+  .cfi_negate_ra_state
+  mov x1, #0
+  hint    #29
+  .cfi_negate_ra_state
+  ret
+  .cfi_endproc
+  .size   foo, .-foo
+
+  .global _start
+  .type _start, %function
+_start:
+  b foo
diff --git a/bolt/test/AArch64/negate-ra-state-incorrect.s b/bolt/test/AArch64/negate-ra-state-incorrect.s
new file mode 100644
index 0000000000000..14d2c384a877d
--- /dev/null
+++ b/bolt/test/AArch64/negate-ra-state-incorrect.s
@@ -0,0 +1,78 @@
+# This test checks that the MarkRAStates pass ignores functions with
+# malformed .cfi_negate_ra_state sequences in the input binary.
+
+# The cases checked are:
+#   - extra .cfi_negate_ra_state in Signed state: checked in foo,
+#   - extra .cfi_negate_ra_state in Unsigned state: checked in bar,
+#   - missing .cfi_negate_ra_state from PSign or PAuth instructions: checked in baz.
+
+# RUN: llvm-mc -filetype=obj -triple aarch64-unknown-unknown %s -o %t.o
+# RUN: %clang %cflags  %t.o -o %t.exe -Wl,-q
+# RUN: llvm-bolt %t.exe -o %t.exe.bolt --no-threads | FileCheck %s --check-prefix=CHECK-BOLT
+
+# CHECK-BOLT: BOLT-INFO: inconsistent RAStates in function foo: ptr authenticating inst encountered in Unsigned RA state
+# CHECK-BOLT: BOLT-INFO: inconsistent RAStates in function bar: ptr signing inst encountered in Signed RA state
+# CHECK-BOLT: BOLT-INFO: inconsistent RAStates in function baz: ptr sign/auth inst without .cfi_negate_ra_state
+
+# Check that the incorrect functions got ignored, so they are not in the new .text section
+# RUN: llvm-objdump %t.exe.bolt -d -j .text | FileCheck %s --check-prefix=CHECK-OBJDUMP
+# CHECK-OBJDUMP-NOT: <foo>:
+# CHECK-OBJDUMP-NOT: <bar>:
+# CHECK-OBJDUMP-NOT: <baz>:
+
+
+  .text
+  .globl  foo
+  .p2align        2
+  .type   foo,@function
+foo:
+  .cfi_startproc
+  hint    #25
+  .cfi_negate_ra_state
+  mov x1, #0
+  .cfi_negate_ra_state        // Incorrect CFI in signed state
+  hint    #29
+  .cfi_negate_ra_state
+  ret
+  .cfi_endproc
+  .size   foo, .-foo
+
+  .text
+  .globl  bar
+  .p2align        2
+  .type   bar,@function
+bar:
+  .cfi_startproc
+  mov x1, #0
+  .cfi_negate_ra_state      // Incorrect CFI in unsigned state
+  hint    #25
+  .cfi_negate_ra_state
+  mov x1, #0
+  hint    #29
+  .cfi_negate_ra_state
+  ret
+  .cfi_endproc
+  .size   bar, .-bar
+
+  .text
+  .globl  baz
+  .p2align        2
+  .type   baz,@function
+baz:
+  .cfi_startproc
+  mov x1, #0
+  hint    #25
+  .cfi_negate_ra_state
+  mov x1, #0
+  hint    #29
+                            // Missing .cfi_negate_ra_state
+  ret
+  .cfi_endproc
+  .size   baz, .-baz
+
+  .global _start
+  .type _start, %function
+_start:
+  b foo
+  b bar
+  b baz
diff --git a/bolt/test/AArch64/negate-ra-state-reorder.s b/bolt/test/AArch64/negate-ra-state-reorder.s
new file mode 100644
index 0000000000000..2659f75aff9c9
--- /dev/null
+++ b/bolt/test/AArch64/negate-ra-state-reorder.s
@@ -0,0 +1,73 @@
+# Checking that after reordering BasicBlocks, the generated OpNegateRAState instructions
+# are placed where the RA state is different between two consecutive instructions.
+# This case demonstrates that the input and output may contain a different number
+# of them: the input has 4, but the output only has 3.
+
+# RUN: llvm-mc -filetype=obj -triple aarch64-unknown-unknown %s -o %t.o
+# RUN: %clang %cflags  %t.o -o %t.exe -Wl,-q
+# RUN: llvm-bolt %t.exe -o %t.exe.bolt --no-threads --reorder-blocks=reverse \
+# RUN: --print-cfg --print-after-lowering --print-only foo | FileCheck %s
+
+# Check that the reordering succeeded.
+# CHECK: Binary Function "foo" after building cfg {
+# CHECK: BB Layout   : .LBB00, .Ltmp2, .Ltmp0, .Ltmp1
+# CHECK: Binary Function "foo" after inst-lowering {
+# CHECK: BB Layout   : .LBB00, .Ltmp1, .Ltmp0, .Ltmp2
+
+
+# Check the generated CFIs.
+# CHECK:         OpNegateRAState
+# CHECK-NEXT:    mov     x2, #0x6
+
+# CHECK:         autiasp
+# CHECK-NEXT:    OpNegateRAState
+# CHECK-NEXT:    ret
+
+# CHECK:         paciasp
+# CHECK-NEXT:    OpNegateRAState
+
+# CHECK:         DWARF CFI Instructions:
+# CHECK-NEXT:        0:  OpNegateRAState
+# CHECK-NEXT:        1:  OpNegateRAState
+# CHECK-NEXT:        2:  OpNegateRAState
+# CHECK-NEXT:    End of Function "foo"
+
+  .text
+  .globl  foo
+  .p2align        2
+  .type   foo,@function
+foo:
+  .cfi_startproc
+  // RA is unsigned
+  mov x1, #0
+  mov x1, #1
+  mov x1, #2
+  // jump into the signed "range"
+  b .Lmiddle
+.Lback:
+// sign RA
+  paciasp
+  .cfi_negate_ra_state
+  mov x2, #3
+  mov x2, #4
+  // skip unsigned instructions
+  b .Lcont
+  .cfi_negate_ra_state
+.Lmiddle:
+// RA is unsigned
+  mov x4, #5
+  b .Lback
+  .cfi_negate_ra_state
+.Lcont:
+// continue in signed state
+  mov x2, #6
+  autiasp
+  .cfi_negate_ra_state
+  ret
+  .cfi_endproc
+  .size   foo, .-foo
+
+  .global _start
+  .type _start, %function
+_start:
+  b foo
diff --git a/bolt/test/AArch64/negate-ra-state.s b/bolt/test/AArch64/negate-ra-state.s
new file mode 100644
index 0000000000000..30786d4ef9f70
--- /dev/null
+++ b/bolt/test/AArch64/negate-ra-state.s
@@ -0,0 +1,76 @@
+# Checking that .cfi_negate_ra_state directives are emitted in the same location as in the input in the case of no optimizations.
+
+# The foo and bar functions are a pair, with the first signing the return address,
+# and the second authenticating it. We have a tailcall between the two.
+# This is testing that BOLT can handle functions starting in signed RA state.
+
+# RUN: llvm-mc -filetype=obj -triple aarch64-unknown-unknown %s -o %t.o
+# RUN: %clang %cflags  %t.o -o %t.exe -Wl,-q
+# RUN: llvm-bolt %t.exe -o %t.exe.bolt --no-threads --print-all | FileCheck %s --check-prefix=CHECK-BOLT
+
+# Check that the negate-ra-state at the start of bar is not discarded.
+# If it was discarded, MarkRAStates would report bar as having inconsistent RAStates.
+# This is testing the handling of initialRAState on the BinaryFunction.
+# CHECK-BOLT-NOT: BOLT-INFO: inconsistent RAStates in function foo
+# CHECK-BOLT-NOT: BOLT-INFO: inconsistent RAStates in function bar
+
+# Check that OpNegateRAState CFIs are generated correctly.
+# CHECK-BOLT: Binary Function "foo" after insert-negate-ra-state-pass {
+# CHECK-BOLT:         paciasp
+# CHECK-BOLT-NEXT:    OpNegateRAState
+
+# CHECK-BOLT:      DWARF CFI Instructions:
+# CHECK-BOLT-NEXT:     0:  OpNegateRAState
+# CHECK-BOLT-NEXT: End of Function "foo"
+
+# CHECK-BOLT: Binary Function "bar" after insert-negate-ra-state-pass {
+# CHECK-BOLT:         OpNegateRAState
+# CHECK-BOLT-NEXT:    mov     x1, #0x0
+# CHECK-BOLT-NEXT:    mov     x1, #0x1
+# CHECK-BOLT-NEXT:    autiasp
+# CHECK-BOLT-NEXT:    OpNegateRAState
+# CHECK-BOLT-NEXT:    ret
+
+# CHECK-BOLT:     DWARF CFI Instructions:
+# CHECK-BOLT-NEXT:     0:  OpNegateRAState
+# CHECK-BOLT-NEXT:     1:  OpNegateRAState
+# CHECK-BOLT-NEXT: End of Function "bar"
+
+# End of negate-ra-state insertion logs for foo and bar.
+# CHECK: Binary Function "_start" after insert-negate-ra-state-pass {
+
+# Check that the functions are in the new .text section
+# RUN: llvm-objdump %t.exe.bolt -d -j .text | FileCheck %s --check-prefix=CHECK-OBJDUMP
+# CHECK-OBJDUMP: <foo>:
+# CHECK-OBJDUMP: <bar>:
+
+
+  .text
+  .globl  foo
+  .p2align        2
+  .type   foo,@function
+foo:
+  .cfi_startproc
+  paciasp
+  .cfi_negate_ra_state
+  mov x1, #0
+  b bar
+  .cfi_endproc
+  .size   foo, .-foo
+
+
+
+  .text
+  .globl  bar
+  .p2align        2
+  .type   bar,@function
+bar:
+  .cfi_startproc
+  .cfi_negate_ra_state    // Indicating that RA is signed from the start of bar.
+  mov x1, #0
+  mov x1, #1
+  autiasp
+  .cfi_negate_ra_state
+  ret
+  .cfi_endproc
+  .size   bar, .-bar
diff --git a/bolt/test/AArch64/pacret-split-funcs.s b/bolt/test/AArch64/pacret-split-funcs.s
new file mode 100644
index 0000000000000..27b3471045523
--- /dev/null
+++ b/bolt/test/AArch64/pacret-split-funcs.s
@@ -0,0 +1,54 @@
+# Checking that we generate an OpNegateRAState CFI after the split point,
+# when splitting a region with signed RA state.
+# We split at the fallthrough label.
+
+# REQUIRES: system-linux
+
+# RUN: %clang %s %cflags -march=armv8.3-a -Wl,-q -o %t
+# RUN: link_fdata --no-lbr %s %t %t.fdata
+# RUN: llvm-bolt %t -o %t.bolt --data %t.fdata -split-functions \
+# RUN: --print-only foo --print-split --print-all 2>&1 | FileCheck %s
+
+# Checking that we don't see any OpNegateRAState CFIs before the insertion pass.
+# CHECK-NOT: OpNegateRAState
+# CHECK: Binary Function "foo" after insert-negate-ra-state-pass
+
+# CHECK:       paciasp
+# CHECK-NEXT:  OpNegateRAState
+
+# CHECK: -------   HOT-COLD SPLIT POINT   -------
+
+# CHECK:         OpNegateRAState
+# CHECK-NEXT:    mov x0, #0x1
+# CHECK-NEXT:    autiasp
+# CHECK-NEXT:    OpNegateRAState
+# CHECK-NEXT:    ret
+
+# End of the insert-negate-ra-state-pass logs
+# CHECK: Binary Function "foo" after finalize-functions
+
+  .text
+  .globl  foo
+  .type foo, %function
+foo:
+.cfi_startproc
+.entry_bb:
+# FDATA: 1 foo #.entry_bb# 10
+     paciasp
+    .cfi_negate_ra_state     // indicating that paciasp changed the RA state to signed
+    cmp x0, #0
+    b.eq .Lcold_bb1
+.Lfallthrough:               // split point
+    mov x0, #1
+    autiasp
+    .cfi_negate_ra_state     // indicating that autiasp changed the RA state to unsigned
+    ret
+.Lcold_bb1:                  // Instructions below are not important, they are just here so the cold block is not empty.
+    .cfi_negate_ra_state     // ret has unsigned RA state, but the next inst (mov) has signed RA state
+    mov x0, #2
+    retaa
+.cfi_endproc
+  .size foo, .-foo
+
+## Force relocation mode.
+.reloc 0, R_AARCH64_NONE
diff --git a/bolt/test/runtime/AArch64/negate-ra-state.cpp b/bolt/test/runtime/AArch64/negate-ra-state.cpp
new file mode 100644
index 0000000000000..60b0b08950b58
--- /dev/null
+++ b/bolt/test/runtime/AArch64/negate-ra-state.cpp
@@ -0,0 +1,26 @@
+// REQUIRES: system-linux,bolt-runtime
+
+// RUN: %clangxx --target=aarch64-unknown-linux-gnu \
+// RUN: -mbranch-protection=pac-ret -Wl,-q %s -o %t.exe
+// RUN: llvm-bolt %t.exe -o %t.bolt.exe
+// RUN: %t.bolt.exe | FileCheck %s
+
+// CHECK: Exception caught: Exception from bar().
+
+#include <cstdio>
+#include <stdexcept>
+
+void bar() { throw std::runtime_error("Exception from bar()."); }
+
+void foo() {
+  try {
+    bar();
+  } catch (const std::exception &e) {
+    printf("Exception caught: %s\n", e.what());
+  }
+}
+
+int main() {
+  foo();
+  return 0;
+}
diff --git a/bolt/test/runtime/AArch64/pacret-function-split.cpp b/bolt/test/runtime/AArch64/pacret-function-split.cpp
new file mode 100644
index 0000000000000..208fc5c115571
--- /dev/null
+++ b/bolt/test/runtime/AArch64/pacret-function-split.cpp
@@ -0,0 +1,42 @@
+/* This test checks that the negate-ra-state CFIs are properly emitted in case of
+   function splitting. The test checks two things:
+    - we split at the correct location: to test the feature,
+        we need to split *before* the bl __cxa_throw@PLT call is made,
+        so the unwinder has to unwind from the split (cold) part.
+
+    - the BOLTed binary runs, and returns the string from foo.
+
+# REQUIRES: system-linux,bolt-runtime
+
+# FDATA: 1 main #split# 1 _Z3foov 0 0 1
+
+# RUN: %clangxx --target=aarch64-unknown-linux-gnu \
+# RUN: -mbranch-protection=pac-ret %s -o %t.exe -Wl,-q
+# RUN: link_fdata %s %t.exe %t.fdata
+# RUN: llvm-bolt %t.exe -o %t.bolt --split-functions --split-eh \
+# RUN: --split-strategy=profile2 --split-all-cold --print-split \
+# RUN: --print-only=_Z3foov --data=%t.fdata 2>&1 | FileCheck \
+# RUN: --check-prefix=BOLT-CHECK %s
+# RUN: %t.bolt | FileCheck %s  --check-prefix=RUN-CHECK
+
+# BOLT-CHECK-NOT: bl      __cxa_throw@PLT
+# BOLT-CHECK: -------   HOT-COLD SPLIT POINT   -------
+# BOLT-CHECK: bl      __cxa_throw@PLT
+
+# RUN-CHECK: Exception caught: Exception from foo().
+*/
+
+#include <cstdio>
+#include <stdexcept>
+
+void foo() { throw std::runtime_error("Exception from foo()."); }
+
+int main() {
+  try {
+    __asm__ __volatile__("split:");
+    foo();
+  } catch (const std::exception &e) {
+    printf("Exception caught: %s\n", e.what());
+  }
+  return 0;
+}

>From 96de68cb075f8e725922da4ae10de9bd3ce5f17b Mon Sep 17 00:00:00 2001
From: Gergely Balint <gergely.balint at arm.com>
Date: Wed, 8 Oct 2025 07:28:40 +0000
Subject: [PATCH 2/2] [BOLT] Fix build failure

Changed the mismatched type to `auto`.
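
A likely reason for the mismatch, sketched under assumptions: the function map
is assumed to be keyed by `uint64_t`, which is `unsigned long` on LP64 Linux but
`unsigned long long` on some other hosts, so a lambda parameter spelled
`std::pair<const unsigned long, BinaryFunction> &` cannot bind to the map's
`value_type` there. The `Fn` struct and `countCandidates` helper below are
hypothetical stand-ins, not BOLT code.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>

// Hypothetical stand-in for BinaryFunction; only what the predicate needs.
struct Fn {
  bool HasNegateRAState;
  bool Ignored;
};

// Counts entries the way the patched lambda does. `auto &P` deduces the map's
// actual value_type, std::pair<const std::uint64_t, Fn>, on every platform,
// so no concrete spelling of the key type has to match.
int countCandidates(std::map<std::uint64_t, Fn> &Fns) {
  return static_cast<int>(std::count_if(Fns.begin(), Fns.end(), [](auto &P) {
    return P.second.HasNegateRAState && !P.second.Ignored;
  }));
}
```

Spelling the parameter as `decltype(Fns)::value_type &` would also work; `auto`
is simply the shorter form.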
---
 bolt/lib/Passes/MarkRAStates.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/bolt/lib/Passes/MarkRAStates.cpp b/bolt/lib/Passes/MarkRAStates.cpp
index 2c5ce4aaa72c0..eff8b73517c2a 100644
--- a/bolt/lib/Passes/MarkRAStates.cpp
+++ b/bolt/lib/Passes/MarkRAStates.cpp
@@ -133,7 +133,7 @@ Error MarkRAStates::runOnFunctions(BinaryContext &BC) {
 
   int Total = llvm::count_if(
       BC.getBinaryFunctions(),
-      [&](std::pair<const unsigned long, BinaryFunction> &P) {
+      [&](auto &P) {
         return P.second.containedNegateRAState() && !P.second.isIgnored();
       });
 


