[llvm] [BOLT][AArch64] Handle OpNegateRAState to enable optimizing binaries with pac-ret hardening (PR #120064)

Tue Sep 9 09:05:53 PDT 2025

================
@@ -0,0 +1,154 @@
+# Optimizing binaries with pac-ret hardening
+
+This is a design document about processing the `DW_CFA_AARCH64_negate_ra_state` DWARF instruction in BOLT. As it describes internal design decisions, the intended audience is BOLT developers. The document is an updated version of the [RFC posted on the LLVM Discourse](https://discourse.llvm.org/t/rfc-bolt-aarch64-handle-opnegaterastate-to-enable-optimizing-binaries-with-pac-ret-hardening/86594).
+
+
+`DW_CFA_AARCH64_negate_ra_state` is also referred to as `.cfi_negate_ra_state` in assembly, or `OpNegateRAState` in BOLT sources. In this document, I will use **negate-ra-state** as a shorthand.
+
+## Introduction
+
+### Pointer Authentication
+
+Refer to the [pac-ret section of the BOLT-binary-analysis document](BinaryAnalysis.md#pac-ret-analysis).
+
+### DW_CFA_AARCH64_negate_ra_state
+
+The negate-ra-state CFI is a vendor-specific Call Frame Instruction defined in the [Arm ABI](https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst#id1).
+
+```
+The DW_CFA_AARCH64_negate_ra_state operation negates bit[0] of the RA_SIGN_STATE pseudo-register.
+```
+
+This bit indicates to the unwinder whether the current return address is signed or not (hence the name). The unwinder uses this information to authenticate the pointer and remove the Pointer Authentication Code (PAC) bits. Incorrect negate-ra-state placement can lead to the unwinder trying to authenticate an unsigned pointer (which segfaults), or skipping authentication of a signed pointer and then trying to access an incorrect location (also leading to a segfault).
+
+(Note: not *all* unwinders do this. Some use the `xpac` instruction to strip the PAC bits without authenticating the pointer. This is incorrect, as it allows control-flow modification in the case of unwinding.)
+
+There are no DWARF instructions to directly set or clear the RA state. However, two other CFIs can also affect the RA state:
+- `DW_CFA_remember_state`: this CFI stores register rules onto an implicit stack.
+- `DW_CFA_restore_state`: this CFI pops rules from this stack.
+
+Example:
+
+| CFI                            | Effect on RA state             |
+| ------------------------------ | ------------------------------ |
+| (default)                      | 0                              |
+| DW_CFA_AARCH64_negate_ra_state | 0 -> 1                         |
+| DW_CFA_remember_state          | 1 pushed to the stack          |
+| DW_CFA_AARCH64_negate_ra_state | 1 -> 0                         |
+| DW_CFA_restore_state           | 0 -> 1 (popped from the stack) |
+
+The Arm ABI also defines the `DW_CFA_AARCH64_negate_ra_state_with_pc` CFI, but it is not widely used, and is [likely to become deprecated](https://github.com/ARM-software/abi-aa/issues/327).
+
+### Where are these CFIs needed?
+
+In all locations, where two consecutive instructions have different RA state, this need to be indicated to the unwinder. This happens at pointer signing and authenticating. In the other case, where two consecutive instructions have different RA state but neither of them signs or authenticates the return address, the two instructions are not next to each other in control flow: one is part of an execution path with a signed RA, the other is part of a path with an unsigned RA.
----------------
paschalis-mpeis wrote:

> **as** this need to be indicated

https://github.com/llvm/llvm-project/pull/120064
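
For readers following the quoted document, the RA state table in the diff can be reproduced with a small simulation. This is only an illustrative sketch and not BOLT code: the function name `ra_states` is hypothetical, and the model assumes that `DW_CFA_remember_state` and `DW_CFA_restore_state` save and restore nothing but the RA state, whereas a real unwinder pushes and pops all register rules.

```python
def ra_states(cfis):
    """Return the RA state (0 or 1) in effect after each CFI.

    Hypothetical helper for illustration only. It models just bit[0] of
    RA_SIGN_STATE; real remember/restore CFIs cover all register rules.
    """
    state = 0      # default RA state: return address is not signed
    stack = []     # implicit stack used by remember_state / restore_state
    trace = []
    for cfi in cfis:
        if cfi == "DW_CFA_AARCH64_negate_ra_state":
            state ^= 1               # negate bit[0]
        elif cfi == "DW_CFA_remember_state":
            stack.append(state)      # push the current state
        elif cfi == "DW_CFA_restore_state":
            state = stack.pop()      # pop the remembered state
        else:
            raise ValueError(f"unsupported CFI in this sketch: {cfi}")
        trace.append(state)
    return trace


# The sequence from the table in the quoted document.
table_sequence = [
    "DW_CFA_AARCH64_negate_ra_state",
    "DW_CFA_remember_state",
    "DW_CFA_AARCH64_negate_ra_state",
    "DW_CFA_restore_state",
]
print(ra_states(table_sequence))  # [1, 1, 0, 1]
```

The printed trace matches the table row by row: the state becomes 1, stays 1 while it is remembered, drops to 0, and returns to 1 when the remembered value is restored.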