[llvm] [BOLT] Rework user-facing documentation of BOLT gadget scanner (PR #176446)

Mon Mar 2 06:40:14 PST 2026

================
@@ -61,124 +115,422 @@ The security scanners implemented in `llvm-bolt-binary-analysis` aim to enable
 the testing of security hardening in arbitrary programs and not just specific
 examples.
 
+### Pointer Authentication
+
+[Pointer Authentication](https://clang.llvm.org/docs/PointerAuthentication.html)
+is intended to make it harder for an attacker to replace pointers at run time.
+This is achieved by making it possible for the compiler or the programmer to
+produce a *signed* pointer from a raw one, and then to probabilistically
+*authenticate the signature* at another site in the program.
+On AArch64 this is achieved by injecting a cryptographic hash, called a
+["Pointer Authentication Code" (PAC)](https://llsoftsec.github.io/llsoftsecbook/#sec:pointer-authentication),
+into the upper bits of the pointer.
+While this approach can be applied to any pointers in the program, the most
+frequent use case, at least in C and C++, is protecting code pointers.
+The language rules for such pointers are more restrictive, thus allowing the
+compiler to implement various hardenings transparently to the programmer.
+
+Probably the simplest variant of hardening based on Pointer Authentication is
+[`pac-ret`](https://llsoftsec.github.io/llsoftsecbook/#sec:pac-ret), a security
+hardening scheme implemented in compilers such as GCC and Clang, using the
+command line option `-mbranch-protection=pac-ret`. This option is enabled by
+default on most widely used Linux distributions. The hardening scheme mitigates
+[Return-Oriented Programming (ROP)](https://llsoftsec.github.io/llsoftsecbook/#return-oriented-programming)
+attacks by making sure that return addresses are only ever stored to memory
+in a signed form. This makes it substantially harder for attackers to divert
+control flow by overwriting a return address with a different value.
+
+## Pointer Authentication validator
+
+Pointer Authentication analysis is able to search for a number of gadget kinds,
+with the specific set depending on command line options:
+* [`ptrauth-pac-ret`](#return-address-protection-ptrauth-pac-ret) -
+  non-protected return instruction
+* [`ptrauth-tail-calls`](#return-address-protection-before-tail-call-ptrauth-tail-calls) -
+  performing a tail call with an untrusted value in the link register
+* [`ptrauth-forward-cf`](#indirect-branch-call-target-protection-ptrauth-forward-cf) -
+  non-protected destination of branch or call instruction
+* [`ptrauth-sign-oracles`](#signing-oracles-ptrauth-sign-oracles) -
+  signing of untrusted value (signing oracle)
+* [`ptrauth-auth-oracles`](#authentication-oracles-ptrauth-auth-oracles) -
+  revealing the result of authentication without crashing the program on failure
+  (authentication oracle)
+
+Validation is performed by `llvm-bolt-binary-analysis` on a per-function basis.
+First, the register properties are computed by analyzing the function as a whole.
+Then, the instructions are considered in isolation. For each kind of gadget,
+the set of susceptible instructions is computed. The properties of input or
+output registers of each such instruction are analyzed and reports are produced
+for unsafe instruction usage.
+
+Each gadget kind that is searched for can be characterized by the combination of
+* the set of instructions to analyze
+* the properties of input or output operands to check
+
+Currently, three properties can be computed for each register at any given
+program point:
+* **"trusted"** - the register is known not to be attacker-controlled, either because
+  it successfully passed authentication or because its value was materialized
+  using an instruction sequence that an attacker cannot tamper with
+  * **"safe-to-dereference"** (sometimes referred to as "s-t-d" below) -
+    a weaker property that the register can be controlled by an attacker to some
+    extent, but any memory access using a value crafted by an attacker is known
+    to result in an access to unmapped memory ("segmentation fault").
+    This makes it possible for authentication instructions to return an invalid
+    address on failure as long as it is known to crash the program on accessing
+    memory, but may require extra care when implementing operations
+    like re-signing a pointer with a different signing schema without accessing
+    that address in-between. If any failed authentication instruction is
+    guaranteed to terminate the program abnormally, then "safe-to-dereference"
+    and "trusted" properties are equivalent.
+* **"cannot escape unchecked"** - on every possible execution path after this point,
+  it is known to be impossible for an attacker to determine that the value is
+  a result of a failed authentication operation (for example, the register is
+  zeroed, or its value is checked to be valid, so that failure results in
+  immediate abnormal program termination).
+
+The sub-sections below describe the individual detectors. Please note that while
+the descriptions refer to AArch64 for simplicity, the implementation of gadget
+detectors in `llvm-bolt-binary-analysis` attempts to be target-neutral by
+isolating AArch64 specifics in target-dependent hooks.
+
+### Return address protection (`ptrauth-pac-ret`)
+
+**Instructions:** Return instructions without built-in authentication:
+either `ret` (implicit `x30` register) or `ret <reg>`, but not `retaa` and
+similar instructions.
+
+**Property:** The register holding the return address must be safe-to-dereference.
+
+**Notes:** Cross-exception-level return instructions (`eret`) are not analyzed yet.
+
+A report is generated for a return instruction whose destination is possibly
+attacker-controlled.
+
+**Examples:**
+```asm
+authenticated_return:
+  pacibsp
+  ; ...
+  ; ... some code here ...
+  ; ...
+  retab ; Built-in authentication, thus out of scope.
+
+good_leaf_function:
+  ; x30 is implicitly safe-to-dereference (s-t-d) and trusted at function entry.
+  mov     x0, #42
+  ; x30 was not written to by this function, thus remains s-t-d.
+  ret
+
+good_non_leaf_function:
+  pacibsp
+
+  ; Spilling signed return address.
+  stp     x29, x30, [sp, #-16]!
+  mov     x29, sp
+
+  bl      callee
+
+  ; Re-loading signed return address.
+  ; LDP writes to x30 and thus resets it to neither s-t-d nor trusted state.
+  ldp     x29, x30, [sp], #16
+
+  ; Checking that signature is valid.
+  ; AUTIBSP sets "s-t-d" property of x30, but not "trusted" (unless FEAT_FPAC
+  ; is known to be implemented).
+  autibsp
+
+  ; x30 is s-t-d at this point.
+  ret
+
+bad_spill:
+  ; x30 is implicitly s-t-d at function entry.
+  stp     x29, x30, [sp, #-16]!
+  mov     x29, sp
+
+  bl      callee ; Spilled x30 may have been overwritten on stack.
+
+  ; Writing to x30 resets its s-t-d property.
+  ldp     x29, x30, [sp], #16
+  ; x30 is unsafe by the time it is used by ret, thus generating a report.
+  ret
+
+bad_clobber:
+  pacibsp
+  ; ...
+  ; ... some code here ...
+  ; ...
+  autibsp
+  mov     x30, x1
+  ; The value in LR is unsafe, even though there was autibsp above.
+  ret
+```
 
-#### pac-ret analysis
+### Return address protection before tail call (`ptrauth-tail-calls`)
 
-`pac-ret` protection is a security hardening scheme implemented in compilers
-such as GCC and Clang, using the command line option
-`-mbranch-protection=pac-ret`. This option is enabled by default on most widely
-used Linux distributions.
+**Instructions:** Branch instructions (both direct and indirect, regular or
+with built-in authentication), classified as tail calls either by BOLT or by
+the PtrAuth gadget scanner's heuristic.
 
-The hardening scheme mitigates
-[Return-Oriented Programming (ROP)](https://llsoftsec.github.io/llsoftsecbook/#return-oriented-programming)
-attacks by making sure that return addresses are only ever stored to memory with
-a cryptographic hash, called a
-["Pointer Authentication Code" (PAC)](https://llsoftsec.github.io/llsoftsecbook/#pointer-authentication),
-in the upper bits of the pointer. This makes it substantially harder for
-attackers to divert control flow by overwriting a return address with a
-different value.
+**Property:** `x30` must be trusted.
 
-The hardening scheme relies on compilers producing appropriate code sequences when
-processing return addresses, especially when these are stored to and retrieved
-from memory.
+**Notes:** Heuristics are used to classify each instruction either as a tail
+call or as another kind of branch (such as a jump table or computed goto).
 
-The `pac-ret` binary analysis can be invoked using the command line option
-`--scanners=pac-ret`. It makes `llvm-bolt-binary-analysis` scan through the
-provided binary, checking each function for the following security property:
+A report is generated if a tail call is performed with an untrusted link
+register. This means that the tail callee would be entered with an untrusted
+link register.
 
-> For each procedure and exception return instruction, the destination register
-> must have one of the following properties:
->
-> 1. be immutable within the function, or
-> 2. the last write to the register must be by an authenticating instruction. This
->    includes combined authentication and return instructions such as `RETAA`.
+```asm
+untrusted_tail_call:
+  stp     x29, x30, [sp, #-16]!
+  mov     x29, sp
+  bl      callee
+  ldp     x29, x30, [sp], #16
+  ; x30 is neither trusted nor safe-to-dereference at this point.
+  b       tail_callee
 
-##### Example 1
+tail_callee:
+  pacibsp
+  ; ...
+```
 
-For example, a typical non-pac-ret-protected function looks as follows:
+Even though `x30` is likely to be safe-to-dereference before exit from a function
+(whether via return or tail call) in a consistently pac-ret-protected program,
+with respect to this gadget kind it must additionally be fully "trusted".
+With `x30` being safe-to-dereference, but not fully trusted at the entry to the
+tail callee, the subsequent `pacibsp` instruction may act as a [signing oracle](#signing-oracles-ptrauth-sign-oracles).
+
+**FIXME:** Is it actually possible when none of `FEAT_FPAC`, `FEAT_EPAC`, or `FEAT_PAuth2` are implemented?
----------------
kbeyls wrote:

I'm afraid I don't remember being involved myself in any discussions on this topic.
If @ahmedbougacha or @pcc don't have a quick reply, I would not hold up landing this much improved documentation on this outstanding comment.
I guess this could either be made into a small remark in this document or be filed as a GitHub issue to improve the documentation further?

https://github.com/llvm/llvm-project/pull/176446
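
As a reading aid for the hunk above, here is a minimal Python sketch of how the `x30` properties described in the quoted documentation evolve over the example functions. It is not how `llvm-bolt-binary-analysis` is implemented. The instruction tokens (`ldp_x30`, `mov_x30`, `tail_call`), the `step` and `scan` helpers, and the `assume_fpac` flag are hypothetical names chosen for illustration, and treating `pacibsp` as a no-op and `bl` as a plain write to `x30` are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkRegisterState:
    """Properties tracked for x30 in this toy model."""
    safe_to_dereference: bool
    trusted: bool


# Per the quoted docs, x30 is s-t-d and trusted at function entry.
ENTRY = LinkRegisterState(safe_to_dereference=True, trusted=True)
# Any plain write to x30 resets both properties.
UNKNOWN = LinkRegisterState(safe_to_dereference=False, trusted=False)

# Hypothetical tokens for instructions that overwrite x30.
# Treating "bl" as a plain write is an assumption of this sketch.
WRITES_X30 = {"ldp_x30", "mov_x30", "bl"}


def step(state, insn, assume_fpac=False):
    """Return the x30 state after executing one instruction token."""
    if insn == "autibsp":
        # Authentication makes x30 s-t-d; it only becomes trusted if a
        # failed authentication is assumed to trap (FEAT_FPAC).
        return LinkRegisterState(safe_to_dereference=True, trusted=assume_fpac)
    if insn in WRITES_X30:
        return UNKNOWN
    # Everything else (including pacibsp, a simplification) keeps the state.
    return state


def scan(insns, assume_fpac=False):
    """Return (index, gadget kind) pairs for a straight-line token list."""
    state = ENTRY
    reports = []
    for index, insn in enumerate(insns):
        if insn == "ret" and not state.safe_to_dereference:
            reports.append((index, "ptrauth-pac-ret"))
        elif insn == "tail_call" and not state.trusted:
            reports.append((index, "ptrauth-tail-calls"))
        state = step(state, insn, assume_fpac)
    return reports
```

Under these assumptions, the token list `["stp", "bl", "ldp_x30", "ret"]` (the `bad_spill` example) yields a `ptrauth-pac-ret` report at the `ret`, while inserting `autibsp` before the `ret` (as in `good_non_leaf_function`) yields none. Replacing `ret` with `tail_call` after `autibsp` still yields a `ptrauth-tail-calls` report unless `assume_fpac=True`, which mirrors the distinction between safe-to-dereference and trusted.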