[llvm] [AArch64][PAC] Support ptrauth builtins and -fptrauth-intrinsics. (PR #65996)

Fri Sep 29 02:43:38 PDT 2023

================
@@ -0,0 +1,265 @@
+Pointer Authentication
+======================
+
+.. contents::
+   :local:
+
+Introduction
+------------
+
+Pointer authentication is a technology which offers strong probabilistic protection against exploiting a broad class of memory bugs to take control of program execution.  When adopted consistently in a language ABI, it provides a form of relatively fine-grained control flow integrity (CFI) check that resists both return-oriented programming (ROP) and jump-oriented programming (JOP) attacks.
+
+While pointer authentication can be implemented purely in software, direct hardware support (e.g. as provided by ARMv8.3 PAuth) can dramatically lower the execution speed and code size costs.  Similarly, while pointer authentication can be implemented on any architecture, taking advantage of the (typically) excess addressing range of a target with 64-bit pointers minimizes the impact on memory performance and can allow interoperation with existing code (by disabling pointer authentication dynamically).  This document will generally attempt to present the pointer authentication feature independent of any hardware implementation or ABI.  Considerations that are implementation-specific are clearly identified throughout.
+
+Note that there are several different terms in use:
+
+- **Pointer authentication** is a target-independent language technology.
+
+- **PAuth** (sometimes referred to as **PAC**, for Pointer Authentication Codes) is an AArch64 architecture extension that provides hardware support for pointer authentication.
+
+- **ARMv8.3** is an AArch64 architecture revision that makes PAuth mandatory.  It is implemented on several shipping processors, including the Apple A12 and later.
+
+* **arm64e** is a specific ABI (not yet fully stable) for implementing pointer authentication using PAuth on certain Apple operating systems.
+
+This document serves four purposes:
+
+- It describes the basic ideas of pointer authentication.
+
+- It documents several language extensions that are useful on targets using pointer authentication.
+
+- It will eventually present a theory of operation for the security mitigation, describing the basic requirements for correctness, various weaknesses in the mechanism, and ways in which programmers can strengthen its protections (including recommendations for language implementors).
+
+- It will eventually document the language ABIs currently used for C, C++, Objective-C, and Swift on arm64e, although these are not yet stable on any target.
+
+Basic Concepts
+--------------
+
+The simple address of an object or function is a **raw pointer**.  A raw pointer can be **signed** to produce a **signed pointer**.  A signed pointer can be then **authenticated** in order to verify that it was **validly signed** and extract the original raw pointer.  These terms reflect the most likely implementation technique: computing and storing a cryptographic signature along with the pointer.  The security of pointer authentication does not rely on attackers not being able to separately overwrite the signature.
+
+An **abstract signing key** is a name which refers to a secret key which can used to sign and authenticate pointers.  The key value for a particular name is consistent throughout a process.
+
+A **discriminator** is an arbitrary value used to **diversify** signed pointers so that one validly-signed pointer cannot simply be copied over another.  A discriminator is simply opaque data of some implementation-defined size that is included in the signature as a salt.
+
+Nearly all aspects of pointer authentication use just these two primary operations:
+
+- ``sign(raw_pointer, key, discriminator)`` produces a signed pointer given a raw pointer, an abstract signing key, and a discriminator.
+
+- ``auth(signed_pointer, key, discriminator)`` produces a raw pointer given a signed pointer, an abstract signing key, and a discriminator.
+
+``auth(sign(raw_pointer, key, discriminator), key, discriminator)`` must succeed and produce ``raw_pointer``.  ``auth`` applied to a value that was ultimately produced in any other way is expected to immediately halt the program.  However, it is permitted for ``auth`` to fail to detect that a signed pointer was not produced in this way, in which case it may return anything; this is what makes pointer authentication a probabilistic mitigation rather than a perfect one.
+
+There are two secondary operations which are required only to implement certain intrinsics in ``<ptrauth.h>``:
+
+- ``strip(signed_pointer, key)`` produces a raw pointer given a signed pointer and a key it was presumptively signed with.  This is useful for certain kinds of tooling, such as crash backtraces; it should generally not be used in the basic language ABI except in very careful ways.
+
+- ``sign_generic(value)`` produces a cryptographic signature for arbitrary data, not necessarily a pointer.  This is useful for efficiently verifying that non-pointer data has not been tampered with.
+
+Whenever any of these operations is called for, the key value must be known statically.  This is because the layout of a signed pointer may vary according to the signing key.  (For example, in ARMv8.3, the layout of a signed pointer depends on whether Top Byte Ignore (TBI) is enabled, which can be set independently for I and D keys.)
+
+.. admonition:: Note for API designers and language implementors
+
+  These are the *primitive* operations of pointer authentication, provided for clarity of description.  They are not suitable either as high-level interfaces or as primitives in a compiler IR because they expose raw pointers.  Raw pointers require special attention in the language implementation to avoid the accidental creation of exploitable code sequences; see the section on `Attackable code sequences`_.
+
+The following details are all implementation-defined:
+
+- the nature of a signed pointer
+- the size of a discriminator
+- the number and nature of the signing keys
+- the implementation of the ``sign``, ``auth``, ``strip``, and ``sign_generic`` operations
+
+While the use of the terms "sign" and "signed pointer" suggest the use of a cryptographic signature, other implementations may be possible.  See `Alternative implementations`_ for an exploration of implementation options.
+
+.. admonition:: Implementation example: ARMv8.3
+
+  Readers may find it helpful to know how these terms map to ARMv8.3 PAuth:
+
+  - A signed pointer is a pointer with a signature stored in the otherwise-unused high bits.  The kernel configures the address width based on the system's addressing needs, and enables TBI for I or D keys as needed.  The bits above the address bits and below the TBI bits (if enabled) are unused.  The signature width then depends on this addressing configuration.
+
+  - A discriminator is a 64-bit integer.  Constant discriminators are 16-bit integers.  Blending a constant discriminator into an address consists of replacing the top 16 bits of the address with the constant.
+
+  - There are five 128-bit signing-key registers, each of which can only be directly read or set by privileged code.  Of these, four are used for signing pointers, and the fifth is used only for ``sign_generic``.  The key data is simply a pepper added to the hash, not an encryption key, and so can be initialized using random data.
+
+  - ``sign`` computes a cryptographic hash of the pointer, discriminator, and signing key, and stores it in the high bits as the signature. ``auth`` removes the signature, computes the same hash, and compares the result with the stored signature.  ``strip`` removes the signature without authenticating it.  While ARMv8.3's ``aut*`` instructions do not themselves trap on failure, the compiler only ever emits them in sequences that will trap.
+
+  - ``sign_generic`` corresponds to the ``pacga`` instruction, which takes two 64-bit values and produces a 64-bit cryptographic hash. Implementations of this instruction are not required to produce meaningful data in all bits of the result.
+
+Discriminators
+~~~~~~~~~~~~~~
+
+A discriminator is arbitrary extra data which alters the signature calculated for a pointer.  When two pointers are signed differently --- either with different keys or with different discriminators --- an attacker cannot simply replace one pointer with the other.  For more information on why discriminators are important and how to use them effectively, see the section on `Substitution attacks`_.
+
+To use standard cryptographic terminology, a discriminator acts as a salt in the signing of a pointer, and the key data acts as a pepper.  That is, both the discriminator and key data are ultimately just added as inputs to the signing algorithm along with the pointer, but they serve significantly different roles.  The key data is a common secret added to every signature, whereas the discriminator is a signing-specific value that can be derived from the circumstances of how a pointer is signed.  However, unlike a password salt, it's important that discriminators be *independently* derived from the circumstances of the signing; they should never simply be stored alongside a pointer.
----------------
kbeyls wrote:

Nit pick, not sure if worth changing the words: "the discriminator is a signing-specific value that can be derived from the circumstances of how a pointer is signed". It also needs to be possible to compute the discriminator at all the places the pointer is authenticated. Would it be worth tweaking the words in this sentence to "the discriminator is a signing-specific value that can be derived from the contexts in which a specific pointer is signed and authenticated"?

https://github.com/llvm/llvm-project/pull/65996