[PATCH] D90868: [IR] Define @llvm.ptrauth intrinsics.

Kristof Beyls via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 8 02:47:32 PST 2021


kristof.beyls added inline comments.


================
Comment at: llvm/docs/PointerAuth.md:5-8
+Pointer Authentication is a mechanism by which certain pointers are signed,
+are modified to embed that signature in their unused bits, and are
+authenticated (have their signature checked) when used, to prevent pointers
+of unknown origin from being injected into a process.
----------------
This is a long sentence, making it somewhat hard to parse/follow. Would it help to split it into shorter sentences? Maybe something like:


```
Pointer Authentication is a mechanism by which certain pointers are signed.
When a pointer gets signed, a cryptographic hash of its value and other values (pepper and salt) is stored in unused bits of that pointer.
Each time before the pointer is used, it is authenticated, i.e. has its signature checked.
This prevents pointer values of unknown origin from being injected into a process.
```




================
Comment at: llvm/docs/PointerAuth.md:10-12
+To enforce Control Flow Integrity (CFI), this is mostly used for all code
+pointers (function pointers, vtable entries, ...), but certain data pointers
+specified by the ABI (vtable pointer, ...) are also authenticated.
----------------
This sentence seems to be describing a specific ABI's choice of how which pointers to sign.
At the moment, my understanding is that there are 2 ABIs making use of the pointer authentication feature in the instruction set to sign/authenticate pointers:
1. The -mbranch-protection=pac-ret scheme which only signs return addresses.
2. The arm64e scheme which, IIUC, signs as described by the sentence above.

I think it may be better to make it more explicit that the above paragraph describes the arm64e abi specifically. Alternatively, maybe a more generic description could be given. Maybe something like the following?


```
Different ABIs or software targets may require a different set of pointers to be signed in specific ways. For example, -mbranch-protection=pac-ret signs return addresses only. The arm64e ABI signs most code pointers, such as function pointers and vtable entries. Furthermore, it also signs certain data pointers such as vtable pointers.
The ABI-prescribed signing of these pointers is generated automatically by the compiler.
```

This then naturally flows to the following paragraph which starts with "Additionally, with clang extensions, users can specify that a given pointer be signed/authenticated".


================
Comment at: llvm/docs/PointerAuth.md:17-19
+At the IR level, it is represented using:
+
+* a [set of intrinsics](#intrinsics) (to sign/authenticate pointers)
----------------
Maybe it reads a bit more naturally to make this a single sentence, since the bullet list is only 1 long? For example:


```
At the IR level, pointer signing and authentication is represented using a [set of intrinsics](#intrinsics)
```


================
Comment at: llvm/docs/PointerAuth.md:53-58
+* a key: one of a small, fixed set.  The value of the key itself is not
+  directly accessible, but is referenced by ptrauth operations via an
+  identifier.
+
+* salt, or extra diversity data: additional data mixed in with the value and
+  used by the ptrauth operations.
----------------
Would it be helpful to refer to the key as being a cryptographic pepper (https://en.wikipedia.org/wiki/Pepper_(cryptography) ), since the discriminator is referred to as "salt"?


================
Comment at: llvm/docs/PointerAuth.md:57
+
+* salt, or extra diversity data: additional data mixed in with the value and
+  used by the ptrauth operations.
----------------
maybe say "pointer value" rather than "value" to remove potential ambiguity versus key value?


================
Comment at: llvm/docs/PointerAuth.md:83
+```llvm
+declare i64 @llvm.ptrauth.sign(i64 <value>, i32 <key>, i64 <extra data>)
+```
----------------
danielkiss wrote:
> I'd call this parameter `discriminator`, for me it would more intuitive than "extra data". 
> e.g. llvm.ptrauth.blend takes two `discriminators` and returns a new one that should go here.
> 
> also later we say:
> ```
> // Sign an unauthenticated pointer using the specified key and discriminator,
> // passed in that order.
> ```
> Architecture call's it `modifier` because it kind a modifies the key.
I wonder if it'd be better to call this keyid rather than key, since it is not the value of the key, but rather the id of the key that is passed?


================
Comment at: llvm/docs/PointerAuth.md:83
+```llvm
+declare i64 @llvm.ptrauth.sign(i64 <value>, i32 <key>, i64 <extra data>)
+```
----------------
kristof.beyls wrote:
> danielkiss wrote:
> > I'd call this parameter `discriminator`, for me it would more intuitive than "extra data". 
> > e.g. llvm.ptrauth.blend takes two `discriminators` and returns a new one that should go here.
> > 
> > also later we say:
> > ```
> > // Sign an unauthenticated pointer using the specified key and discriminator,
> > // passed in that order.
> > ```
> > Architecture call's it `modifier` because it kind a modifies the key.
> I wonder if it'd be better to call this keyid rather than key, since it is not the value of the key, but rather the id of the key that is passed?
I also wonder if it would be beneficial to make the name "value" more specific.
For example, here, this could be called "rawpointer"? Where a signed pointer is expected, instead of "value", "signedpointer" could be used? I think that would make the intrinsic a little bit more self-documenting.
Of course, this is bike shedding territory...


================
Comment at: llvm/docs/PointerAuth.md:83
+```llvm
+declare i64 @llvm.ptrauth.sign(i64 <value>, i32 <key>, i64 <extra data>)
+```
----------------
kristof.beyls wrote:
> kristof.beyls wrote:
> > danielkiss wrote:
> > > I'd call this parameter `discriminator`, for me it would more intuitive than "extra data". 
> > > e.g. llvm.ptrauth.blend takes two `discriminators` and returns a new one that should go here.
> > > 
> > > also later we say:
> > > ```
> > > // Sign an unauthenticated pointer using the specified key and discriminator,
> > > // passed in that order.
> > > ```
> > > Architecture call's it `modifier` because it kind a modifies the key.
> > I wonder if it'd be better to call this keyid rather than key, since it is not the value of the key, but rather the id of the key that is passed?
> I also wonder if it would be beneficial to make the name "value" more specific.
> For example, here, this could be called "rawpointer"? Where a signed pointer is expected, instead of "value", "signedpointer" could be used? I think that would make the intrinsic a little bit more self-documenting.
> Of course, this is bike shedding territory...
I agree with @danielkiss  that I'd prefer a better name than "extra data". I think that yet another option that might work would be "salt", as the term cryptographic salt is already used above to explain the high-level semantics of these intrinsics.


================
Comment at: llvm/docs/PointerAuth.md:135
+If ``value`` does not have a correct signature for ``key`` and ``extra data``,
+the behavior is undefined.
+
----------------
I'm not entirely sure if we'd like to say behavior is undefined when the signature isn't valid.
If the "undefined" here means the same as "undefined behavior" in C/C++, wouldn't that allow the compiler to assume that it could never happen that the signature isn't valid, and e.g. optimize away signature checks based on that.
I assume that kind of behavior may be mostly theoretical at the moment, but it probably would be better to not enable elimination of signature checks by the compiler, even if only theoretical.
That being said, I'm not sure what the correct description should be for the behavior. For now, I can't come up with anything better than:

```
If ``value`` does not have a correct signature for ``key`` and ``extra data``,
the behavior is a target-specific side effect.
```


================
Comment at: llvm/docs/PointerAuth.md:207
+If ``value`` does not have a correct signature for ``old key`` and
+``old extra data``, the returned value is an invalid, poison pointer.
+
----------------
It seems counter-intuitive to me that resign would return an invalid poison pointer on invalid signature, whereas llvm.ptrauth.auth triggers undefined behaviour on invalid signature on the input signed pointer.
Wouldn't it be better to make the behavior consistent for both intrinsics when the signature of the signed pointer is invalid?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90868/new/

https://reviews.llvm.org/D90868



More information about the llvm-commits mailing list