[PATCH] D90868: [IR] Define @llvm.ptrauth intrinsics.

Tue May 25 14:40:33 PDT 2021

ab marked 9 inline comments as done.
ab added a comment.

Rewrote a good chunk of the document following reviews, responded inline to some.  Thanks all for the comments!

One question: I realized this mentions the hardening constraints in a few isolated spots.  Should we have a paragraph (say in "Concepts") that describes this upfront?  Could be summed up as "intermediate values shouldn't be exposed", and the concrete ramifications of that for the IR design and backend impl.

================
Comment at: llvm/docs/PointerAuth.md:17-19
+At the IR level, it is represented using:
+
+* a [set of intrinsics](#intrinsics) (to sign/authenticate pointers)
----------------
kristof.beyls wrote:
> Maybe it reads a bit more naturally to make this a single sentence, since the bullet list is only 1 long? For example:
> 
> 
> ```
> At the IR level, pointer signing and authentication is represented using a [set of intrinsics](#intrinsics)
> ```
Ah right, this looks weird because the list only includes the intrinsics that are defined here.  The later patches add the other IR constructs (attributes, bundle, ...), each with another list item here

================
Comment at: llvm/docs/PointerAuth.md:83
+```llvm
+declare i64 @llvm.ptrauth.sign(i64 <value>, i32 <key>, i64 <extra data>)
+```
----------------
kristof.beyls wrote:
> kristof.beyls wrote:
> > kristof.beyls wrote:
> > > danielkiss wrote:
> > > > I'd call this parameter `discriminator`, for me it would more intuitive than "extra data". 
> > > > e.g. llvm.ptrauth.blend takes two `discriminators` and returns a new one that should go here.
> > > > 
> > > > also later we say:
> > > > ```
> > > > // Sign an unauthenticated pointer using the specified key and discriminator,
> > > > // passed in that order.
> > > > ```
> > > > Architecture call's it `modifier` because it kind a modifies the key.
> > > I wonder if it'd be better to call this keyid rather than key, since it is not the value of the key, but rather the id of the key that is passed?
> > I also wonder if it would be beneficial to make the name "value" more specific.
> > For example, here, this could be called "rawpointer"? Where a signed pointer is expected, instead of "value", "signedpointer" could be used? I think that would make the intrinsic a little bit more self-documenting.
> > Of course, this is bike shedding territory...
> I agree with @danielkiss  that I'd prefer a better name than "extra data". I think that yet another option that might work would be "salt", as the term cryptographic salt is already used above to explain the high-level semantics of these intrinsics.
This all makes sense to me, I've removed most of the non-helpful distinctions between "extra data", "diversity" and "discriminator", to consistently use "discriminator".

I agree value is not super clear, I tried to be more specific in the argument descriptions (using "raw pointer" and "signed pointer").  I'm trying really hard to avoid "unsigned", but "signed" might be fine ;)

I've used "unauthenticated" interchangeably with "raw" before (to make it explicitly about ptrauth), but I think it's better to limit that to operations that don't authenticate (e.g., in the `ptrauth_sign_unauthenticated` clang builtin, as opposed to `ptrauth_auth_and_resign`)

================
Comment at: llvm/docs/PointerAuth.md:104
+
+If ``value`` is already a signed value, the behavior is undefined.
+
----------------
Note that this is also described as "undefined", as in, "we assume it never happens", but I welcome any improvements to the wording.
However, this is less interesting than auth/resign, since there isn't much room for sensible behavior when signing an already-signed pointer, so we may not need to be very specific.

================
Comment at: llvm/docs/PointerAuth.md:135
+If ``value`` does not have a correct signature for ``key`` and ``extra data``,
+the behavior is undefined.
+
----------------
kristof.beyls wrote:
> I'm not entirely sure if we'd like to say behavior is undefined when the signature isn't valid.
> If the "undefined" here means the same as "undefined behavior" in C/C++, wouldn't that allow the compiler to assume that it could never happen that the signature isn't valid, and e.g. optimize away signature checks based on that.
> I assume that kind of behavior may be mostly theoretical at the moment, but it probably would be better to not enable elimination of signature checks by the compiler, even if only theoretical.
> That being said, I'm not sure what the correct description should be for the behavior. For now, I can't come up with anything better than:
> 
> ```
> If ``value`` does not have a correct signature for ``key`` and ``extra data``,
> the behavior is a target-specific side effect.
> ```
That's an interesting question.  Since this was written, we've converged to making it a hard requirement for auth/resign operations to trap.  In our implementation, there's currently still an escape hatch (via cl::opt and attribute) to disable that, but we're planning to get rid of that.  Whether we want that upstream or not is an open question (really for all of you): it can helps with bringup, but with FPAC maybe we're better off always forcing the pre-FPAC codegen to have the SW check/trap.

Regarding:
> If the "undefined" here means the same as "undefined behavior" in C/C++, wouldn't that allow the compiler to assume that it could never happen that the signature isn't valid, and e.g. optimize away signature checks based on that.

yes, it does allow that, and our implementation does optimize away auth/resigns with unused results.
Which brings up an interesting point: since we trap in the auth intrinsic itself, using it to check the signature is at best unergonomic (you'd have to do catch the trap somehow).  So, for cases where the intention is only to check the signature of a pointer without using it, we've been using a different pattern: strip the original pointer, sign it with the expected scheme, and compare that with the original pointer.  That's a little obscure (and has some hardening ramifications), so we've considered adding an intrinsic that does that.  But it hasn't been a common pattern.

So, concretely, I'd be in favor of explicitly defining this (and `resign`) as trapping.  "target-specific side effect" sounds reasonable and avoids over-specifying it as well, though I don't have a specific worry in mind with describing the trap.

================
Comment at: llvm/docs/PointerAuth.md:168
+If ``value`` is not a pointer value for which ``key`` is appropriate, the
+behavior is undefined.
+
----------------
Reading this again I think this "undefined" is less justifiable than the auth/resign ones.  I'm expanding this a bit to explain the actual constraint, and describe this as target-specific.  Depending on how exactly we define "target" this could be more than that, since it's dependent on the runtime OS setup.  But I'm not sure that's helpful here.

================
Comment at: llvm/docs/PointerAuth.md:207
+If ``value`` does not have a correct signature for ``old key`` and
+``old extra data``, the returned value is an invalid, poison pointer.
+
----------------
kristof.beyls wrote:
> It seems counter-intuitive to me that resign would return an invalid poison pointer on invalid signature, whereas llvm.ptrauth.auth triggers undefined behaviour on invalid signature on the input signed pointer.
> Wouldn't it be better to make the behavior consistent for both intrinsics when the signature of the signed pointer is invalid?
It's indeed consistent, the doc is wrong ;)  I will update the resign paragraph to match whatever we settle on for `auth`

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D90868/new/

https://reviews.llvm.org/D90868