[PATCH] D156716: [AArch64][PAC] Check authenticated LR value during tail call

Tue Aug 15 08:51:47 PDT 2023

atrosinenko added a comment.

Reworked the patch

- replaced "use a fast checker or not" boolean argument with multiple choices
- created a separate source file and put `checkAuthenticatedRegister()` function there as it is likely to be later used by other parts of the codegen
- exposed a tail-call-specific `AArch64FrameLowering::checkAuthenticatedLR()` similarly to `signLR()` and `authenticateLR()` so it can be used by outliner callbacks later
- marked a dummy load instruction as volatile similar to `Hexagon::PS_crash` from HexagonInstrInfo.cpp <https://github.com/llvm/llvm-project/blob/b3cfcf9443873bd7ea3270f943cb4a946b269f33/llvm/lib/Target/Hexagon/HexagonInstrInfo.cpp#L1438>
- added the third implementation of authenticated address checker, similar to this code <https://github.com/ahmedbougacha/llvm-project/commit/58cf59b84ca4e7930a640480fd5ad1ea194864f5#diff-e3119d2d744e8ec571b1101404aca24f847f90db052cc443e14d82e21216041aR1314>. Of course, more checkers can be implemented, but I think it would be better to leave this patch as a common implementation + a few "proofs of concept" and add more checkers via later patches, if needed.

No more changes remain planned on this patch, so I expect it to be ready for further review.

================
Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:261-265
+cl::opt<bool> CheckAuthenticatedLRByLoad(
+    "aarch64-check-authenticated-lr-by-load", cl::Hidden, cl::init(true),
+    cl::desc("When performing a tail call with authenticated LR, "
+             "use a load instruction to check the LR"));
+
----------------
atrosinenko wrote:
> kristof.beyls wrote:
> > atrosinenko wrote:
> > > kristof.beyls wrote:
> > > > My understanding is that using a load instruction to check the LR can only be done when code segments are not "execute-only".
> > > > 
> > > > The commit comment on https://github.com/llvm/llvm-project/commit/a932cd409b861582902211690b497cafc774bee6 suggests that at least LLD assumes that code generated for AArch64 targets is always compatible with execute-only segments. Therefore, I think that defaulting to checking-by-loading-lr is probably the wrong thing to do. It seems to me it would be better to default the other way and only allow checking-by-loading-lr when there is a guarantee that the target environment will not enforce "execute-only" code.
> > > Updated. Though, I wonder if using the long snippet by default can introduce noticeable performance regression or not.
> > > 
> > > Meanwhile, is it permitted to remove a dead load in the machine pipeline (and should I mark it as volatile somehow).
> > Yeah, I would expect the long snippet could result in a noticeable performance regression.
> > As discussed in last Monday's Pointer Authentication sync-up call, most current AArch64-based systems do not enforce execute-only. But that will probably change at some point in the future.
> > Maybe it would be best to make the code generation dependent on whether the target platform enforces execute-only? That information would probably need to be stored in some kind of TargetInformation class - I'm not sure if that information is currently stored anywhere.
> > 
> > @ab in that meeting also said that there was a 3rd option - checking the values of bits (I think, I don't fully remember) 55 and 56. Would that be a good option to implement/use by default?
> > 
> > It seems to me that at least in principle, later optimization passes are allowed to remove load instructions that they can prove have no side-effects. It may be prudent indeed to mark the load as volatile - or use any other mechanism to indicate that the load does have a side-effect and shouldn't be removed by later optimizations.
> Considering other possible options, IIUC something like [this fragment](https://github.com/ahmedbougacha/llvm-project/commit/58cf59b84ca4e7930a640480fd5ad1ea194864f5#diff-e3119d2d744e8ec571b1101404aca24f847f90db052cc443e14d82e21216041aR1314) is assumed. I glanced through Optimization Guides for a few CPU cores implementing FEAT_PAuth and it looks like `XPAC*` instructions are usually faster than other PAuth-related instructions (say, latency is 2, throughput is 1 compared to 5/1 for `PAC*` and `AUT*`) - this is somewhat expected as XPAC just clears a range of bits. On the other hand, EOR is still much more efficient.
> 
> Thus, it is probably worth implementing yet another option, but I doubt it should be the default because it relies on the particular TBI setting in effect while XPAC "just works" taking into account current settings, as far as I understand.
I reverted this option to using a fast checker by default. As far as I understand, there are no explicit function telling whether this particular subtarget expects execute-only-compatible code as it is expected to "just work" on AArch64. On the other hand, IIUC the support for execute-only mappings was recently dropped at least on Linux because of interference with Privilege Access Never feature.

At now, I just marked this option with a TODO because I expect DummyLoad to be much faster and even if someone want to use it in an execute-only environment, the issue is at least unlikely to be unnoticed.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156716/new/

https://reviews.llvm.org/D156716