[PATCH] D156716: [AArch64][PAC] Check authenticated LR value during tail call

Fri Aug 18 01:12:08 PDT 2023

kristof.beyls added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:261-265
+cl::opt<bool> CheckAuthenticatedLRByLoad(
+    "aarch64-check-authenticated-lr-by-load", cl::Hidden, cl::init(true),
+    cl::desc("When performing a tail call with authenticated LR, "
+             "use a load instruction to check the LR"));
+
----------------
atrosinenko wrote:
> atrosinenko wrote:
> > kristof.beyls wrote:
> > > atrosinenko wrote:
> > > > kristof.beyls wrote:
> > > > > My understanding is that using a load instruction to check the LR can only be done when code segments are not "execute-only".
> > > > > 
> > > > > The commit comment on https://github.com/llvm/llvm-project/commit/a932cd409b861582902211690b497cafc774bee6 suggests that at least LLD assumes that code generated for AArch64 targets is always compatible with execute-only segments. Therefore, I think that defaulting to checking-by-loading-lr is probably the wrong thing to do. It seems to me it would be better to default the other way and only allow checking-by-loading-lr when there is a guarantee that the target environment will not enforce "execute-only" code.
> > > > Updated. Though, I wonder if using the long snippet by default can introduce noticeable performance regression or not.
> > > > 
> > > > Meanwhile, is it permitted to remove a dead load in the machine pipeline (and should I mark it as volatile somehow).
> > > Yeah, I would expect the long snippet could result in a noticeable performance regression.
> > > As discussed in last Monday's Pointer Authentication sync-up call, most current AArch64-based systems do not enforce execute-only. But that will probably change at some point in the future.
> > > Maybe it would be best to make the code generation dependent on whether the target platform enforces execute-only? That information would probably need to be stored in some kind of TargetInformation class - I'm not sure if that information is currently stored anywhere.
> > > 
> > > @ab in that meeting also said that there was a 3rd option - checking the values of bits (I think, I don't fully remember) 55 and 56. Would that be a good option to implement/use by default?
> > > 
> > > It seems to me that at least in principle, later optimization passes are allowed to remove load instructions that they can prove have no side-effects. It may be prudent indeed to mark the load as volatile - or use any other mechanism to indicate that the load does have a side-effect and shouldn't be removed by later optimizations.
> > Considering other possible options, IIUC something like [this fragment](https://github.com/ahmedbougacha/llvm-project/commit/58cf59b84ca4e7930a640480fd5ad1ea194864f5#diff-e3119d2d744e8ec571b1101404aca24f847f90db052cc443e14d82e21216041aR1314) is assumed. I glanced through Optimization Guides for a few CPU cores implementing FEAT_PAuth and it looks like `XPAC*` instructions are usually faster than other PAuth-related instructions (say, latency is 2, throughput is 1 compared to 5/1 for `PAC*` and `AUT*`) - this is somewhat expected as XPAC just clears a range of bits. On the other hand, EOR is still much more efficient.
> > 
> > Thus, it is probably worth implementing yet another option, but I doubt it should be the default because it relies on the particular TBI setting in effect while XPAC "just works" taking into account current settings, as far as I understand.
> I reverted this option to using a fast checker by default. As far as I understand, there are no explicit function telling whether this particular subtarget expects execute-only-compatible code as it is expected to "just work" on AArch64. On the other hand, IIUC the support for execute-only mappings was recently dropped at least on Linux because of interference with Privilege Access Never feature.
> 
> At now, I just marked this option with a TODO because I expect DummyLoad to be much faster and even if someone want to use it in an execute-only environment, the issue is at least unlikely to be unnoticed.
I've done some further investigations and it seems that execute-only enforcement is most likely to get enabled in the not-too-distant future for some very popular AArch64 platforms. Therefore, having something that breaks the execute-only property enabled by default across all AArch64 platforms looks like a no-go. This would break most programs on those platforms.

It seems that this patch also changes code generation for pac-ret (i.e. where only return addresses are protected by pointer authentication)? There are a number of AArch64 platforms that already ship with pac-ret.
Enabling this for pac-ret moves the performance vs code-size vs security hardening trade-off for those platforms for their default code generation. Therefore, it seems to me that a lot of benchmarking would be needed to measure the code size and performance impact of this change before landing it for pac-ret.

With the above 2 observations in mind, I think that:
* using the AuthCheckMethod::DummyLoad cannot be the default across all AArch64 platforms, as it will break most code on AArch64 platforms that plan to enable execute-only; and there is a strong indication that execute-only will be enforced on some of the most popular platforms.
* Protecting against this authentication oracle for pac-ret code generation could only be done by doing substantial amounts of benchmarking to help make a decision on whether this is a worthwhile performance vs code-size vs security hardening trade-off.

I'd recommend to:
(a) not change code generation for pac-ret at all in this patch.
(b) change the default for AuthenticatedLRCheckMethod to something that does not break execute-only.

Ultimately, it seems to me that each platform will have to decide where its default should be in the performance vs code size vs hardening trade-off. With very few software platforms currently choosing to pay the cost for fine-grain control-flow integrity, it seems to me that a default of not hardening against this authentication oracle may be the least bad option available.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156716/new/

https://reviews.llvm.org/D156716