[PATCH] D156716: [AArch64][PAC] Check authenticated LR value during tail call

Thu Aug 10 07:12:45 PDT 2023

atrosinenko marked 2 inline comments as not done.
atrosinenko added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:261-265
+cl::opt<bool> CheckAuthenticatedLRByLoad(
+    "aarch64-check-authenticated-lr-by-load", cl::Hidden, cl::init(true),
+    cl::desc("When performing a tail call with authenticated LR, "
+             "use a load instruction to check the LR"));
+
----------------
kristof.beyls wrote:
> atrosinenko wrote:
> > kristof.beyls wrote:
> > > My understanding is that using a load instruction to check the LR can only be done when code segments are not "execute-only".
> > > 
> > > The commit comment on https://github.com/llvm/llvm-project/commit/a932cd409b861582902211690b497cafc774bee6 suggests that at least LLD assumes that code generated for AArch64 targets is always compatible with execute-only segments. Therefore, I think that defaulting to checking-by-loading-lr is probably the wrong thing to do. It seems to me it would be better to default the other way and only allow checking-by-loading-lr when there is a guarantee that the target environment will not enforce "execute-only" code.
> > Updated. Though, I wonder if using the long snippet by default can introduce noticeable performance regression or not.
> > 
> > Meanwhile, is it permitted to remove a dead load in the machine pipeline (and should I mark it as volatile somehow).
> Yeah, I would expect the long snippet could result in a noticeable performance regression.
> As discussed in last Monday's Pointer Authentication sync-up call, most current AArch64-based systems do not enforce execute-only. But that will probably change at some point in the future.
> Maybe it would be best to make the code generation dependent on whether the target platform enforces execute-only? That information would probably need to be stored in some kind of TargetInformation class - I'm not sure if that information is currently stored anywhere.
> 
> @ab in that meeting also said that there was a 3rd option - checking the values of bits (I think, I don't fully remember) 55 and 56. Would that be a good option to implement/use by default?
> 
> It seems to me that at least in principle, later optimization passes are allowed to remove load instructions that they can prove have no side-effects. It may be prudent indeed to mark the load as volatile - or use any other mechanism to indicate that the load does have a side-effect and shouldn't be removed by later optimizations.
Considering other possible options, IIUC something like [this fragment](https://github.com/ahmedbougacha/llvm-project/commit/58cf59b84ca4e7930a640480fd5ad1ea194864f5#diff-e3119d2d744e8ec571b1101404aca24f847f90db052cc443e14d82e21216041aR1314) is assumed. I glanced through Optimization Guides for a few CPU cores implementing FEAT_PAuth and it looks like `XPAC*` instructions are usually faster than other PAuth-related instructions (say, latency is 2, throughput is 1 compared to 5/1 for `PAC*` and `AUT*`) - this is somewhat expected as XPAC just clears a range of bits. On the other hand, EOR is still much more efficient.

Thus, it is probably worth implementing yet another option, but I doubt it should be the default because it relies on the particular TBI setting in effect while XPAC "just works" taking into account current settings, as far as I understand.

================
Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1987
+  //
+  // Turn it into:
+  //
----------------
kristof.beyls wrote:
> atrosinenko wrote:
> > kristof.beyls wrote:
> > > I think this comment would be easier to read if instead of just saying "Turn it into:", it would also clearly indicate the intended effect. For example:
> > > 
> > > "To avoid generating a signing oracle, generate a code sequence that explicitly checks if the authentication passed or failed, as follows."
> > > 
> > > With just having written those extra few words now, I think that this extra checking may not be needed if the target core implements FEAT_FPAC (and maybe FEAT_FPACCOMBINE?). If so, maybe a FIXME here would be good to not generate the extra checking code sequences when FEAT_FPAC is implemented by the targeted core?
> > Fixed.
> > 
> > There exist separate review items D156784 and D156785 on taking FEAT_FPAC into account. I wonder if the particular CPU models should mention `+fpac` as supported or should it be only requested explicitly by the user - unlike many other subtarget features, FPAC doesn't make executable code explicitly fail on an unsupported CPU but silently makes it a bit less secure. So, the case "I have CPU X that implements all the instructions that are supported by Y (but not FEAT_FPAC)" may technically be unnoticed.
> Good question!
> I don't have a clear answer, I'm afraid.
> I'd be interested to hear other people's opinions on this.
I think, in D156784 I could just add a TODO at now (to make hasFPAC available for usage) and FeatureFPAC may be added to relevant CPU cores by a later patch, if needed.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156716/new/

https://reviews.llvm.org/D156716