[llvm] [BOLT][binary-analysis] Add initial pac-ret gadget scanner (PR #122304)

Wed Jan 15 05:45:41 PST 2025

================
@@ -148,6 +149,68 @@ class AArch64MCPlusBuilder : public MCPlusBuilder {
     return false;
   }
 
+  MCPhysReg getAuthenticatedReg(const MCInst &Inst) const override {
+    switch (Inst.getOpcode()) {
+    case AArch64::AUTIAZ:
+    case AArch64::AUTIBZ:
+    case AArch64::AUTIASP:
+    case AArch64::AUTIBSP:
+    case AArch64::RETAA:
+    case AArch64::RETAB:
+      return AArch64::LR;
+    case AArch64::AUTIA1716:
+    case AArch64::AUTIB1716:
+      return AArch64::X17;
+    case AArch64::ERETAA:
+    case AArch64::ERETAB:
+      return AArch64::LR;
----------------
kbeyls wrote:

I think you're right, thanks for pointing this out.
I investigated this a bit, and came to the following preliminary conclusions.

`ERETAA`/`ERETAB`/`ERET` instructions return to the address stored in one of
the system registers `ELR_EL1`, `ELR_EL2` or `ELR_EL3`, depending on the
current Exception Level (EL).

I'm currently not aware of any open-source software that uses
`ERETA{A|B}`. My understanding is that it could be used as follows to create
pac-ret-like hardening for `ERET`:

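```
exception_entry:
  MRS   x0, elr_el1     // Could be elr_el2 or elr_el3, depending on whether
                        // this code is designed to run at EL1, EL2 or EL3.
  PACIA x0, SP          // Sign the return address, using SP as the modifier.
  STR   x0, [some place]
  ...
  LDR   x0, [some place]
  MSR   elr_el1, x0
  ERETAA                // Authenticate ELR (key A, SP as modifier), then return.
```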

Assuming my understanding above is correct, this makes it impossible to
write a fully static pac-ret-like checker for exception returns: the
static analyzer doesn't know whether the code will run at EL1, EL2 or EL3,
and therefore can't know whether to check for writes to `elr_el1`,
`elr_el2` or `elr_el3`.
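To make the EL-dependence concrete, here is a minimal C++ sketch, assuming a toy model in which system registers are named by strings. `ExceptionLevel`, `elrFor` and `candidateReturnRegs` are hypothetical names for illustration, not LLVM or BOLT API. It shows that once the exception level is unknown, every `ELR_ELx` register is a possible source of the return address.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Hypothetical toy model, not LLVM/BOLT API.
enum class ExceptionLevel { EL1, EL2, EL3 };

// With a known exception level, ERET-family instructions return through
// exactly one ELR register.
std::string elrFor(ExceptionLevel EL) {
  switch (EL) {
  case ExceptionLevel::EL1: return "ELR_EL1";
  case ExceptionLevel::EL2: return "ELR_EL2";
  case ExceptionLevel::EL3: return "ELR_EL3";
  }
  assert(false && "unknown exception level");
  return "";
}

// A static analyzer usually only sees the instructions, so the exception
// level is unknown and all three ELR registers must be considered.
std::set<std::string> candidateReturnRegs(std::optional<ExceptionLevel> EL) {
  if (EL)
    return {elrFor(*EL)};
  return {"ELR_EL1", "ELR_EL2", "ELR_EL3"};
}
```

A checker built on this model would have to track writes to all three candidate registers, which is exactly the ambiguity described above.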

Also, LLVM does not model the system registers `ELR_EL1`, `ELR_EL2` and `ELR_EL3` as regular registers. Presumably, tracking changes to system registers would require quite substantial changes to the current implementation of the analyzer.

I decided to handle ERETs by having the analyzer emit a warning whenever it encounters one, stating that it cannot analyze ERETs.

This was implemented in the latest commit f44f9bf on this PR.

As a result, the non-pac-ret-protected-return analyzer now needs to
produce two different kinds of diagnostics.

I found that this is nicely modelled by creating a class hierarchy
for these diagnostics.
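A minimal C++ sketch of such a hierarchy, assuming one abstract base class with one subclass per kind of diagnostic: an actual non-protected-return finding, and a warning that the analyzer cannot analyze an instruction such as an ERET. All class names and message strings here are hypothetical and are not taken from the BOLT sources.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class for everything the analyzer may report.
struct Diagnostic {
  explicit Diagnostic(std::uint64_t Addr) : Address(Addr) {}
  virtual ~Diagnostic() = default;
  // Each kind of diagnostic renders its own message.
  virtual std::string message() const = 0;
  std::uint64_t Address; // Address of the instruction being reported.
};

// Kind 1: an actual finding, i.e. a return that is not pac-ret protected.
struct NonProtectedReturn : Diagnostic {
  using Diagnostic::Diagnostic;
  std::string message() const override { return "non-protected return"; }
};

// Kind 2: a limitation of the analyzer, e.g. an ERET it cannot reason about.
struct CannotAnalyze : Diagnostic {
  CannotAnalyze(std::uint64_t Addr, std::string Why)
      : Diagnostic(Addr), Reason(std::move(Why)) {
    assert(!Reason.empty() && "a reason must be given");
  }
  std::string message() const override { return "cannot analyze " + Reason; }
  std::string Reason;
};

// Diagnostics of different kinds can share one container of base pointers.
std::vector<std::unique_ptr<Diagnostic>> exampleReports() {
  std::vector<std::unique_ptr<Diagnostic>> Reports;
  Reports.push_back(std::make_unique<NonProtectedReturn>(0x1000));
  Reports.push_back(std::make_unique<CannotAnalyze>(0x2000, "ERETAA"));
  return Reports;
}
```

Adding a third kind of diagnostic later only requires a new subclass; code that prints reports just calls `message()` through the base class.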

However, it seems that MCAnnotations, which is how the diagnostics to be
reported were stored during the analysis, currently don't work well (or at
all) for storing an object from a class hierarchy, as opposed to an object
of a fixed type.
I also realized that it was a bit strange to store the pending diagnostics
as MCAnnotations in the first place. So the latest commit f44f9bf also
changes the design so that these diagnostics are no longer stored as
MCAnnotations on an MCInst.
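A minimal C++ sketch of that kind of alternative, assuming the analysis keeps its own per-function container of owning pointers to a polymorphic base class, keyed by instruction address. `Diagnostic`, `SimpleDiagnostic` and `FunctionReports` are hypothetical names for illustration; this is not necessarily how commit f44f9bf implements it.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical polymorphic base class for pending diagnostics.
struct Diagnostic {
  virtual ~Diagnostic() = default;
  virtual std::string message() const = 0;
};

struct SimpleDiagnostic : Diagnostic {
  explicit SimpleDiagnostic(std::string M) : Msg(std::move(M)) {}
  std::string message() const override { return Msg; }
  std::string Msg;
};

// Results owned by the analysis, not attached to any MCInst. Keying by
// instruction address is an assumption; an index would work equally well.
class FunctionReports {
public:
  void add(std::uint64_t Addr, std::unique_ptr<Diagnostic> D) {
    assert(D && "null diagnostic");
    Reports[Addr].push_back(std::move(D));
  }
  // Collect all messages in address order, for printing after the analysis.
  std::vector<std::string> messages() const {
    std::vector<std::string> Out;
    for (const auto &Entry : Reports)
      for (const auto &D : Entry.second)
        Out.push_back(D->message());
    return Out;
  }

private:
  std::map<std::uint64_t, std::vector<std::unique_ptr<Diagnostic>>> Reports;
};
```

Holding `std::unique_ptr<Diagnostic>` preserves each report's full dynamic type, and the instructions themselves carry no analysis-only state.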

Overall, it seems like this is a cleaner design and a better example for
any future binary analyses.

https://github.com/llvm/llvm-project/pull/122304