[llvm] [BOLT] Overhaul the comments in PAuthGadgetScanner for readability (NFC) (PR #169801)

via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 27 05:34:56 PST 2025


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-bolt

Author: Anatoly Trosinenko (atrosinenko)

<details>
<summary>Changes</summary>

Update the comments in PAuthGadgetScanner.cpp to better describe the
current version of the code. Along the way, shorten identifier names
that are redundant taking their context into account:
`RegsToTrackInstsFor` (made `RegsToTrack`) and `getNumTrackedRegisters`
(made `getNumRegisters`).

---

Patch is 29.38 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169801.diff


1 Files Affected:

- (modified) bolt/lib/Passes/PAuthGadgetScanner.cpp (+135-111) 


``````````diff
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 01b350b2f11fe..1594610113179 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -6,8 +6,40 @@
 //
 //===----------------------------------------------------------------------===//
 //
-// This file implements a pass that looks for any AArch64 return instructions
-// that may not be protected by PAuth authentication instructions when needed.
+// This file implements a pass that looks for instructions which are expected
+// to be protected by Pointer Authentication, but the protection is missing
+// or insufficient. While the existing implementation only applies to AArch64,
+// it is intended to keep this file reasonably target-neutral, and place
+// AArch64-specific hooks in AArch64MCPlusBuilder.
+//
+// Various gadget kinds (patterns of unsafe instruction usage) can be detected.
+// Gadgets of the particular kind are detected by inspecting the susceptible
+// instructions (such as "all return instructions" or "all indirect branches
+// and calls") and validating properties of their operands. This is achieved
+// by first running a dataflow analysis (or its simplified counterpart, if CFG
+// information is not available for the particular function) to compute the
+// properties of registers before or after each instruction is executed and
+// then passing each instruction together with the computed state to a number
+// of gadget detectors that consume the results of this particular analysis.
+//
+// There are two broad groups of gadgets: those being detected by inspecting
+// the input operands of the susceptible instructions and those being detected
+// by analyzing their outputs. The former detectors consume SrcState computed
+// by iterating forwards over the instructions - normally by
+// DataflowSrcSafetyAnalysis (or if BOLT was unable to reconstruct CFG for the
+// function, by CFGUnawareSrcSafetyAnalysis). The latter consume DstState which
+// is computed in a similar way by iterating backwards over the instructions.
+//
+// Furthermore, when producing a diagnostic for a found gadget, this tool tries
+// to provide the clues on which instructions made the operands unsafe (such as
+// the set of last instructions that wrote an unsafe value to the register
+// along various possible paths of execution leading to this instruction).
+// This is achieved by re-running the same analysis for the second time to
+// collect the detailed information to improve the reports produced on the
+// first run. Since it is expected that most of the functions do not have any
+// issues to be reported, the second analysis run which is more time- and
+// memory-consuming is skipped for most functions. Please note that unlike
+// the reports themselves, these clues are provided on a best-effort basis.
 //
 //===----------------------------------------------------------------------===//
 
@@ -94,7 +126,7 @@ class TrackedRegisters {
 
   ArrayRef<MCPhysReg> getRegisters() const { return Registers; }
 
-  size_t getNumTrackedRegisters() const { return Registers.size(); }
+  size_t getNumRegisters() const { return Registers.size(); }
 
   bool empty() const { return Registers.empty(); }
 
@@ -111,43 +143,13 @@ class TrackedRegisters {
   }
 };
 
-// The security property that is checked is:
-// When a register is used as the address to jump to in a return instruction,
-// that register must be safe-to-dereference. It must either
-// (a) be safe-to-dereference at function entry and never be changed within this
-//     function, i.e. have the same value as when the function started, or
-// (b) the last write to the register must be by an authentication instruction.
-
-// This property is checked by using dataflow analysis to keep track of which
-// registers have been written (def-ed), since last authenticated. For pac-ret,
-// any return instruction using a register which is not safe-to-dereference is
-// a gadget to be reported. For PAuthABI, probably at least any indirect control
-// flow using such a register should be reported.
-
-// Furthermore, when producing a diagnostic for a found non-pac-ret protected
-// return, the analysis also lists the last instructions that wrote to the
-// register used in the return instruction.
-// The total set of registers used in return instructions in a given function is
-// small. It almost always is just `X30`.
-// In order to reduce the memory consumption of storing this additional state
-// during the dataflow analysis, this is computed by running the dataflow
-// analysis twice:
-// 1. In the first run, the dataflow analysis only keeps track of the security
-//    property: i.e. which registers have been overwritten since the last
-//    time they've been authenticated.
-// 2. If the first run finds any return instructions using a register last
-//    written by a non-authenticating instruction, the dataflow analysis will
-//    be run a second time. The first run will return which registers are used
-//    in the gadgets to be reported. This information is used in the second run
-//    to also track which instructions last wrote to those registers.
-
 typedef SmallPtrSet<const MCInst *, 4> SetOfRelatedInsts;
 
 /// A state representing which registers are safe to use by an instruction
 /// at a given program point.
 ///
 /// To simplify reasoning, let's stick with the following approach:
-/// * when state is updated by the data-flow analysis, the sub-, super- and
+/// * when state is updated by the dataflow analysis, the sub-, super- and
 ///   overlapping registers are marked as needed
 /// * when the particular instruction is checked if it represents a gadget,
 ///   the specific bit of BitVector should be usable to answer this.
@@ -162,14 +164,20 @@ typedef SmallPtrSet<const MCInst *, 4> SetOfRelatedInsts;
 ///   X30 is safe-to-dereference - the state computed for sub- and
 ///   super-registers is not inspected.
 struct SrcState {
-  /// A BitVector containing the registers that are either authenticated
-  /// (assuming failed authentication is permitted to produce an invalid
-  /// address, provided it generates an error on memory access) or whose
-  /// value is known not to be attacker-controlled under Pointer Authentication
-  /// threat model. The registers in this set are either
+  /// A BitVector containing the registers that are either authenticated or
+  /// whose value is known not to be attacker-controlled under Pointer
+  /// Authentication threat model. If AuthTrapsOnFailure is false, a failed
+  /// authentication is permitted to produce an invalid address that generates
+  /// an error on memory access. The registers in this set are either
   /// * not clobbered since being authenticated, or
   /// * trusted at function entry and were not clobbered yet, or
   /// * contain a safely materialized address.
+  ///
+  /// Safe-to-dereference registers are considered to be safe to use by the
+  /// instructions that perform memory access and generate an error on failed
+  /// address translation. These registers are not generally safe to be used
+  /// by the instructions like pointer signing, as such usage may hide the
+  /// authentication failure.
   BitVector SafeToDerefRegs;
   /// A BitVector containing the registers that are either authenticated
   /// *successfully* or whose value is known not to be attacker-controlled
@@ -179,22 +187,30 @@ struct SrcState {
   ///   (and not clobbered since then), or
   /// * trusted at function entry and were not clobbered yet, or
   /// * contain a safely materialized address.
+  ///
+  /// When authentication instructions are assumed to always trap on error,
+  /// this is identical to SafeToDerefRegs.
   BitVector TrustedRegs;
-  /// A vector of sets, only used in the second data flow run.
+  /// A vector of sets, only used on the second analysis run.
   /// Each element in the vector represents one of the registers for which we
-  /// track the set of last instructions that wrote to this register. For
-  /// pac-ret analysis, the expectation is that almost all return instructions
-  /// only use register `X30`, and therefore, this vector will probably have
-  /// length 1 in the second run.
+  /// track the set of last instructions that wrote to this register, excluding
+  /// authentications. This is intended to provide best-effort clues on which
+  /// instruction caused the particular register not to be safe-to-dereference.
+  ///
+  /// Please note that the mapping from MCPhysReg values to indexes in this
+  /// vector is provided by RegsToTrack field of SrcSafetyAnalysis.
   std::vector<SetOfRelatedInsts> LastInstWritingReg;
 
-  /// Construct an empty state.
+  /// Constructs an empty state (no registers at all).
   SrcState() {}
 
+  /// Constructs a new state with all registers marked unsafe.
   SrcState(unsigned NumRegs, unsigned NumRegsToTrack)
       : SafeToDerefRegs(NumRegs), TrustedRegs(NumRegs),
         LastInstWritingReg(NumRegsToTrack) {}
 
+  /// Updates *this to account for the state incoming from a predecessor basic
+  /// block (i.e. computes the least safe states among *this and StateIn).
   SrcState &merge(const SrcState &StateIn) {
     if (StateIn.empty())
       return *this;
@@ -280,15 +296,15 @@ void SrcStatePrinter::print(raw_ostream &OS, const SrcState &S) const {
 /// version for functions without reconstructed CFG.
 class SrcSafetyAnalysis {
 public:
-  SrcSafetyAnalysis(BinaryFunction &BF, ArrayRef<MCPhysReg> RegsToTrackInstsFor)
+  SrcSafetyAnalysis(BinaryFunction &BF, ArrayRef<MCPhysReg> RegsToTrack)
       : BC(BF.getBinaryContext()), NumRegs(BC.MRI->getNumRegs()),
-        RegsToTrackInstsFor(RegsToTrackInstsFor) {}
+        RegsToTrack(RegsToTrack) {}
 
   virtual ~SrcSafetyAnalysis() {}
 
   static std::shared_ptr<SrcSafetyAnalysis>
   create(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId,
-         ArrayRef<MCPhysReg> RegsToTrackInstsFor);
+         ArrayRef<MCPhysReg> RegsToTrack);
 
   virtual void run() = 0;
   virtual const SrcState &getStateBefore(const MCInst &Inst) const = 0;
@@ -296,9 +312,11 @@ class SrcSafetyAnalysis {
 protected:
   BinaryContext &BC;
   const unsigned NumRegs;
-  /// RegToTrackInstsFor is the set of registers for which the dataflow analysis
-  /// must compute which the last set of instructions writing to it are.
-  const TrackedRegisters RegsToTrackInstsFor;
+
+  /// The set of registers for which the dataflow analysis must compute the set
+  /// of last writing instructions.
+  const TrackedRegisters RegsToTrack;
+
   /// Stores information about the detected instruction sequences emitted to
   /// check an authenticated pointer. Specifically, if such sequence is detected
   /// in a basic block, it maps the last instruction of that basic block to
@@ -312,17 +330,18 @@ class SrcSafetyAnalysis {
       CheckerSequenceInfo;
 
   SetOfRelatedInsts &lastWritingInsts(SrcState &S, MCPhysReg Reg) const {
-    unsigned Index = RegsToTrackInstsFor.getIndex(Reg);
+    unsigned Index = RegsToTrack.getIndex(Reg);
     return S.LastInstWritingReg[Index];
   }
   const SetOfRelatedInsts &lastWritingInsts(const SrcState &S,
                                             MCPhysReg Reg) const {
-    unsigned Index = RegsToTrackInstsFor.getIndex(Reg);
+    unsigned Index = RegsToTrack.getIndex(Reg);
     return S.LastInstWritingReg[Index];
   }
 
+  /// Computes SrcState observed on function entry.
   SrcState createEntryState() {
-    SrcState S(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+    SrcState S(NumRegs, RegsToTrack.getNumRegisters());
     for (MCPhysReg Reg : BC.MIB->getTrustedLiveInRegs())
       S.TrustedRegs |= BC.MIB->getAliases(Reg, /*OnlySmaller=*/true);
     S.SafeToDerefRegs = S.TrustedRegs;
@@ -359,8 +378,6 @@ class SrcSafetyAnalysis {
     // saved on the stack at some point during execution of the callee.
     // Therefore they should also be considered as potentially modified by an
     // attacker/written to.
-    // Also, not all functions may respect the AAPCS ABI rules about
-    // caller/callee-saved registers.
     if (BC.MIB->isCall(Point))
       Clobbered.set();
     else
@@ -402,16 +419,16 @@ class SrcSafetyAnalysis {
                                                 const SrcState &Cur) const {
     SmallVector<MCPhysReg> Regs;
 
-    // A signed pointer can be authenticated, or
+    // A signed pointer can be authenticated, ...
     bool Dummy = false;
     if (auto AutReg = BC.MIB->getWrittenAuthenticatedReg(Point, Dummy))
       Regs.push_back(*AutReg);
 
-    // ... a safe address can be materialized, or
+    // ... or a safe address can be materialized, ...
     if (auto NewAddrReg = BC.MIB->getMaterializedAddressRegForPtrAuth(Point))
       Regs.push_back(*NewAddrReg);
 
-    // ... an address can be updated in a safe manner, producing the result
+    // ... or an address can be updated in a safe manner, producing the result
     // which is as trusted as the input address.
     if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) {
       auto [DstReg, SrcReg] = *DstAndSrc;
@@ -427,7 +444,7 @@ class SrcSafetyAnalysis {
     //    xpaclri         ; clobbers LR, LR is not safe anymore
     //    cmp   x30, x16
     //    b.eq  1f        ; end of the sequence: LR is marked as trusted
-    //    brk   0x1234
+    //    brk   0xc470
     //  1:
     //    ; at this point LR would be marked as trusted,
     //    ; but not safe-to-dereference
@@ -449,23 +466,23 @@ class SrcSafetyAnalysis {
     assert(!AuthTrapsOnFailure && "Use getRegsMadeSafeToDeref instead");
     SmallVector<MCPhysReg> Regs;
 
-    // An authenticated pointer can be checked, or
+    // An authenticated pointer can be checked, ...
     if (auto CheckedReg = getRegMadeTrustedByChecking(Point, Cur))
       Regs.push_back(*CheckedReg);
 
-    // ... a pointer can be authenticated by an instruction that always checks
-    // the pointer, or
+    // ... or a pointer can be authenticated by an instruction that always
+    // checks the pointer, ...
     bool IsChecked = false;
     std::optional<MCPhysReg> AutReg =
         BC.MIB->getWrittenAuthenticatedReg(Point, IsChecked);
     if (AutReg && IsChecked)
       Regs.push_back(*AutReg);
 
-    // ... a safe address can be materialized, or
+    // ... or a safe address can be materialized, ...
     if (auto NewAddrReg = BC.MIB->getMaterializedAddressRegForPtrAuth(Point))
       Regs.push_back(*NewAddrReg);
 
-    // ... an address can be updated in a safe manner, producing the result
+    // ... or an address can be updated in a safe manner, producing the result
     // which is as trusted as the input address.
     if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) {
       auto [DstReg, SrcReg] = *DstAndSrc;
@@ -489,9 +506,11 @@ class SrcSafetyAnalysis {
       dbgs() << ")\n";
     });
 
-    // If this instruction is reachable, a non-empty state will be propagated
-    // to it from the entry basic block sooner or later. Until then, it is both
-    // more efficient and easier to reason about to skip computeNext().
+    // Skip this instruction until a non-empty state is propagated here.
+    // When performing a dataflow analysis, it is technically possible that
+    // Cur is always empty at a given program point - then just keep it empty.
+    // For details, see DataflowSrcSafetyAnalysis::getStartingStateAtBB() and
+    // FunctionAnalysis::findUnsafeUses().
     if (Cur.empty()) {
       LLVM_DEBUG(
           { dbgs() << "Skipping computeNext(Point, Cur) as Cur is empty.\n"; });
@@ -517,7 +536,7 @@ class SrcSafetyAnalysis {
     Next.TrustedRegs.reset(Clobbered);
     // Keep track of this instruction if it writes to any of the registers we
     // need to track that for:
-    for (MCPhysReg Reg : RegsToTrackInstsFor.getRegisters())
+    for (MCPhysReg Reg : RegsToTrack.getRegisters())
       if (Clobbered[Reg])
         lastWritingInsts(Next, Reg) = {&Point};
 
@@ -531,7 +550,7 @@ class SrcSafetyAnalysis {
       NewSafeSubregs |= BC.MIB->getAliases(SafeReg, /*OnlySmaller=*/true);
     for (MCPhysReg Reg : NewSafeSubregs.set_bits()) {
       Next.SafeToDerefRegs.set(Reg);
-      if (RegsToTrackInstsFor.isTracked(Reg))
+      if (RegsToTrack.isTracked(Reg))
         lastWritingInsts(Next, Reg).clear();
     }
 
@@ -584,8 +603,8 @@ class DataflowSrcSafetyAnalysis
 public:
   DataflowSrcSafetyAnalysis(BinaryFunction &BF,
                             MCPlusBuilder::AllocatorIdTy AllocId,
-                            ArrayRef<MCPhysReg> RegsToTrackInstsFor)
-      : SrcSafetyAnalysis(BF, RegsToTrackInstsFor), DFParent(BF, AllocId) {}
+                            ArrayRef<MCPhysReg> RegsToTrack)
+      : SrcSafetyAnalysis(BF, RegsToTrack), DFParent(BF, AllocId) {}
 
   const SrcState &getStateBefore(const MCInst &Inst) const override {
     return DFParent::getStateBefore(Inst).get();
@@ -607,7 +626,7 @@ class DataflowSrcSafetyAnalysis
         (void)CheckedReg;
         (void)FirstInst;
         assert(llvm::any_of(BB, [&](MCInst &I) { return &I == &FirstInst; }) &&
-               "Data-flow analysis expects the checker not to cross BBs");
+               "Dataflow analysis expects the checker not to cross BBs");
         CheckerSequenceInfo[&LastInst] = *CheckerInfo;
       }
     }
@@ -725,12 +744,12 @@ template <typename StateTy> class CFGUnawareAnalysis {
 //     ret
 //   JTI0:
 //     .byte some_label - Ltmp0 ; computing offsets using labels may probably
-//                                work too, provided enough information is
-//                                retained by the assembler and linker
+//                                be detected too, provided enough information
+//                                is retained by the assembler and linker
 //
 // Then, a function can be split into a number of disjoint contiguous sequences
 // of instructions without labels in between. These sequences can be processed
-// the same way basic blocks are processed by data-flow analysis, with the same
+// the same way basic blocks are processed by dataflow analysis, with the same
 // pessimistic estimation of the initial state at the start of each sequence
 // (except the first instruction of the function).
 class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis,
@@ -741,8 +760,8 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis,
 public:
   CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
                               MCPlusBuilder::AllocatorIdTy AllocId,
-                              ArrayRef<MCPhysReg> RegsToTrackInstsFor)
-      : SrcSafetyAnalysis(BF, RegsToTrackInstsFor),
+                              ArrayRef<MCPhysReg> RegsToTrack)
+      : SrcSafetyAnalysis(BF, RegsToTrack),
         CFGUnawareAnalysis(BF, AllocId, "CFGUnawareSrcSafetyAnalysis"), BF(BF) {
   }
 
@@ -781,12 +800,12 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis,
 std::shared_ptr<SrcSafetyAnalysis>
 SrcSafetyAnalysis::create(BinaryFunction &BF,
                           MCPlusBuilder::AllocatorIdTy AllocId,
-                          ArrayRef<MCPhysReg> RegsToTrackInstsFor) {
+                          ArrayRef<MCPhysReg> RegsToTrack) {
   if (BF.hasCFG())
     return std::make_shared<DataflowSrcSafetyAnalysis>(BF, AllocId,
-                                                       RegsToTrackInstsFor);
+                                                       RegsToTrack);
   return std::make_shared<CFGUnawareSrcSafetyAnalysis>(BF, AllocId,
-                                                       RegsToTrackInstsFor);
+                                                       RegsToTrack);
 }
 
 /// A state representing which registers are safe to be used as the destination
@@ -849,15 +868,18 @@ struct DstState {
   /// provide clues on which instruction made the particular register unsafe.
   ///
   /// Please note that the mapping from MCPhysReg values to indexes in this
-  /// vector is provided by RegsToTrackInstsFor field of DstSafetyAnalysis.
+  /// vector is provided by RegsToTrack field of DstSafetyAnalysis.
   std::vector<SetOfRelatedInsts> FirstInstLeakingReg;
 
-  /// Constructs an empty state.
+  /// Constructs an empty state (no registers at all).
   DstState() {}
 
+  /// Constructs a new state with all registers marked unsafe.
   DstState(unsigned NumRegs, unsigned Nu...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/169801


More information about the llvm-commits mailing list