[llvm] [BOLT] Overhaul the comments in PAuthGadgetScanner for readability (NFC) (PR #169801)
via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 27 05:34:56 PST 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-bolt
Author: Anatoly Trosinenko (atrosinenko)
<details>
<summary>Changes</summary>
Update the comments in PAuthGadgetScanner.cpp to better describe the
current version of the code. Along the way, shorten identifier names
that are redundant taking their context into account:
`RegsToTrackInstsFor` (made `RegsToTrack`) and `getNumTrackedRegisters`
(made `getNumRegisters`).
---
Patch is 29.38 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169801.diff
1 Files Affected:
- (modified) bolt/lib/Passes/PAuthGadgetScanner.cpp (+135-111)
``````````diff
diff --git a/bolt/lib/Passes/PAuthGadgetScanner.cpp b/bolt/lib/Passes/PAuthGadgetScanner.cpp
index 01b350b2f11fe..1594610113179 100644
--- a/bolt/lib/Passes/PAuthGadgetScanner.cpp
+++ b/bolt/lib/Passes/PAuthGadgetScanner.cpp
@@ -6,8 +6,40 @@
//
//===----------------------------------------------------------------------===//
//
-// This file implements a pass that looks for any AArch64 return instructions
-// that may not be protected by PAuth authentication instructions when needed.
+// This file implements a pass that looks for instructions which are expected
+// to be protected by Pointer Authentication, but the protection is missing
+// or insufficient. While the existing implementation only applies to AArch64,
+// it is intended to keep this file reasonably target-neutral, and place
+// AArch64-specific hooks in AArch64MCPlusBuilder.
+//
+// Various gadget kinds (patterns of unsafe instruction usage) can be detected.
+// Gadgets of the particular kind are detected by inspecting the susceptible
+// instructions (such as "all return instructions" or "all indirect branches
+// and calls") and validating properties of their operands. This is achieved
+// by first running a dataflow analysis (or its simplified counterpart, if CFG
+// information is not available for the particular function) to compute the
+// properties of registers before or after each instruction is executed and
+// then passing each instruction together with the computed state to a number
+// of gadget detectors that consume the results of this particular analysis.
+//
+// There are two broad groups of gadgets: those being detected by inspecting
+// the input operands of the susceptible instructions and those being detected
+// by analyzing their outputs. The former detectors consume SrcState computed
+// by iterating forwards over the instructions - normally by
+// DataflowSrcSafetyAnalysis (or if BOLT was unable to reconstruct CFG for the
+// function, by CFGUnawareSrcSafetyAnalysis). The latter consume DstState which
+// is computed in a similar way by iterating backwards over the instructions.
+//
+// Furthermore, when producing a diagnostic for a found gadget, this tool tries
+// to provide the clues on which instructions made the operands unsafe (such as
+// the set of last instructions that wrote an unsafe value to the register
+// along various possible paths of execution leading to this instruction).
+// This is achieved by re-running the same analysis for the second time to
+// collect the detailed information to improve the reports produced on the
+// first run. Since it is expected that most of the functions do not have any
+// issues to be reported, the second analysis run which is more time- and
+// memory-consuming is skipped for most functions. Please note that unlike
+// the reports themselves, these clues are provided on a best-effort basis.
//
//===----------------------------------------------------------------------===//
@@ -94,7 +126,7 @@ class TrackedRegisters {
ArrayRef<MCPhysReg> getRegisters() const { return Registers; }
- size_t getNumTrackedRegisters() const { return Registers.size(); }
+ size_t getNumRegisters() const { return Registers.size(); }
bool empty() const { return Registers.empty(); }
@@ -111,43 +143,13 @@ class TrackedRegisters {
}
};
-// The security property that is checked is:
-// When a register is used as the address to jump to in a return instruction,
-// that register must be safe-to-dereference. It must either
-// (a) be safe-to-dereference at function entry and never be changed within this
-// function, i.e. have the same value as when the function started, or
-// (b) the last write to the register must be by an authentication instruction.
-
-// This property is checked by using dataflow analysis to keep track of which
-// registers have been written (def-ed), since last authenticated. For pac-ret,
-// any return instruction using a register which is not safe-to-dereference is
-// a gadget to be reported. For PAuthABI, probably at least any indirect control
-// flow using such a register should be reported.
-
-// Furthermore, when producing a diagnostic for a found non-pac-ret protected
-// return, the analysis also lists the last instructions that wrote to the
-// register used in the return instruction.
-// The total set of registers used in return instructions in a given function is
-// small. It almost always is just `X30`.
-// In order to reduce the memory consumption of storing this additional state
-// during the dataflow analysis, this is computed by running the dataflow
-// analysis twice:
-// 1. In the first run, the dataflow analysis only keeps track of the security
-// property: i.e. which registers have been overwritten since the last
-// time they've been authenticated.
-// 2. If the first run finds any return instructions using a register last
-// written by a non-authenticating instruction, the dataflow analysis will
-// be run a second time. The first run will return which registers are used
-// in the gadgets to be reported. This information is used in the second run
-// to also track which instructions last wrote to those registers.
-
typedef SmallPtrSet<const MCInst *, 4> SetOfRelatedInsts;
/// A state representing which registers are safe to use by an instruction
/// at a given program point.
///
/// To simplify reasoning, let's stick with the following approach:
-/// * when state is updated by the data-flow analysis, the sub-, super- and
+/// * when state is updated by the dataflow analysis, the sub-, super- and
/// overlapping registers are marked as needed
/// * when the particular instruction is checked if it represents a gadget,
/// the specific bit of BitVector should be usable to answer this.
@@ -162,14 +164,20 @@ typedef SmallPtrSet<const MCInst *, 4> SetOfRelatedInsts;
/// X30 is safe-to-dereference - the state computed for sub- and
/// super-registers is not inspected.
struct SrcState {
- /// A BitVector containing the registers that are either authenticated
- /// (assuming failed authentication is permitted to produce an invalid
- /// address, provided it generates an error on memory access) or whose
- /// value is known not to be attacker-controlled under Pointer Authentication
- /// threat model. The registers in this set are either
+ /// A BitVector containing the registers that are either authenticated or
+ /// whose value is known not to be attacker-controlled under Pointer
+ /// Authentication threat model. If AuthTrapsOnFailure is false, a failed
+ /// authentication is permitted to produce an invalid address that generates
+ /// an error on memory access. The registers in this set are either
/// * not clobbered since being authenticated, or
/// * trusted at function entry and were not clobbered yet, or
/// * contain a safely materialized address.
+ ///
+ /// Safe-to-dereference registers are considered to be safe to use by the
+ /// instructions that perform memory access and generate an error on failed
+ /// address translation. These registers are not generally safe to be used
+ /// by the instructions like pointer signing, as such usage may hide the
+ /// authentication failure.
BitVector SafeToDerefRegs;
/// A BitVector containing the registers that are either authenticated
/// *successfully* or whose value is known not to be attacker-controlled
@@ -179,22 +187,30 @@ struct SrcState {
/// (and not clobbered since then), or
/// * trusted at function entry and were not clobbered yet, or
/// * contain a safely materialized address.
+ ///
+ /// When authentication instructions are assumed to always trap on error,
+ /// this is identical to SafeToDerefRegs.
BitVector TrustedRegs;
- /// A vector of sets, only used in the second data flow run.
+ /// A vector of sets, only used on the second analysis run.
/// Each element in the vector represents one of the registers for which we
- /// track the set of last instructions that wrote to this register. For
- /// pac-ret analysis, the expectation is that almost all return instructions
- /// only use register `X30`, and therefore, this vector will probably have
- /// length 1 in the second run.
+ /// track the set of last instructions that wrote to this register, excluding
+ /// authentications. This is intended to provide best-effort clues on which
+ /// instruction caused the particular register not to be safe-to-dereference.
+ ///
+ /// Please note that the mapping from MCPhysReg values to indexes in this
+ /// vector is provided by RegsToTrack field of SrcSafetyAnalysis.
std::vector<SetOfRelatedInsts> LastInstWritingReg;
- /// Construct an empty state.
+ /// Constructs an empty state (no registers at all).
SrcState() {}
+ /// Constructs a new state with all registers marked unsafe.
SrcState(unsigned NumRegs, unsigned NumRegsToTrack)
: SafeToDerefRegs(NumRegs), TrustedRegs(NumRegs),
LastInstWritingReg(NumRegsToTrack) {}
+ /// Updates *this to account for the state incoming from a predecessor basic
+ /// block (i.e. computes the least safe states among *this and StateIn).
SrcState &merge(const SrcState &StateIn) {
if (StateIn.empty())
return *this;
@@ -280,15 +296,15 @@ void SrcStatePrinter::print(raw_ostream &OS, const SrcState &S) const {
/// version for functions without reconstructed CFG.
class SrcSafetyAnalysis {
public:
- SrcSafetyAnalysis(BinaryFunction &BF, ArrayRef<MCPhysReg> RegsToTrackInstsFor)
+ SrcSafetyAnalysis(BinaryFunction &BF, ArrayRef<MCPhysReg> RegsToTrack)
: BC(BF.getBinaryContext()), NumRegs(BC.MRI->getNumRegs()),
- RegsToTrackInstsFor(RegsToTrackInstsFor) {}
+ RegsToTrack(RegsToTrack) {}
virtual ~SrcSafetyAnalysis() {}
static std::shared_ptr<SrcSafetyAnalysis>
create(BinaryFunction &BF, MCPlusBuilder::AllocatorIdTy AllocId,
- ArrayRef<MCPhysReg> RegsToTrackInstsFor);
+ ArrayRef<MCPhysReg> RegsToTrack);
virtual void run() = 0;
virtual const SrcState &getStateBefore(const MCInst &Inst) const = 0;
@@ -296,9 +312,11 @@ class SrcSafetyAnalysis {
protected:
BinaryContext &BC;
const unsigned NumRegs;
- /// RegToTrackInstsFor is the set of registers for which the dataflow analysis
- /// must compute which the last set of instructions writing to it are.
- const TrackedRegisters RegsToTrackInstsFor;
+
+ /// The set of registers for which the dataflow analysis must compute the set
+ /// of last writing instructions.
+ const TrackedRegisters RegsToTrack;
+
/// Stores information about the detected instruction sequences emitted to
/// check an authenticated pointer. Specifically, if such sequence is detected
/// in a basic block, it maps the last instruction of that basic block to
@@ -312,17 +330,18 @@ class SrcSafetyAnalysis {
CheckerSequenceInfo;
SetOfRelatedInsts &lastWritingInsts(SrcState &S, MCPhysReg Reg) const {
- unsigned Index = RegsToTrackInstsFor.getIndex(Reg);
+ unsigned Index = RegsToTrack.getIndex(Reg);
return S.LastInstWritingReg[Index];
}
const SetOfRelatedInsts &lastWritingInsts(const SrcState &S,
MCPhysReg Reg) const {
- unsigned Index = RegsToTrackInstsFor.getIndex(Reg);
+ unsigned Index = RegsToTrack.getIndex(Reg);
return S.LastInstWritingReg[Index];
}
+ /// Computes SrcState observed on function entry.
SrcState createEntryState() {
- SrcState S(NumRegs, RegsToTrackInstsFor.getNumTrackedRegisters());
+ SrcState S(NumRegs, RegsToTrack.getNumRegisters());
for (MCPhysReg Reg : BC.MIB->getTrustedLiveInRegs())
S.TrustedRegs |= BC.MIB->getAliases(Reg, /*OnlySmaller=*/true);
S.SafeToDerefRegs = S.TrustedRegs;
@@ -359,8 +378,6 @@ class SrcSafetyAnalysis {
// saved on the stack at some point during execution of the callee.
// Therefore they should also be considered as potentially modified by an
// attacker/written to.
- // Also, not all functions may respect the AAPCS ABI rules about
- // caller/callee-saved registers.
if (BC.MIB->isCall(Point))
Clobbered.set();
else
@@ -402,16 +419,16 @@ class SrcSafetyAnalysis {
const SrcState &Cur) const {
SmallVector<MCPhysReg> Regs;
- // A signed pointer can be authenticated, or
+ // A signed pointer can be authenticated, ...
bool Dummy = false;
if (auto AutReg = BC.MIB->getWrittenAuthenticatedReg(Point, Dummy))
Regs.push_back(*AutReg);
- // ... a safe address can be materialized, or
+ // ... or a safe address can be materialized, ...
if (auto NewAddrReg = BC.MIB->getMaterializedAddressRegForPtrAuth(Point))
Regs.push_back(*NewAddrReg);
- // ... an address can be updated in a safe manner, producing the result
+ // ... or an address can be updated in a safe manner, producing the result
// which is as trusted as the input address.
if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) {
auto [DstReg, SrcReg] = *DstAndSrc;
@@ -427,7 +444,7 @@ class SrcSafetyAnalysis {
// xpaclri ; clobbers LR, LR is not safe anymore
// cmp x30, x16
// b.eq 1f ; end of the sequence: LR is marked as trusted
- // brk 0x1234
+ // brk 0xc470
// 1:
// ; at this point LR would be marked as trusted,
// ; but not safe-to-dereference
@@ -449,23 +466,23 @@ class SrcSafetyAnalysis {
assert(!AuthTrapsOnFailure && "Use getRegsMadeSafeToDeref instead");
SmallVector<MCPhysReg> Regs;
- // An authenticated pointer can be checked, or
+ // An authenticated pointer can be checked, ...
if (auto CheckedReg = getRegMadeTrustedByChecking(Point, Cur))
Regs.push_back(*CheckedReg);
- // ... a pointer can be authenticated by an instruction that always checks
- // the pointer, or
+ // ... or a pointer can be authenticated by an instruction that always
+ // checks the pointer, ...
bool IsChecked = false;
std::optional<MCPhysReg> AutReg =
BC.MIB->getWrittenAuthenticatedReg(Point, IsChecked);
if (AutReg && IsChecked)
Regs.push_back(*AutReg);
- // ... a safe address can be materialized, or
+ // ... or a safe address can be materialized, ...
if (auto NewAddrReg = BC.MIB->getMaterializedAddressRegForPtrAuth(Point))
Regs.push_back(*NewAddrReg);
- // ... an address can be updated in a safe manner, producing the result
+ // ... or an address can be updated in a safe manner, producing the result
// which is as trusted as the input address.
if (auto DstAndSrc = BC.MIB->analyzeAddressArithmeticsForPtrAuth(Point)) {
auto [DstReg, SrcReg] = *DstAndSrc;
@@ -489,9 +506,11 @@ class SrcSafetyAnalysis {
dbgs() << ")\n";
});
- // If this instruction is reachable, a non-empty state will be propagated
- // to it from the entry basic block sooner or later. Until then, it is both
- // more efficient and easier to reason about to skip computeNext().
+ // Skip this instruction until a non-empty state is propagated here.
+ // When performing a dataflow analysis, it is technically possible that
+ // Cur is always empty at a given program point - then just keep it empty.
+ // For details, see DataflowSrcSafetyAnalysis::getStartingStateAtBB() and
+ // FunctionAnalysis::findUnsafeUses().
if (Cur.empty()) {
LLVM_DEBUG(
{ dbgs() << "Skipping computeNext(Point, Cur) as Cur is empty.\n"; });
@@ -517,7 +536,7 @@ class SrcSafetyAnalysis {
Next.TrustedRegs.reset(Clobbered);
// Keep track of this instruction if it writes to any of the registers we
// need to track that for:
- for (MCPhysReg Reg : RegsToTrackInstsFor.getRegisters())
+ for (MCPhysReg Reg : RegsToTrack.getRegisters())
if (Clobbered[Reg])
lastWritingInsts(Next, Reg) = {&Point};
@@ -531,7 +550,7 @@ class SrcSafetyAnalysis {
NewSafeSubregs |= BC.MIB->getAliases(SafeReg, /*OnlySmaller=*/true);
for (MCPhysReg Reg : NewSafeSubregs.set_bits()) {
Next.SafeToDerefRegs.set(Reg);
- if (RegsToTrackInstsFor.isTracked(Reg))
+ if (RegsToTrack.isTracked(Reg))
lastWritingInsts(Next, Reg).clear();
}
@@ -584,8 +603,8 @@ class DataflowSrcSafetyAnalysis
public:
DataflowSrcSafetyAnalysis(BinaryFunction &BF,
MCPlusBuilder::AllocatorIdTy AllocId,
- ArrayRef<MCPhysReg> RegsToTrackInstsFor)
- : SrcSafetyAnalysis(BF, RegsToTrackInstsFor), DFParent(BF, AllocId) {}
+ ArrayRef<MCPhysReg> RegsToTrack)
+ : SrcSafetyAnalysis(BF, RegsToTrack), DFParent(BF, AllocId) {}
const SrcState &getStateBefore(const MCInst &Inst) const override {
return DFParent::getStateBefore(Inst).get();
@@ -607,7 +626,7 @@ class DataflowSrcSafetyAnalysis
(void)CheckedReg;
(void)FirstInst;
assert(llvm::any_of(BB, [&](MCInst &I) { return &I == &FirstInst; }) &&
- "Data-flow analysis expects the checker not to cross BBs");
+ "Dataflow analysis expects the checker not to cross BBs");
CheckerSequenceInfo[&LastInst] = *CheckerInfo;
}
}
@@ -725,12 +744,12 @@ template <typename StateTy> class CFGUnawareAnalysis {
// ret
// JTI0:
// .byte some_label - Ltmp0 ; computing offsets using labels may probably
-// work too, provided enough information is
-// retained by the assembler and linker
+// be detected too, provided enough information
+// is retained by the assembler and linker
//
// Then, a function can be split into a number of disjoint contiguous sequences
// of instructions without labels in between. These sequences can be processed
-// the same way basic blocks are processed by data-flow analysis, with the same
+// the same way basic blocks are processed by dataflow analysis, with the same
// pessimistic estimation of the initial state at the start of each sequence
// (except the first instruction of the function).
class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis,
@@ -741,8 +760,8 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis,
public:
CFGUnawareSrcSafetyAnalysis(BinaryFunction &BF,
MCPlusBuilder::AllocatorIdTy AllocId,
- ArrayRef<MCPhysReg> RegsToTrackInstsFor)
- : SrcSafetyAnalysis(BF, RegsToTrackInstsFor),
+ ArrayRef<MCPhysReg> RegsToTrack)
+ : SrcSafetyAnalysis(BF, RegsToTrack),
CFGUnawareAnalysis(BF, AllocId, "CFGUnawareSrcSafetyAnalysis"), BF(BF) {
}
@@ -781,12 +800,12 @@ class CFGUnawareSrcSafetyAnalysis : public SrcSafetyAnalysis,
std::shared_ptr<SrcSafetyAnalysis>
SrcSafetyAnalysis::create(BinaryFunction &BF,
MCPlusBuilder::AllocatorIdTy AllocId,
- ArrayRef<MCPhysReg> RegsToTrackInstsFor) {
+ ArrayRef<MCPhysReg> RegsToTrack) {
if (BF.hasCFG())
return std::make_shared<DataflowSrcSafetyAnalysis>(BF, AllocId,
- RegsToTrackInstsFor);
+ RegsToTrack);
return std::make_shared<CFGUnawareSrcSafetyAnalysis>(BF, AllocId,
- RegsToTrackInstsFor);
+ RegsToTrack);
}
/// A state representing which registers are safe to be used as the destination
@@ -849,15 +868,18 @@ struct DstState {
/// provide clues on which instruction made the particular register unsafe.
///
/// Please note that the mapping from MCPhysReg values to indexes in this
- /// vector is provided by RegsToTrackInstsFor field of DstSafetyAnalysis.
+ /// vector is provided by RegsToTrack field of DstSafetyAnalysis.
std::vector<SetOfRelatedInsts> FirstInstLeakingReg;
- /// Constructs an empty state.
+ /// Constructs an empty state (no registers at all).
DstState() {}
+ /// Constructs a new state with all registers marked unsafe.
DstState(unsigned NumRegs, unsigned Nu...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/169801
More information about the llvm-commits
mailing list