[llvm] [BOLT][binary-analysis] Add initial pac-ret gadget scanner (PR #122304)

Kristof Beyls via llvm-commits llvm-commits at lists.llvm.org
Mon Jan 20 09:08:09 PST 2025


================
@@ -0,0 +1,543 @@
+//===- bolt/Passes/NonPacProtectedRetAnalysis.cpp -------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements a pass that looks for any AArch64 return instructions
+// that may not be protected by PAuth authentication instructions when needed.
+//
+//===----------------------------------------------------------------------===//
+
+#include "bolt/Passes/NonPacProtectedRetAnalysis.h"
+#include "bolt/Core/ParallelUtilities.h"
+#include "bolt/Passes/DataflowAnalysis.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/MC/MCInst.h"
+#include "llvm/Support/Format.h"
+#include <memory>
+
+#define DEBUG_TYPE "bolt-nonpacprotectedret"
+
+namespace llvm {
+namespace bolt {
+
+raw_ostream &operator<<(raw_ostream &OS, const MCInstInBBReference &Ref) {
+  OS << "MCInstBBRef<";
+  if (Ref.BB == nullptr)
+    OS << "BB:(null)";
+  else
+    OS << "BB:" << Ref.BB->getName() << ":" << Ref.BBIndex;
+  OS << ">";
+  return OS;
+}
+
+raw_ostream &operator<<(raw_ostream &OS, const MCInstInBFReference &Ref) {
+  OS << "MCInstBFRef<";
+  if (Ref.BF == nullptr)
+    OS << "BF:(null)";
+  else
+    OS << "BF:" << Ref.BF->getPrintName() << ":" << Ref.getOffset();
+  OS << ">";
+  return OS;
+}
+
+raw_ostream &operator<<(raw_ostream &OS, const MCInstReference &Ref) {
+  switch (Ref.ParentKind) {
+  case MCInstReference::BasicBlockParent:
+    OS << Ref.U.BBRef;
+    return OS;
+  case MCInstReference::FunctionParent:
+    OS << Ref.U.BFRef;
+    return OS;
+  }
+  llvm_unreachable("");
+}
+
+raw_ostream &operator<<(raw_ostream &OS, const GeneralDiagnostic &Diag) {
+  OS << "diag<'" << Diag.Text << "'>";
+  return OS;
+}
+
+namespace NonPacProtectedRetAnalysis {
+
+raw_ostream &operator<<(raw_ostream &OS, const Gadget &G) {
+  OS << "pac-ret<Ret:" << G.RetInst << ", ";
+  OS << "gadget<Overwriting:[";
+  for (auto Ref : G.OverwritingRetRegInst)
+    OS << Ref << " ";
+  OS << "]>>";
+  return OS;
+}
+
+raw_ostream &operator<<(raw_ostream &OS, const GenDiag &Diag) {
+  OS << "pac-ret<Ret:" << Diag.RetInst << ", " << Diag.Diag << ">";
+  return OS;
+}
+
+raw_ostream &operator<<(raw_ostream &OS, const Annotation &Diag) {
+  OS << "pac-ret<Ret:" << Diag.RetInst << ">";
+  return OS;
+}
+
+void Annotation::print(raw_ostream &OS) const { OS << *this; }
+void Gadget::print(raw_ostream &OS) const { OS << *this; }
+void GenDiag::print(raw_ostream &OS) const { OS << *this; }
+
+// The security property that is checked is:
+// When a register is used as the address to jump to in a return instruction,
+// that register must either:
+// (a) never be changed within this function, i.e. have the same value as when
+//     the function started, or
+// (b) the last write to the register must be by an authentication instruction.
+
+// This property is checked by using dataflow analysis to keep track of which
+// registers have been written (def-ed), since last authenticated. Those are
+// exactly the registers containing values that should not be trusted (as they
+// could have changed since the last time they were authenticated). For pac-ret,
+// any return instruction using such a register is a gadget to be reported. For
+// PAuthABI, probably at least any indirect control flow using such a register
+// should be reported.
+
+// Furthermore, when producing a diagnostic for a found non-pac-ret protected
+// return, the analysis also lists the last instructions that wrote to the
+// register used in the return instruction.
+// The total set of registers used in return instructions in a given function is
+// small. It almost always is just `X30`.
+// In order to reduce the memory consumption of storing this additional state
+// during the dataflow analysis, this is computed by running the dataflow
+// analysis twice:
+// 1. In the first run, the dataflow analysis only keeps track of the security
+//    property: i.e. which registers have been overwritten since the last
+//    time they've been authenticated.
+// 2. If the first run finds any return instructions using a register last
+//    written by a non-authenticating instruction, the dataflow analysis will
+//    be run a second time. The first run will return which registers are used
+//    in the gadgets to be reported. This information is used in the second run
+//    to also track which instructions last wrote to those registers.
+
+struct State {
+  /// A BitVector containing the registers that have been clobbered, and
+  /// not authenticated.
+  BitVector NonAutClobRegs;
+  /// A vector of sets, only used in the second data flow run.
+  /// Each element in the vector represents one register for which we
+  /// track the set of last instructions that wrote to this register.
+  /// For pac-ret analysis, the expectation is that almost all return
+  /// instructions only use register `X30`, and therefore, this vector
+  /// will probably have length 1 in the second run.
+  std::vector<SmallPtrSet<const MCInst *, 4>> LastInstWritingReg;
+  State() {}
+  State(uint16_t NumRegs, uint16_t NumRegsToTrack)
----------------
kbeyls wrote:

The value for `NumRegs` comes from [MCRegisterInfo::getNumRegs](https://llvm.org/doxygen/MCRegisterInfo_8h_source.html#l00414), which returns `unsigned`.
So yes, it makes sense to use `unsigned` for these constructor parameters rather than `uint16_t`.
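
To illustrate the difference, here is a minimal standalone sketch, not the code from the patch. `StateSketch` and `narrowToU16` are hypothetical names, `std::vector<bool>` stands in for `llvm::BitVector` so that it builds without LLVM, and it assumes `unsigned` is wider than 16 bits, as on mainstream platforms. It shows that an `unsigned` count silently wraps modulo 65536 when passed through a `uint16_t` parameter, while taking `unsigned` keeps the value intact.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the State struct under review.
// std::vector<bool> replaces llvm::BitVector for this sketch only.
struct StateSketch {
  // One bit per register: clobbered and not (yet) authenticated.
  std::vector<bool> NonAutClobRegs;
  // One entry per tracked register: the instructions that last wrote it.
  std::vector<std::vector<const void *>> LastInstWritingReg;

  // Taking `unsigned` matches the return type of getNumRegs(), so no
  // implicit narrowing happens at the call site.
  StateSketch(unsigned NumRegs, unsigned NumRegsToTrack)
      : NonAutClobRegs(NumRegs), LastInstWritingReg(NumRegsToTrack) {}
};

// What a uint16_t parameter would do to the same argument: the conversion
// is well defined but wraps modulo 2^16.
inline uint16_t narrowToU16(unsigned NumRegs) {
  return static_cast<uint16_t>(NumRegs);
}
```

With a `uint16_t` parameter the narrowing is implicit, so it would only show up as a compiler diagnostic if a warning such as `-Wconversion` is enabled.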

https://github.com/llvm/llvm-project/pull/122304

