[llvm] [AArch64] Only create called thunks when hardening against SLS (PR #97472)

Anatoly Trosinenko via llvm-commits llvm-commits at lists.llvm.org
Thu Jul 4 07:23:22 PDT 2024


https://github.com/atrosinenko updated https://github.com/llvm/llvm-project/pull/97472

From 556d8a339fbff87557e17478e52de29ed7bd3b9a Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko <atrosinenko at accesssoftek.com>
Date: Thu, 4 Jul 2024 17:03:47 +0300
Subject: [PATCH 1/2] [CodeGen] Refactor and document ThunkInserter (#97468)

In preparation for supporting BLRA* instructions in SLS Hardening on
AArch64, refactor the ThunkInserter class.

The main intention of this commit is to document how the BLR-rewriting
logic of the AArch64SLSHardening pass can be merged into the
SLSBLRThunkInserter class. That merge makes it possible to call
createThunkFunction only for the thunks that are actually referenced.
Ultimately, it will prevent SLSBLRThunkInserter from unconditionally
generating about 1800 thunk functions, one for every possible
combination of operands passed to the BLRAA and BLRAB instructions.

This particular commit does not affect the generated machine code and
consists of the following changes:
* document the existing behavior of the ThunkInserter class
* introduce the ThunkInserterPass template class to get rid of the mostly
identical boilerplate code in the ARM, AArch64 and X86 implementations
* move the InsertedThunks parameter from the `mayUseThunk` method to
`insertThunks` (see the sketch below)
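
To make the resulting interface concrete, here is a small illustrative
sketch of a hypothetical target written against the refactored classes.
The names MyThunkInserter, MyIndirectThunks and "__my_thunk_" are made up
for illustration; the shape mirrors the X86 and AArch64 code changed below.

#include "llvm/CodeGen/IndirectThunks.h"

using namespace llvm;

namespace {

struct MyThunkInserter : ThunkInserter<MyThunkInserter> {
  const char *getThunkPrefix() { return "__my_thunk_"; }
  // mayUseThunk no longer receives InsertedThunks; it only answers whether
  // this MachineFunction may need thunks at all.
  bool mayUseThunk(const MachineFunction &MF) { return true; }
  // InsertedThunks now arrives here instead, so insertThunks decides whether
  // anything still has to be created.
  bool insertThunks(MachineModuleInfo &MMI, MachineFunction &MF,
                    bool ExistingThunks) {
    if (ExistingThunks)
      return false;
    createThunkFunction(MMI, "__my_thunk_example");
    return true;
  }
  // Called later, once the newly created (empty) thunk function reaches this
  // pass on its own trip through the pipeline; a real inserter emits the
  // thunk body here.
  void populateThunk(MachineFunction &MF) {}
};

// The per-target boilerplate (tuple of inserters, initTIs/runTIs) now lives
// in ThunkInserterPass; the target pass only names its inserters.
class MyIndirectThunks : public ThunkInserterPass<MyThunkInserter> {
public:
  static char ID;
  MyIndirectThunks() : ThunkInserterPass(ID) {}
  StringRef getPassName() const override { return "My Indirect Thunks"; }
};

char MyIndirectThunks::ID = 0;

} // namespace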
---
 llvm/include/llvm/CodeGen/IndirectThunks.h    | 133 +++++++++++++++---
 .../Target/AArch64/AArch64SLSHardening.cpp    |  48 ++-----
 llvm/lib/Target/ARM/ARMSLSHardening.cpp       |  66 +++------
 llvm/lib/Target/X86/X86IndirectThunks.cpp     |  54 ++-----
 4 files changed, 159 insertions(+), 142 deletions(-)

diff --git a/llvm/include/llvm/CodeGen/IndirectThunks.h b/llvm/include/llvm/CodeGen/IndirectThunks.h
index 9b064ab788bf78..6c16b326fedd0f 100644
--- a/llvm/include/llvm/CodeGen/IndirectThunks.h
+++ b/llvm/include/llvm/CodeGen/IndirectThunks.h
@@ -1,4 +1,4 @@
-//===---- IndirectThunks.h - Indirect Thunk Base Class ----------*- C++ -*-===//
+//===---- IndirectThunks.h - Indirect thunk insertion helpers ---*- C++ -*-===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
@@ -7,7 +7,9 @@
 //===----------------------------------------------------------------------===//
 ///
 /// \file
-/// Contains a base class for Passes that inject an MI thunk.
+/// Contains a base ThunkInserter class that simplifies injection of MI thunks
+/// as well as a default implementation of MachineFunctionPass wrapping
+/// several `ThunkInserter`s for targets to extend.
 ///
 //===----------------------------------------------------------------------===//
 
@@ -15,26 +17,95 @@
 #define LLVM_CODEGEN_INDIRECTTHUNKS_H
 
 #include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineModuleInfo.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/Module.h"
 
 namespace llvm {
 
+/// This class assists in inserting MI thunk functions into the module and
+/// rewriting the existing machine functions to call these thunks.
+///
+/// One of the common cases is implementing security mitigations that involve
+/// replacing some machine code patterns with calls to special thunk functions.
+///
+/// Inserting a module pass late in the codegen pipeline may increase memory
+/// usage, as it serializes the transformations and forces preceding passes to
+/// produce machine code for all functions before running the module pass.
+/// For that reason, ThunkInserter can be driven by a MachineFunctionPass by
+/// passing one MachineFunction at a time to its `run(MMI, MF)` method.
+/// Then, the derived class should
+/// * call createThunkFunction from its insertThunks method exactly once for
+///   each of the thunk functions to be inserted
+/// * populate the thunk in its populateThunk method
+///
+/// Note that if some other pass is responsible for rewriting the functions,
+/// the insertThunks method may simply create all possible thunks at once,
+/// probably postponed until the first occurrence of a possibly affected MF.
+///
+/// Alternatively, insertThunks method can rewrite MF by itself and only insert
+/// the thunks being called. In that case InsertedThunks variable can be used
+/// to track which thunks were already inserted.
+///
+/// In any case, the thunk function has to be inserted on behalf of some other
+/// function and then populated on its own "iteration" later - this is because
+/// MachineFunctionPass will see the newly created functions, but they first
+/// have to go through the preceding passes from the same pass manager,
+/// possibly even through the instruction selector.
+//
+// FIXME Maybe implement a documented and less surprising way of modifying
+//       the module from a MachineFunctionPass that is restricted to inserting
+//       completely new functions to the module.
 template <typename Derived, typename InsertedThunksTy = bool>
 class ThunkInserter {
   Derived &getDerived() { return *static_cast<Derived *>(this); }
 
-protected:
   // A variable used to track whether (and possible which) thunks have been
   // inserted so far. InsertedThunksTy is usually a bool, but can be other types
   // to represent more than one type of thunk. Requires an |= operator to
   // accumulate results.
   InsertedThunksTy InsertedThunks;
-  void doInitialization(Module &M) {}
+
+protected:
+  // Interface for subclasses to use.
+
+  /// Create an empty thunk function.
+  ///
+  /// The new function will eventually be passed to populateThunk. If multiple
+  /// thunks are created, populateThunk can distinguish them by their names.
   void createThunkFunction(MachineModuleInfo &MMI, StringRef Name,
                            bool Comdat = true, StringRef TargetAttrs = "");
 
+protected:
+  // Interface for subclasses to implement.
+  //
+  // Note: all functions are non-virtual and are called via getDerived().
+  // Note: only doInitialization() has an implementation.
+
+  /// Initializes thunk inserter.
+  void doInitialization(Module &M) {}
+
+  /// Returns the common prefix for thunk functions' names.
+  const char *getThunkPrefix(); // undefined
+
+  /// Checks if MF may use thunks (true - maybe, false - definitely not).
+  bool mayUseThunk(const MachineFunction &MF); // undefined
+
+  /// Rewrites the function if necessary, returns the set of thunks added.
+  InsertedThunksTy insertThunks(MachineModuleInfo &MMI, MachineFunction &MF,
+                                InsertedThunksTy ExistingThunks); // undefined
+
+  /// Populate the thunk function with instructions.
+  ///
+  /// If multiple thunks are created, the content that must be inserted in the
+  /// thunk function body should be derived from the MF's name.
+  ///
+  /// Depending on the preceding passes in the pass manager, by the time
+  /// populateThunk is called, MF may have a few target-specific instructions
+  /// (such as a single MBB containing the return instruction).
+  void populateThunk(MachineFunction &MF); // undefined
+
 public:
   void init(Module &M) {
     InsertedThunks = InsertedThunksTy{};
@@ -53,7 +124,7 @@ void ThunkInserter<Derived, InsertedThunksTy>::createThunkFunction(
 
   Module &M = const_cast<Module &>(*MMI.getModule());
   LLVMContext &Ctx = M.getContext();
-  auto Type = FunctionType::get(Type::getVoidTy(Ctx), false);
+  auto *Type = FunctionType::get(Type::getVoidTy(Ctx), false);
   Function *F = Function::Create(Type,
                                  Comdat ? GlobalValue::LinkOnceODRLinkage
                                         : GlobalValue::InternalLinkage,
@@ -95,19 +166,15 @@ bool ThunkInserter<Derived, InsertedThunksTy>::run(MachineModuleInfo &MMI,
                                                    MachineFunction &MF) {
   // If MF is not a thunk, check to see if we need to insert a thunk.
   if (!MF.getName().starts_with(getDerived().getThunkPrefix())) {
-    // Only add a thunk if one of the functions has the corresponding feature
-    // enabled in its subtarget, and doesn't enable external thunks. The target
-    // can use InsertedThunks to detect whether relevant thunks have already
-    // been inserted.
-    // FIXME: Conditionalize on indirect calls so we don't emit a thunk when
-    // nothing will end up calling it.
-    // FIXME: It's a little silly to look at every function just to enumerate
-    // the subtargets, but eventually we'll want to look at them for indirect
-    // calls, so maybe this is OK.
-    if (!getDerived().mayUseThunk(MF, InsertedThunks))
+    // Only add thunks if one of the functions may use them.
+    if (!getDerived().mayUseThunk(MF))
       return false;
 
-    InsertedThunks |= getDerived().insertThunks(MMI, MF);
+    // The target can use InsertedThunks to detect whether relevant thunks
+    // have already been inserted.
+    // FIXME: Provide the way for insertThunks to notify us whether it changed
+    //        the MF, instead of conservatively assuming it did.
+    InsertedThunks |= getDerived().insertThunks(MMI, MF, InsertedThunks);
     return true;
   }
 
@@ -116,6 +183,40 @@ bool ThunkInserter<Derived, InsertedThunksTy>::run(MachineModuleInfo &MMI,
   return true;
 }
 
+/// Basic implementation of MachineFunctionPass wrapping one or more
+/// `ThunkInserter`s passed as type parameters.
+template <typename... Inserters>
+class ThunkInserterPass : public MachineFunctionPass {
+protected:
+  std::tuple<Inserters...> TIs;
+
+  ThunkInserterPass(char &ID) : MachineFunctionPass(ID) {}
+
+public:
+  bool doInitialization(Module &M) override {
+    initTIs(M, TIs);
+    return false;
+  }
+
+  bool runOnMachineFunction(MachineFunction &MF) override {
+    auto &MMI = getAnalysis<MachineModuleInfoWrapperPass>().getMMI();
+    return runTIs(MMI, MF, TIs);
+  }
+
+private:
+  template <typename... ThunkInserterT>
+  static void initTIs(Module &M,
+                      std::tuple<ThunkInserterT...> &ThunkInserters) {
+    (..., std::get<ThunkInserterT>(ThunkInserters).init(M));
+  }
+
+  template <typename... ThunkInserterT>
+  static bool runTIs(MachineModuleInfo &MMI, MachineFunction &MF,
+                     std::tuple<ThunkInserterT...> &ThunkInserters) {
+    return (0 | ... | std::get<ThunkInserterT>(ThunkInserters).run(MMI, MF));
+  }
+};
+
 } // namespace llvm
 
 #endif
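
As an aside, the initTIs/runTIs helpers in ThunkInserterPass above rely on
C++17 fold expressions; the bitwise `|` (rather than `||`) in runTIs is what
guarantees that every inserter's run() is evaluated even if an earlier
inserter already reported a change. A minimal standalone illustration with
made-up inserter types A and B (not part of the patch):

#include <cstdio>
#include <tuple>

struct A { bool run() { std::puts("A::run"); return true; } };
struct B { bool run() { std::puts("B::run"); return false; } };

template <typename... Ts> static bool runAll(std::tuple<Ts...> &Inserters) {
  // Left fold over bitwise |: every run() is evaluated (no short-circuiting),
  // and the results are OR-ed into a single "changed" flag.
  return (0 | ... | std::get<Ts>(Inserters).run());
}

int main() {
  std::tuple<A, B> Inserters;
  bool Changed = runAll(Inserters); // both A::run and B::run execute
  std::printf("changed = %d\n", Changed);
  return 0;
}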
diff --git a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp
index 41bbc003fd9bf7..7660de5c082d12 100644
--- a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp
+++ b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp
@@ -183,15 +183,12 @@ static const struct ThunkNameAndReg {
 namespace {
 struct SLSBLRThunkInserter : ThunkInserter<SLSBLRThunkInserter> {
   const char *getThunkPrefix() { return SLSBLRNamePrefix; }
-  bool mayUseThunk(const MachineFunction &MF, bool InsertedThunks) {
-    if (InsertedThunks)
-      return false;
+  bool mayUseThunk(const MachineFunction &MF) {
     ComdatThunks &= !MF.getSubtarget<AArch64Subtarget>().hardenSlsNoComdat();
-    // FIXME: This could also check if there are any BLRs in the function
-    // to more accurately reflect if a thunk will be needed.
     return MF.getSubtarget<AArch64Subtarget>().hardenSlsBlr();
   }
-  bool insertThunks(MachineModuleInfo &MMI, MachineFunction &MF);
+  bool insertThunks(MachineModuleInfo &MMI, MachineFunction &MF,
+                    bool ExistingThunks);
   void populateThunk(MachineFunction &MF);
 
 private:
@@ -200,7 +197,10 @@ struct SLSBLRThunkInserter : ThunkInserter<SLSBLRThunkInserter> {
 } // namespace
 
 bool SLSBLRThunkInserter::insertThunks(MachineModuleInfo &MMI,
-                                       MachineFunction &MF) {
+                                       MachineFunction &MF,
+                                       bool ExistingThunks) {
+  if (ExistingThunks)
+    return false;
   // FIXME: It probably would be possible to filter which thunks to produce
   // based on which registers are actually used in BLR instructions in this
   // function. But would that be a worthwhile optimization?
@@ -210,6 +210,8 @@ bool SLSBLRThunkInserter::insertThunks(MachineModuleInfo &MMI,
 }
 
 void SLSBLRThunkInserter::populateThunk(MachineFunction &MF) {
+  assert(MF.getFunction().hasComdat() == ComdatThunks &&
+         "ComdatThunks value changed since MF creation");
   // FIXME: How to better communicate Register number, rather than through
   // name and lookup table?
   assert(MF.getName().starts_with(getThunkPrefix()));
@@ -411,30 +413,13 @@ FunctionPass *llvm::createAArch64SLSHardeningPass() {
 }
 
 namespace {
-class AArch64IndirectThunks : public MachineFunctionPass {
+class AArch64IndirectThunks : public ThunkInserterPass<SLSBLRThunkInserter> {
 public:
   static char ID;
 
-  AArch64IndirectThunks() : MachineFunctionPass(ID) {}
+  AArch64IndirectThunks() : ThunkInserterPass(ID) {}
 
   StringRef getPassName() const override { return "AArch64 Indirect Thunks"; }
-
-  bool doInitialization(Module &M) override;
-  bool runOnMachineFunction(MachineFunction &MF) override;
-
-private:
-  std::tuple<SLSBLRThunkInserter> TIs;
-
-  template <typename... ThunkInserterT>
-  static void initTIs(Module &M,
-                      std::tuple<ThunkInserterT...> &ThunkInserters) {
-    (..., std::get<ThunkInserterT>(ThunkInserters).init(M));
-  }
-  template <typename... ThunkInserterT>
-  static bool runTIs(MachineModuleInfo &MMI, MachineFunction &MF,
-                     std::tuple<ThunkInserterT...> &ThunkInserters) {
-    return (0 | ... | std::get<ThunkInserterT>(ThunkInserters).run(MMI, MF));
-  }
 };
 
 } // end anonymous namespace
@@ -444,14 +429,3 @@ char AArch64IndirectThunks::ID = 0;
 FunctionPass *llvm::createAArch64IndirectThunks() {
   return new AArch64IndirectThunks();
 }
-
-bool AArch64IndirectThunks::doInitialization(Module &M) {
-  initTIs(M, TIs);
-  return false;
-}
-
-bool AArch64IndirectThunks::runOnMachineFunction(MachineFunction &MF) {
-  LLVM_DEBUG(dbgs() << getPassName() << '\n');
-  auto &MMI = getAnalysis<MachineModuleInfoWrapperPass>().getMMI();
-  return runTIs(MMI, MF, TIs);
-}
diff --git a/llvm/lib/Target/ARM/ARMSLSHardening.cpp b/llvm/lib/Target/ARM/ARMSLSHardening.cpp
index d9ff14ead60e27..d77db17090feb5 100644
--- a/llvm/lib/Target/ARM/ARMSLSHardening.cpp
+++ b/llvm/lib/Target/ARM/ARMSLSHardening.cpp
@@ -163,7 +163,7 @@ static const struct ThunkNameRegMode {
 
 // An enum for tracking whether Arm and Thumb thunks have been inserted into the
 // current module so far.
-enum ArmInsertedThunks { ArmThunk = 1, ThumbThunk = 2 };
+enum ArmInsertedThunks { NoThunk = 0, ArmThunk = 1, ThumbThunk = 2 };
 
 inline ArmInsertedThunks &operator|=(ArmInsertedThunks &X,
                                      ArmInsertedThunks Y) {
@@ -174,19 +174,12 @@ namespace {
 struct SLSBLRThunkInserter
     : ThunkInserter<SLSBLRThunkInserter, ArmInsertedThunks> {
   const char *getThunkPrefix() { return SLSBLRNamePrefix; }
-  bool mayUseThunk(const MachineFunction &MF,
-                   ArmInsertedThunks InsertedThunks) {
-    if ((InsertedThunks & ArmThunk &&
-         !MF.getSubtarget<ARMSubtarget>().isThumb()) ||
-        (InsertedThunks & ThumbThunk &&
-         MF.getSubtarget<ARMSubtarget>().isThumb()))
-      return false;
+  bool mayUseThunk(const MachineFunction &MF) {
     ComdatThunks &= !MF.getSubtarget<ARMSubtarget>().hardenSlsNoComdat();
-    // FIXME: This could also check if there are any indirect calls in the
-    // function to more accurately reflect if a thunk will be needed.
     return MF.getSubtarget<ARMSubtarget>().hardenSlsBlr();
   }
-  ArmInsertedThunks insertThunks(MachineModuleInfo &MMI, MachineFunction &MF);
+  ArmInsertedThunks insertThunks(MachineModuleInfo &MMI, MachineFunction &MF,
+                                 ArmInsertedThunks InsertedThunks);
   void populateThunk(MachineFunction &MF);
 
 private:
@@ -194,8 +187,14 @@ struct SLSBLRThunkInserter
 };
 } // namespace
 
-ArmInsertedThunks SLSBLRThunkInserter::insertThunks(MachineModuleInfo &MMI,
-                                                    MachineFunction &MF) {
+ArmInsertedThunks
+SLSBLRThunkInserter::insertThunks(MachineModuleInfo &MMI, MachineFunction &MF,
+                                  ArmInsertedThunks InsertedThunks) {
+  if ((InsertedThunks & ArmThunk &&
+       !MF.getSubtarget<ARMSubtarget>().isThumb()) ||
+      (InsertedThunks & ThumbThunk &&
+       MF.getSubtarget<ARMSubtarget>().isThumb()))
+    return NoThunk;
   // FIXME: It probably would be possible to filter which thunks to produce
   // based on which registers are actually used in indirect calls in this
   // function. But would that be a worthwhile optimization?
@@ -208,6 +207,8 @@ ArmInsertedThunks SLSBLRThunkInserter::insertThunks(MachineModuleInfo &MMI,
 }
 
 void SLSBLRThunkInserter::populateThunk(MachineFunction &MF) {
+  assert(MF.getFunction().hasComdat() == ComdatThunks &&
+         "ComdatThunks value changed since MF creation");
   // FIXME: How to better communicate Register number, rather than through
   // name and lookup table?
   assert(MF.getName().starts_with(getThunkPrefix()));
@@ -384,38 +385,14 @@ FunctionPass *llvm::createARMSLSHardeningPass() {
 }
 
 namespace {
-class ARMIndirectThunks : public MachineFunctionPass {
+class ARMIndirectThunks : public ThunkInserterPass<SLSBLRThunkInserter> {
 public:
   static char ID;
 
-  ARMIndirectThunks() : MachineFunctionPass(ID) {}
+  ARMIndirectThunks() : ThunkInserterPass(ID) {}
 
   StringRef getPassName() const override { return "ARM Indirect Thunks"; }
-
-  bool doInitialization(Module &M) override;
-  bool runOnMachineFunction(MachineFunction &MF) override;
-
-  void getAnalysisUsage(AnalysisUsage &AU) const override {
-    MachineFunctionPass::getAnalysisUsage(AU);
-    AU.addRequired<MachineModuleInfoWrapperPass>();
-    AU.addPreserved<MachineModuleInfoWrapperPass>();
-  }
-
-private:
-  std::tuple<SLSBLRThunkInserter> TIs;
-
-  template <typename... ThunkInserterT>
-  static void initTIs(Module &M,
-                      std::tuple<ThunkInserterT...> &ThunkInserters) {
-    (..., std::get<ThunkInserterT>(ThunkInserters).init(M));
-  }
-  template <typename... ThunkInserterT>
-  static bool runTIs(MachineModuleInfo &MMI, MachineFunction &MF,
-                     std::tuple<ThunkInserterT...> &ThunkInserters) {
-    return (0 | ... | std::get<ThunkInserterT>(ThunkInserters).run(MMI, MF));
-  }
 };
-
 } // end anonymous namespace
 
 char ARMIndirectThunks::ID = 0;
@@ -423,14 +400,3 @@ char ARMIndirectThunks::ID = 0;
 FunctionPass *llvm::createARMIndirectThunks() {
   return new ARMIndirectThunks();
 }
-
-bool ARMIndirectThunks::doInitialization(Module &M) {
-  initTIs(M, TIs);
-  return false;
-}
-
-bool ARMIndirectThunks::runOnMachineFunction(MachineFunction &MF) {
-  LLVM_DEBUG(dbgs() << getPassName() << '\n');
-  auto &MMI = getAnalysis<MachineModuleInfoWrapperPass>().getMMI();
-  return runTIs(MMI, MF, TIs);
-}
diff --git a/llvm/lib/Target/X86/X86IndirectThunks.cpp b/llvm/lib/Target/X86/X86IndirectThunks.cpp
index ecc52600f75933..4f4a8d8bd09d51 100644
--- a/llvm/lib/Target/X86/X86IndirectThunks.cpp
+++ b/llvm/lib/Target/X86/X86IndirectThunks.cpp
@@ -61,26 +61,26 @@ static const char R11LVIThunkName[] = "__llvm_lvi_thunk_r11";
 namespace {
 struct RetpolineThunkInserter : ThunkInserter<RetpolineThunkInserter> {
   const char *getThunkPrefix() { return RetpolineNamePrefix; }
-  bool mayUseThunk(const MachineFunction &MF, bool InsertedThunks) {
-    if (InsertedThunks)
-      return false;
+  bool mayUseThunk(const MachineFunction &MF) {
     const auto &STI = MF.getSubtarget<X86Subtarget>();
     return (STI.useRetpolineIndirectCalls() ||
             STI.useRetpolineIndirectBranches()) &&
            !STI.useRetpolineExternalThunk();
   }
-  bool insertThunks(MachineModuleInfo &MMI, MachineFunction &MF);
+  bool insertThunks(MachineModuleInfo &MMI, MachineFunction &MF,
+                    bool ExistingThunks);
   void populateThunk(MachineFunction &MF);
 };
 
 struct LVIThunkInserter : ThunkInserter<LVIThunkInserter> {
   const char *getThunkPrefix() { return LVIThunkNamePrefix; }
-  bool mayUseThunk(const MachineFunction &MF, bool InsertedThunks) {
-    if (InsertedThunks)
-      return false;
+  bool mayUseThunk(const MachineFunction &MF) {
     return MF.getSubtarget<X86Subtarget>().useLVIControlFlowIntegrity();
   }
-  bool insertThunks(MachineModuleInfo &MMI, MachineFunction &MF) {
+  bool insertThunks(MachineModuleInfo &MMI, MachineFunction &MF,
+                    bool ExistingThunks) {
+    if (ExistingThunks)
+      return false;
     createThunkFunction(MMI, R11LVIThunkName);
     return true;
   }
@@ -104,36 +104,23 @@ struct LVIThunkInserter : ThunkInserter<LVIThunkInserter> {
   }
 };
 
-class X86IndirectThunks : public MachineFunctionPass {
+class X86IndirectThunks
+    : public ThunkInserterPass<RetpolineThunkInserter, LVIThunkInserter> {
 public:
   static char ID;
 
-  X86IndirectThunks() : MachineFunctionPass(ID) {}
+  X86IndirectThunks() : ThunkInserterPass(ID) {}
 
   StringRef getPassName() const override { return "X86 Indirect Thunks"; }
-
-  bool doInitialization(Module &M) override;
-  bool runOnMachineFunction(MachineFunction &MF) override;
-
-private:
-  std::tuple<RetpolineThunkInserter, LVIThunkInserter> TIs;
-
-  template <typename... ThunkInserterT>
-  static void initTIs(Module &M,
-                      std::tuple<ThunkInserterT...> &ThunkInserters) {
-    (..., std::get<ThunkInserterT>(ThunkInserters).init(M));
-  }
-  template <typename... ThunkInserterT>
-  static bool runTIs(MachineModuleInfo &MMI, MachineFunction &MF,
-                     std::tuple<ThunkInserterT...> &ThunkInserters) {
-    return (0 | ... | std::get<ThunkInserterT>(ThunkInserters).run(MMI, MF));
-  }
 };
 
 } // end anonymous namespace
 
 bool RetpolineThunkInserter::insertThunks(MachineModuleInfo &MMI,
-                                          MachineFunction &MF) {
+                                          MachineFunction &MF,
+                                          bool ExistingThunks) {
+  if (ExistingThunks)
+    return false;
   if (MMI.getTarget().getTargetTriple().getArch() == Triple::x86_64)
     createThunkFunction(MMI, R11RetpolineName);
   else
@@ -259,14 +246,3 @@ FunctionPass *llvm::createX86IndirectThunksPass() {
 }
 
 char X86IndirectThunks::ID = 0;
-
-bool X86IndirectThunks::doInitialization(Module &M) {
-  initTIs(M, TIs);
-  return false;
-}
-
-bool X86IndirectThunks::runOnMachineFunction(MachineFunction &MF) {
-  LLVM_DEBUG(dbgs() << getPassName() << '\n');
-  auto &MMI = getAnalysis<MachineModuleInfoWrapperPass>().getMMI();
-  return runTIs(MMI, MF, TIs);
-}

From 29d6b4419ab0efb6285a07b471f026f2b22686ae Mon Sep 17 00:00:00 2001
From: Anatoly Trosinenko <atrosinenko at accesssoftek.com>
Date: Fri, 28 Jun 2024 21:50:24 +0300
Subject: [PATCH 2/2] [AArch64] Only create called thunks when hardening
 against SLS

In preparation for implementing hardening of BLRA* instructions,
restrict thunk function generation to only those thunks that are
actually called from some function. As described in the existing
comments, emitting all possible thunks for BLRAA and BLRAB instructions
would mean adding about 1800 functions in total, most of which would
likely never be called.

This commit merges the AArch64SLSHardening class into
SLSBLRThunkInserter, so thunks can be created as needed while a machine
function is being rewritten. The uses of the TII, TRI and ST fields of
the AArch64SLSHardening class are replaced with querying them in place,
as ThunkInserter has multiple "entry points", in contrast to the single
runOnMachineFunction method of AArch64SLSHardening.

The body of the old runOnMachineFunction method essentially replaces the
pre-existing insertThunks implementation, as there is no longer a need
to insert all possible thunks unconditionally. Instead, thunks are
created on first use from inside the insertThunks method (see the
sketch below).
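
A standalone sketch of that lazy bookkeeping, using a made-up createThunk
stub instead of LLVM types: in the patch below, the bitmask is the ThunksSet
typedef, the check-and-create step sits inline in
SLSHardeningInserter::convertBLRToBL, and the real call is
createThunkFunction(MMI, Name, ComdatThunks).

#include <cstdint>
#include <cstdio>

// One bit per potential thunk (the patch indexes bits by position in the
// SLSBLRThunks table).
typedef uint32_t ThunksSet;

// Stand-in for ThunkInserter::createThunkFunction(MMI, Name, ComdatThunks).
static void createThunk(unsigned Index) {
  std::printf("creating thunk #%u\n", Index);
}

// A thunk function is created only the first time a BLR through the
// corresponding register is rewritten; later uses just reuse the symbol.
static void noteThunkUse(unsigned ThunkIndex, ThunksSet &Thunks) {
  if (!(Thunks & (1u << ThunkIndex))) {
    Thunks |= 1u << ThunkIndex;
    createThunk(ThunkIndex);
  }
}

int main() {
  ThunksSet Thunks = 0;
  noteThunkUse(8, Thunks);  // first BLR x8 seen: thunk for x8 is created
  noteThunkUse(8, Thunks);  // another BLR x8: nothing new is created
  noteThunkUse(16, Thunks); // first BLR x16: a second thunk is created
  return 0;
}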
---
 llvm/lib/Target/AArch64/AArch64.h             |   1 -
 .../Target/AArch64/AArch64SLSHardening.cpp    | 188 +++++++-----------
 .../Target/AArch64/AArch64TargetMachine.cpp   |   1 -
 llvm/test/CodeGen/AArch64/O0-pipeline.ll      |   1 -
 llvm/test/CodeGen/AArch64/O3-pipeline.ll      |   1 -
 .../AArch64/arm64-opt-remarks-lazy-bfi.ll     |   8 -
 .../speculation-hardening-sls-blr-bti.mir     |  20 --
 .../AArch64/speculation-hardening-sls-blr.mir |  35 ++--
 8 files changed, 89 insertions(+), 166 deletions(-)

diff --git a/llvm/lib/Target/AArch64/AArch64.h b/llvm/lib/Target/AArch64/AArch64.h
index 6f2aeb83a451af..66ad701d839585 100644
--- a/llvm/lib/Target/AArch64/AArch64.h
+++ b/llvm/lib/Target/AArch64/AArch64.h
@@ -40,7 +40,6 @@ FunctionPass *createAArch64ISelDag(AArch64TargetMachine &TM,
 FunctionPass *createAArch64StorePairSuppressPass();
 FunctionPass *createAArch64ExpandPseudoPass();
 FunctionPass *createAArch64SLSHardeningPass();
-FunctionPass *createAArch64IndirectThunks();
 FunctionPass *createAArch64SpeculationHardeningPass();
 FunctionPass *createAArch64LoadStoreOptimizationPass();
 ModulePass *createAArch64LowerHomogeneousPrologEpilogPass();
diff --git a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp
index 7660de5c082d12..24f023a3d70e7b 100644
--- a/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp
+++ b/llvm/lib/Target/AArch64/AArch64SLSHardening.cpp
@@ -13,20 +13,16 @@
 
 #include "AArch64InstrInfo.h"
 #include "AArch64Subtarget.h"
-#include "Utils/AArch64BaseInfo.h"
 #include "llvm/CodeGen/IndirectThunks.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/MachineFunction.h"
-#include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineInstr.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineOperand.h"
-#include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/RegisterScavenging.h"
 #include "llvm/IR/DebugLoc.h"
 #include "llvm/Pass.h"
-#include "llvm/Support/CodeGen.h"
-#include "llvm/Support/Debug.h"
+#include "llvm/Support/ErrorHandling.h"
 #include "llvm/Target/TargetMachine.h"
 #include <cassert>
 
@@ -36,38 +32,42 @@ using namespace llvm;
 
 #define AARCH64_SLS_HARDENING_NAME "AArch64 sls hardening pass"
 
+static const char SLSBLRNamePrefix[] = "__llvm_slsblr_thunk_";
+
 namespace {
 
-class AArch64SLSHardening : public MachineFunctionPass {
-public:
-  const TargetInstrInfo *TII;
-  const TargetRegisterInfo *TRI;
-  const AArch64Subtarget *ST;
+// Set of inserted thunks: bitmask with bits corresponding to
+// indexes in SLSBLRThunks array.
+typedef uint32_t ThunksSet;
 
-  static char ID;
-
-  AArch64SLSHardening() : MachineFunctionPass(ID) {
-    initializeAArch64SLSHardeningPass(*PassRegistry::getPassRegistry());
+struct SLSHardeningInserter : ThunkInserter<SLSHardeningInserter, ThunksSet> {
+public:
+  const char *getThunkPrefix() { return SLSBLRNamePrefix; }
+  bool mayUseThunk(const MachineFunction &MF) {
+    ComdatThunks &= !MF.getSubtarget<AArch64Subtarget>().hardenSlsNoComdat();
+    // We are inserting barriers aside from thunk calls, so
+    // check hardenSlsRetBr() as well.
+    return MF.getSubtarget<AArch64Subtarget>().hardenSlsBlr() ||
+           MF.getSubtarget<AArch64Subtarget>().hardenSlsRetBr();
   }
+  ThunksSet insertThunks(MachineModuleInfo &MMI, MachineFunction &MF,
+                         ThunksSet ExistingThunks);
+  void populateThunk(MachineFunction &MF);
 
-  bool runOnMachineFunction(MachineFunction &Fn) override;
+private:
+  bool ComdatThunks = true;
 
-  StringRef getPassName() const override { return AARCH64_SLS_HARDENING_NAME; }
+  bool hardenReturnsAndBRs(MachineModuleInfo &MMI, MachineBasicBlock &MBB);
+  bool hardenBLRs(MachineModuleInfo &MMI, MachineBasicBlock &MBB,
+                  ThunksSet &Thunks);
 
-private:
-  bool hardenReturnsAndBRs(MachineBasicBlock &MBB) const;
-  bool hardenBLRs(MachineBasicBlock &MBB) const;
-  MachineBasicBlock &ConvertBLRToBL(MachineBasicBlock &MBB,
-                                    MachineBasicBlock::instr_iterator) const;
+  void convertBLRToBL(MachineModuleInfo &MMI, MachineBasicBlock &MBB,
+                      MachineBasicBlock::instr_iterator MBBI,
+                      ThunksSet &Thunks);
 };
 
 } // end anonymous namespace
 
-char AArch64SLSHardening::ID = 0;
-
-INITIALIZE_PASS(AArch64SLSHardening, "aarch64-sls-hardening",
-                AARCH64_SLS_HARDENING_NAME, false, false)
-
 static void insertSpeculationBarrier(const AArch64Subtarget *ST,
                                      MachineBasicBlock &MBB,
                                      MachineBasicBlock::iterator MBBI,
@@ -90,18 +90,18 @@ static void insertSpeculationBarrier(const AArch64Subtarget *ST,
     BuildMI(MBB, MBBI, DL, TII->get(BarrierOpc));
 }
 
-bool AArch64SLSHardening::runOnMachineFunction(MachineFunction &MF) {
-  ST = &MF.getSubtarget<AArch64Subtarget>();
-  TII = MF.getSubtarget().getInstrInfo();
-  TRI = MF.getSubtarget().getRegisterInfo();
+ThunksSet SLSHardeningInserter::insertThunks(MachineModuleInfo &MMI,
+                                             MachineFunction &MF,
+                                             ThunksSet ExistingThunks) {
+  const AArch64Subtarget *ST = &MF.getSubtarget<AArch64Subtarget>();
 
-  bool Modified = false;
   for (auto &MBB : MF) {
-    Modified |= hardenReturnsAndBRs(MBB);
-    Modified |= hardenBLRs(MBB);
+    if (ST->hardenSlsRetBr())
+      hardenReturnsAndBRs(MMI, MBB);
+    if (ST->hardenSlsBlr())
+      hardenBLRs(MMI, MBB, ExistingThunks);
   }
-
-  return Modified;
+  return ExistingThunks;
 }
 
 static bool isBLR(const MachineInstr &MI) {
@@ -120,9 +120,10 @@ static bool isBLR(const MachineInstr &MI) {
   return false;
 }
 
-bool AArch64SLSHardening::hardenReturnsAndBRs(MachineBasicBlock &MBB) const {
-  if (!ST->hardenSlsRetBr())
-    return false;
+bool SLSHardeningInserter::hardenReturnsAndBRs(MachineModuleInfo &MMI,
+                                               MachineBasicBlock &MBB) {
+  const AArch64Subtarget *ST =
+      &MBB.getParent()->getSubtarget<AArch64Subtarget>();
   bool Modified = false;
   MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator(), E = MBB.end();
   MachineBasicBlock::iterator NextMBBI;
@@ -138,12 +139,11 @@ bool AArch64SLSHardening::hardenReturnsAndBRs(MachineBasicBlock &MBB) const {
   return Modified;
 }
 
-static const char SLSBLRNamePrefix[] = "__llvm_slsblr_thunk_";
-
+static const unsigned NumPermittedRegs = 29;
 static const struct ThunkNameAndReg {
   const char* Name;
   Register Reg;
-} SLSBLRThunks[] = {
+} SLSBLRThunks[NumPermittedRegs] = {
   { "__llvm_slsblr_thunk_x0",  AArch64::X0},
   { "__llvm_slsblr_thunk_x1",  AArch64::X1},
   { "__llvm_slsblr_thunk_x2",  AArch64::X2},
@@ -180,36 +180,14 @@ static const struct ThunkNameAndReg {
   { "__llvm_slsblr_thunk_x31",  AArch64::XZR},
 };
 
-namespace {
-struct SLSBLRThunkInserter : ThunkInserter<SLSBLRThunkInserter> {
-  const char *getThunkPrefix() { return SLSBLRNamePrefix; }
-  bool mayUseThunk(const MachineFunction &MF) {
-    ComdatThunks &= !MF.getSubtarget<AArch64Subtarget>().hardenSlsNoComdat();
-    return MF.getSubtarget<AArch64Subtarget>().hardenSlsBlr();
-  }
-  bool insertThunks(MachineModuleInfo &MMI, MachineFunction &MF,
-                    bool ExistingThunks);
-  void populateThunk(MachineFunction &MF);
-
-private:
-  bool ComdatThunks = true;
-};
-} // namespace
-
-bool SLSBLRThunkInserter::insertThunks(MachineModuleInfo &MMI,
-                                       MachineFunction &MF,
-                                       bool ExistingThunks) {
-  if (ExistingThunks)
-    return false;
-  // FIXME: It probably would be possible to filter which thunks to produce
-  // based on which registers are actually used in BLR instructions in this
-  // function. But would that be a worthwhile optimization?
-  for (auto T : SLSBLRThunks)
-    createThunkFunction(MMI, T.Name, ComdatThunks);
-  return true;
+unsigned getThunkIndex(Register Reg) {
+  for (unsigned I = 0; I < NumPermittedRegs; ++I)
+    if (SLSBLRThunks[I].Reg == Reg)
+      return I;
+  llvm_unreachable("Unexpected register");
 }
 
-void SLSBLRThunkInserter::populateThunk(MachineFunction &MF) {
+void SLSHardeningInserter::populateThunk(MachineFunction &MF) {
   assert(MF.getFunction().hasComdat() == ComdatThunks &&
          "ComdatThunks value changed since MF creation");
   // FIXME: How to better communicate Register number, rather than through
@@ -258,8 +236,9 @@ void SLSBLRThunkInserter::populateThunk(MachineFunction &MF) {
                            Entry->end(), DebugLoc(), true /*AlwaysUseISBDSB*/);
 }
 
-MachineBasicBlock &AArch64SLSHardening::ConvertBLRToBL(
-    MachineBasicBlock &MBB, MachineBasicBlock::instr_iterator MBBI) const {
+void SLSHardeningInserter::convertBLRToBL(
+    MachineModuleInfo &MMI, MachineBasicBlock &MBB,
+    MachineBasicBlock::instr_iterator MBBI, ThunksSet &Thunks) {
   // Transform a BLR to a BL as follows:
   // Before:
   //   |-----------------------------|
@@ -285,7 +264,6 @@ MachineBasicBlock &AArch64SLSHardening::ConvertBLRToBL(
   //   |  barrierInsts               |
   //   |-----------------------------|
   //
-  // The __llvm_slsblr_thunk_xN thunks are created by the SLSBLRThunkInserter.
   // This function merely needs to transform BLR xN into BL
   // __llvm_slsblr_thunk_xN.
   //
@@ -318,37 +296,16 @@ MachineBasicBlock &AArch64SLSHardening::ConvertBLRToBL(
   }
   DebugLoc DL = BLR.getDebugLoc();
 
-  // If we'd like to support also BLRAA and BLRAB instructions, we'd need
-  // a lot more different kind of thunks.
-  // For example, a
-  //
-  // BLRAA xN, xM
-  //
-  // instruction probably would need to be transformed to something like:
-  //
-  // BL __llvm_slsblraa_thunk_x<N>_x<M>
-  //
-  // __llvm_slsblraa_thunk_x<N>_x<M>:
-  //   BRAA x<N>, x<M>
-  //   barrierInsts
-  //
-  // Given that about 30 different values of N are possible and about 30
-  // different values of M are possible in the above, with the current way
-  // of producing indirect thunks, we'd be producing about 30 times 30, i.e.
-  // about 900 thunks (where most might not be actually called). This would
-  // multiply further by two to support both BLRAA and BLRAB variants of those
-  // instructions.
-  // If we'd want to support this, we'd probably need to look into a different
-  // way to produce thunk functions, based on which variants are actually
-  // needed, rather than producing all possible variants.
-  // So far, LLVM does never produce BLRA* instructions, so let's leave this
-  // for the future when LLVM can start producing BLRA* instructions.
   MachineFunction &MF = *MBBI->getMF();
   MCContext &Context = MBB.getParent()->getContext();
-  auto ThunkIt =
-      llvm::find_if(SLSBLRThunks, [Reg](auto T) { return T.Reg == Reg; });
-  assert (ThunkIt != std::end(SLSBLRThunks));
-  MCSymbol *Sym = Context.getOrCreateSymbol(ThunkIt->Name);
+  const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();
+  unsigned ThunkIndex = getThunkIndex(Reg);
+  StringRef ThunkName = SLSBLRThunks[ThunkIndex].Name;
+  MCSymbol *Sym = Context.getOrCreateSymbol(ThunkName);
+  if (!(Thunks & (1u << ThunkIndex))) {
+    Thunks |= 1u << ThunkIndex;
+    createThunkFunction(MMI, ThunkName, ComdatThunks);
+  }
 
   MachineInstr *BL = BuildMI(MBB, MBBI, DL, TII->get(BLOpcode)).addSym(Sym);
 
@@ -386,13 +343,11 @@ MachineBasicBlock &AArch64SLSHardening::ConvertBLRToBL(
                                            RegIsKilled /*isKill*/));
   // Remove BLR instruction
   MBB.erase(MBBI);
-
-  return MBB;
 }
 
-bool AArch64SLSHardening::hardenBLRs(MachineBasicBlock &MBB) const {
-  if (!ST->hardenSlsBlr())
-    return false;
+bool SLSHardeningInserter::hardenBLRs(MachineModuleInfo &MMI,
+                                      MachineBasicBlock &MBB,
+                                      ThunksSet &Thunks) {
   bool Modified = false;
   MachineBasicBlock::instr_iterator MBBI = MBB.instr_begin(),
                                     E = MBB.instr_end();
@@ -401,31 +356,30 @@ bool AArch64SLSHardening::hardenBLRs(MachineBasicBlock &MBB) const {
     MachineInstr &MI = *MBBI;
     NextMBBI = std::next(MBBI);
     if (isBLR(MI)) {
-      ConvertBLRToBL(MBB, MBBI);
+      convertBLRToBL(MMI, MBB, MBBI, Thunks);
       Modified = true;
     }
   }
   return Modified;
 }
 
-FunctionPass *llvm::createAArch64SLSHardeningPass() {
-  return new AArch64SLSHardening();
-}
-
 namespace {
-class AArch64IndirectThunks : public ThunkInserterPass<SLSBLRThunkInserter> {
+class AArch64SLSHardening : public ThunkInserterPass<SLSHardeningInserter> {
 public:
   static char ID;
 
-  AArch64IndirectThunks() : ThunkInserterPass(ID) {}
+  AArch64SLSHardening() : ThunkInserterPass(ID) {}
 
-  StringRef getPassName() const override { return "AArch64 Indirect Thunks"; }
+  StringRef getPassName() const override { return AARCH64_SLS_HARDENING_NAME; }
 };
 
 } // end anonymous namespace
 
-char AArch64IndirectThunks::ID = 0;
+char AArch64SLSHardening::ID = 0;
+
+INITIALIZE_PASS(AArch64SLSHardening, "aarch64-sls-hardening",
+                AARCH64_SLS_HARDENING_NAME, false, false)
 
-FunctionPass *llvm::createAArch64IndirectThunks() {
-  return new AArch64IndirectThunks();
+FunctionPass *llvm::createAArch64SLSHardeningPass() {
+  return new AArch64SLSHardening();
 }
diff --git a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
index 37ce07d4a09de2..bcd677310d1247 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
@@ -861,7 +861,6 @@ void AArch64PassConfig::addPreEmitPass() {
 }
 
 void AArch64PassConfig::addPostBBSections() {
-  addPass(createAArch64IndirectThunks());
   addPass(createAArch64SLSHardeningPass());
   addPass(createAArch64PointerAuthPass());
   if (EnableBranchTargets)
diff --git a/llvm/test/CodeGen/AArch64/O0-pipeline.ll b/llvm/test/CodeGen/AArch64/O0-pipeline.ll
index a0306b8e1e9244..78a7b84b8479b5 100644
--- a/llvm/test/CodeGen/AArch64/O0-pipeline.ll
+++ b/llvm/test/CodeGen/AArch64/O0-pipeline.ll
@@ -74,7 +74,6 @@
 ; CHECK-NEXT:       StackMap Liveness Analysis
 ; CHECK-NEXT:       Live DEBUG_VALUE analysis
 ; CHECK-NEXT:       Machine Sanitizer Binary Metadata
-; CHECK-NEXT:       AArch64 Indirect Thunks
 ; CHECK-NEXT:       AArch64 sls hardening pass
 ; CHECK-NEXT:       AArch64 Pointer Authentication
 ; CHECK-NEXT:       AArch64 Branch Targets
diff --git a/llvm/test/CodeGen/AArch64/O3-pipeline.ll b/llvm/test/CodeGen/AArch64/O3-pipeline.ll
index 84e672d14d99d5..c5d604a5a2783e 100644
--- a/llvm/test/CodeGen/AArch64/O3-pipeline.ll
+++ b/llvm/test/CodeGen/AArch64/O3-pipeline.ll
@@ -227,7 +227,6 @@
 ; CHECK-NEXT:       Machine Sanitizer Binary Metadata
 ; CHECK-NEXT:     Machine Outliner
 ; CHECK-NEXT:     FunctionPass Manager
-; CHECK-NEXT:       AArch64 Indirect Thunks
 ; CHECK-NEXT:       AArch64 sls hardening pass
 ; CHECK-NEXT:       AArch64 Pointer Authentication
 ; CHECK-NEXT:       AArch64 Branch Targets
diff --git a/llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll b/llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll
index 3ffaf962425b38..08c314e538734c 100644
--- a/llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll
+++ b/llvm/test/CodeGen/AArch64/arm64-opt-remarks-lazy-bfi.ll
@@ -34,10 +34,6 @@
 ; HOTNESS-NEXT:  Executing Pass 'Function Pass Manager'
 ; HOTNESS-NEXT: Executing Pass 'Verify generated machine code' on Function 'empty_func'...
 ; HOTNESS-NEXT:  Freeing Pass 'Verify generated machine code' on Function 'empty_func'...
-; HOTNESS-NEXT: Executing Pass 'AArch64 Indirect Thunks' on Function 'empty_func'...
-; HOTNESS-NEXT:  Freeing Pass 'AArch64 Indirect Thunks' on Function 'empty_func'...
-; HOTNESS-NEXT: Executing Pass 'Verify generated machine code' on Function 'empty_func'...
-; HOTNESS-NEXT:  Freeing Pass 'Verify generated machine code' on Function 'empty_func'...
 ; HOTNESS-NEXT: Executing Pass 'AArch64 sls hardening pass' on Function 'empty_func'...
 ; HOTNESS-NEXT:  Freeing Pass 'AArch64 sls hardening pass' on Function 'empty_func'...
 ; HOTNESS-NEXT: Executing Pass 'Verify generated machine code' on Function 'empty_func'...
@@ -83,10 +79,6 @@
 ; NO_HOTNESS-NEXT:  Executing Pass 'Function Pass Manager'
 ; NO_HOTNESS-NEXT: Executing Pass 'Verify generated machine code' on Function 'empty_func'...
 ; NO_HOTNESS-NEXT:  Freeing Pass 'Verify generated machine code' on Function 'empty_func'...
-; NO_HOTNESS-NEXT: Executing Pass 'AArch64 Indirect Thunks' on Function 'empty_func'...
-; NO_HOTNESS-NEXT:  Freeing Pass 'AArch64 Indirect Thunks' on Function 'empty_func'...
-; NO_HOTNESS-NEXT: Executing Pass 'Verify generated machine code' on Function 'empty_func'...
-; NO_HOTNESS-NEXT:  Freeing Pass 'Verify generated machine code' on Function 'empty_func'...
 ; NO_HOTNESS-NEXT: Executing Pass 'AArch64 sls hardening pass' on Function 'empty_func'...
 ; NO_HOTNESS-NEXT:  Freeing Pass 'AArch64 sls hardening pass' on Function 'empty_func'...
 ; NO_HOTNESS-NEXT: Executing Pass 'Verify generated machine code' on Function 'empty_func'...
diff --git a/llvm/test/CodeGen/AArch64/speculation-hardening-sls-blr-bti.mir b/llvm/test/CodeGen/AArch64/speculation-hardening-sls-blr-bti.mir
index 92353b648943a6..46e42bbe6bf82b 100644
--- a/llvm/test/CodeGen/AArch64/speculation-hardening-sls-blr-bti.mir
+++ b/llvm/test/CodeGen/AArch64/speculation-hardening-sls-blr-bti.mir
@@ -8,8 +8,6 @@
 # These BLR/BTI bundles are produced when calling a returns_twice function
 # (like setjmp) indirectly.
 --- |
-  $__llvm_slsblr_thunk_x8 = comdat any
-
   define dso_local void @fn() #0 {
   entry:
     %fnptr = alloca ptr, align 8
@@ -22,15 +20,8 @@
   ; Function Attrs: returns_twice
   declare i32 @setjmp(ptr noundef) #1
 
-  ; Function Attrs: naked nounwind
-  define linkonce_odr hidden void @__llvm_slsblr_thunk_x8() #2 comdat {
-  entry:
-    ret void
-  }
-
   attributes #0 = { "target-features"="+harden-sls-blr" }
   attributes #1 = { returns_twice }
-  attributes #2 = { naked nounwind }
 
   !llvm.module.flags = !{!0}
   !0 = !{i32 8, !"branch-target-enforcement", i32 1}
@@ -75,14 +66,3 @@ body:             |
     early-clobber $sp, $lr = frame-destroy LDRXpost $sp, 16 :: (load (s64) from %stack.1)
     RET undef $lr
 ...
----
-name:            __llvm_slsblr_thunk_x8
-tracksRegLiveness: true
-body:             |
-  bb.0.entry:
-    liveins: $x8
-
-    $x16 = ORRXrs $xzr, $x8, 0
-    BR $x16
-    SpeculationBarrierISBDSBEndBB
-...
diff --git a/llvm/test/CodeGen/AArch64/speculation-hardening-sls-blr.mir b/llvm/test/CodeGen/AArch64/speculation-hardening-sls-blr.mir
index 81f95348f511e5..7f85ce8485a07f 100644
--- a/llvm/test/CodeGen/AArch64/speculation-hardening-sls-blr.mir
+++ b/llvm/test/CodeGen/AArch64/speculation-hardening-sls-blr.mir
@@ -1,12 +1,17 @@
 # RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu \
 # RUN:     -start-before aarch64-sls-hardening \
 # RUN:     -stop-after aarch64-sls-hardening -o - %s \
-# RUN:   | FileCheck %s --check-prefixes=CHECK
+# RUN:   | FileCheck %s --check-prefixes=CHECK \
+# RUN:                  --implicit-check-not=__llvm_slsblr_thunk_x7
+# RUN: llc -verify-machineinstrs -mtriple=aarch64-none-linux-gnu \
+# RUN:     -start-before aarch64-sls-hardening \
+# RUN:     -asm-verbose=0 -o - %s \
+# RUN:   | FileCheck %s --check-prefixes=ASM \
+# RUN:                  --implicit-check-not=__llvm_slsblr_thunk_x7
 
 # Check that the BLR SLS hardening transforms a BLR into a BL with operands as
 # expected.
 --- |
-  $__llvm_slsblr_thunk_x8 = comdat any
   @a = dso_local local_unnamed_addr global i32 (...)* null, align 8
   @b = dso_local local_unnamed_addr global i32 0, align 4
 
@@ -17,12 +22,6 @@
     store i32 %call, i32* @b, align 4
     ret void
   }
-
-  ; Function Attrs: naked nounwind
-  define linkonce_odr hidden void @__llvm_slsblr_thunk_x8() naked nounwind comdat {
-  entry:
-    ret void
-  }
 ...
 ---
 name:            fn1
@@ -46,13 +45,15 @@ body:             |
 
 
 ...
----
-name:            __llvm_slsblr_thunk_x8
-tracksRegLiveness: true
-body:             |
-  bb.0.entry:
-    liveins: $x8
 
-    BR $x8
-    SpeculationBarrierISBDSBEndBB
-...
+# CHECK: name: __llvm_slsblr_thunk_x8
+#
+# CHECK:       $x16 = ORRXrs $xzr, $x8, 0
+# CHECK-NEXT:  BR $x16
+# CHECK-NEXT:  SpeculationBarrierISBDSBEndBB
+
+# ASM-LABEL: __llvm_slsblr_thunk_x8:
+# ASM-NEXT:    mov x16, x8
+# ASM-NEXT:    br  x16
+# ASM-NEXT:    dsb sy
+# ASM-NEXT:    isb


