[llvm] [CodeGen] Refactor and document ThunkInserter (PR #97468)

Kristof Beyls via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 3 05:44:06 PDT 2024


================
@@ -7,34 +7,104 @@
 //===----------------------------------------------------------------------===//
 ///
 /// \file
-/// Contains a base class for Passes that inject an MI thunk.
+/// Contains a base ThunkInserter class that simplifies injection of MI thunks
+/// as well as a default implementation of MachineFunctionPass wrapping
+/// several `ThunkInserter`s for targets to extend.
 ///
 //===----------------------------------------------------------------------===//
 
 #ifndef LLVM_CODEGEN_INDIRECTTHUNKS_H
 #define LLVM_CODEGEN_INDIRECTTHUNKS_H
 
 #include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineModuleInfo.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/Module.h"
 
 namespace llvm {
 
+/// This class assists in inserting MI thunk functions into the module and
+/// rewriting the existing machine functions to call these thunks.
+///
+/// One of the common cases is implementing security mitigations that involve
+/// replacing some machine code patterns with calls to special thunk functions.
+///
+/// Inserting a module pass late in the codegen pipeline may increase memory
+/// usage, as it serializes the transformations and forces preceding passes to
+/// produce machine code for all functions before running the module pass.
+/// For that reason, ThunkInserter can be driven by a MachineFunctionPass by
+/// passing one MachineFunction at a time to its `run(MMI, MF)` method.
+/// Then, the derived class should
+/// * call createThunkFunction from its insertThunks method exactly once for
+///   each of the thunk functions to be inserted
+/// * populate the thunk in its populateThunk method
+///
+/// Note that if some other pass is responsible for rewriting the functions,
+/// insertThunks method can simply create all possible thunks at once, probably
+/// postponed until the first occurrence of possibly affected MF.
+///
+/// Alternatively, insertThunks method can rewrite MF by itself and only insert
+/// the thunks being called. In that case InsertedThunks variable can be used
+/// to track which thunks were already inserted.
+///
+/// In any case, the thunk function has to be inserted on behalf of some other
+/// function and then populated on its own "iteration" later - this is because
+/// MachineFunctionPass will see the newly created functions, but they first
+/// have to go through the preceding passes from the same pass manager,
+/// possibly even through the instruction selector.
+//
+// FIXME Maybe implement a documented and less surprising way of modifying
+//       the module from a MachineFunctionPass that is restricted to inserting
+//       completely new functions to the module.
 template <typename Derived, typename InsertedThunksTy = bool>
 class ThunkInserter {
   Derived &getDerived() { return *static_cast<Derived *>(this); }
 
-protected:
   // A variable used to track whether (and possible which) thunks have been
   // inserted so far. InsertedThunksTy is usually a bool, but can be other types
   // to represent more than one type of thunk. Requires an |= operator to
   // accumulate results.
   InsertedThunksTy InsertedThunks;
-  void doInitialization(Module &M) {}
+
+protected:
+  // Interface for subclasses to use.
+
+  /// Create an empty thunk function.
+  ///
+  /// The new function will eventually be passed to populateThunk. If multiple
+  /// thunks are created, populateThunk can distinguish them by their names.
   void createThunkFunction(MachineModuleInfo &MMI, StringRef Name,
                            bool Comdat = true, StringRef TargetAttrs = "");
 
+protected:
+  // Interface for subclasses to implement.
+  //
+  // Note: all functions are non-virtual and are called via getDerived().
+  // Note: only doInitialization() has an implementation.
+
+  /// Initializes thunk inserter.
+  void doInitialization(Module &M) {}
+
+  /// Returns common prefix for thunk function's names.
+  const char *getThunkPrefix(); // undefined
+
+  /// Checks if MF may use thunks (true - maybe, false - definitely not).
+  bool mayUseThunk(const MachineFunction &MF); // undefined
+
+  /// Rewrites the function if necessary, returns the set of thunks added.
+  InsertedThunksTy insertThunks(MachineModuleInfo &MMI, MachineFunction &MF,
+                                InsertedThunksTy ExistingThunks); // undefined
+
+  /// Populate the thunk function with instructions.
+  ///
+  /// If multiple thunks are created, inspect the thunk's name.
----------------
kbeyls wrote:

I find this comment a bit cryptic. I guess the intent is for it to point out the following:

"The content that must be inserted in the thunk function body should be derived from the `MF`'s name."?
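
To make that reading concrete, here is a minimal standalone sketch of the idea, assuming the thunk name is the only thing that distinguishes one thunk from another. It does not use the LLVM headers: `ToyFunction`, `ToyThunkInserter`, `ToyRegThunks`, and the `__toy_thunk_` prefix are hypothetical stand-ins, and only `getThunkPrefix`, `populateThunk`, and `getDerived` mirror names from the patch. The "instructions" are plain strings, not real MI.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Toy stand-in for MachineFunction (not the LLVM class): a name plus a
// list of pseudo-instructions.
struct ToyFunction {
  std::string Name;
  std::vector<std::string> Body;
};

// Toy stand-in for ThunkInserter that keeps only the CRTP dispatch.
template <typename Derived> class ToyThunkInserter {
  Derived &getDerived() { return *static_cast<Derived *>(this); }

public:
  // Populates MF if its name starts with the derived class's thunk prefix.
  // Returns true if MF was recognized as a thunk.
  bool run(ToyFunction &MF) {
    std::string_view Name = MF.Name;
    std::string_view Prefix = getDerived().getThunkPrefix();
    if (Name.substr(0, Prefix.size()) != Prefix)
      return false; // Not one of our thunks.
    getDerived().populateThunk(MF);
    return true;
  }
};

// Hypothetical inserter that creates one thunk per register.
struct ToyRegThunks : ToyThunkInserter<ToyRegThunks> {
  const char *getThunkPrefix() { return "__toy_thunk_"; }

  // The function name is the only input, so the body is derived from the
  // suffix that follows the common prefix.
  void populateThunk(ToyFunction &MF) {
    std::string_view Prefix = getThunkPrefix();
    assert(MF.Name.size() > Prefix.size() && "thunk name has no suffix");
    std::string Reg = MF.Name.substr(Prefix.size());
    MF.Body = {"barrier", "br " + Reg};
  }
};
```

If this matches the intent, the doc comment could say so directly, e.g. that `populateThunk` must decide what to emit from the name of the `MF` it receives.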

https://github.com/llvm/llvm-project/pull/97468

