[llvm] 614a8cb - [MC][NFC] Allow MCInstrAnalysis to store state (#65479)

via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 19 23:09:05 PDT 2023


Author: Job Noorman
Date: 2023-10-20T06:09:01Z
New Revision: 614a8cbfd155b706688cfca0101f8d99988a9871

URL: https://github.com/llvm/llvm-project/commit/614a8cbfd155b706688cfca0101f8d99988a9871
DIFF: https://github.com/llvm/llvm-project/commit/614a8cbfd155b706688cfca0101f8d99988a9871.diff

LOG: [MC][NFC] Allow MCInstrAnalysis to store state (#65479)

Currently, all the analysis functions provided by `MCInstrAnalysis` work
on a single instruction. On some targets, this limits the kind of
instructions that can be successfully analyzed as common constructs may
need multiple instructions.

For example, a typical call sequence on RISC-V uses a auipc+jalr pair.
In order to analyse the jalr inside `evaluateBranch`, information about
the corresponding auipc is needed. Similarly, AArch64 uses adrp+ldr
pairs to access globals.

This patch proposes to add state to `MCInstrAnalysis` to support these
use cases. Two new virtual methods are added:
- `updateState`: takes an instruction and its address. This methods
should be called by clients on every instruction and allows targets to
store whatever information they need to analyse future instructions.
- `resetState`: clears the state whenever it becomes irrelevant. Clients
could call this, for example, when starting to disassemble a new
function.

Note that the default implementations do nothing so this patch is NFC.
No actual state is stored inside `MCInstrAnalysis`; deciding the
structure of the state is left to the targets.

This patch also modifies llvm-objdump to use the new interface.

This patch is an alternative to
[D116677](https://reviews.llvm.org/D116677) and the idea of storing
state in `MCInstrAnalysis` was first discussed there.

Added: 
    

Modified: 
    llvm/include/llvm/MC/MCInstrAnalysis.h
    llvm/tools/llvm-objdump/llvm-objdump.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/include/llvm/MC/MCInstrAnalysis.h b/llvm/include/llvm/MC/MCInstrAnalysis.h
index c3c675c39c5590c..e3ddf0b8b8939c9 100644
--- a/llvm/include/llvm/MC/MCInstrAnalysis.h
+++ b/llvm/include/llvm/MC/MCInstrAnalysis.h
@@ -37,6 +37,21 @@ class MCInstrAnalysis {
   MCInstrAnalysis(const MCInstrInfo *Info) : Info(Info) {}
   virtual ~MCInstrAnalysis() = default;
 
+  /// Clear the internal state. See updateState for more information.
+  virtual void resetState() {}
+
+  /// Update internal state with \p Inst at \p Addr.
+  ///
+  /// For some types of analyses, inspecting a single instruction is not
+  /// sufficient. Some examples are auipc/jalr pairs on RISC-V or adrp/ldr pairs
+  /// on AArch64. To support inspecting multiple instructions, targets may keep
+  /// track of an internal state while analysing instructions. Clients should
+  /// call updateState for every instruction which allows later calls to one of
+  /// the analysis functions to take previous instructions into account.
+  /// Whenever state becomes irrelevant (e.g., when starting to disassemble a
+  /// new function), clients should call resetState to clear it.
+  virtual void updateState(const MCInst &Inst, uint64_t Addr) {}
+
   virtual bool isBranch(const MCInst &Inst) const {
     return Info->get(Inst.getOpcode()).isBranch();
   }

diff  --git a/llvm/tools/llvm-objdump/llvm-objdump.cpp b/llvm/tools/llvm-objdump/llvm-objdump.cpp
index 537c18bf3440d33..f02bd6a9b531a80 100644
--- a/llvm/tools/llvm-objdump/llvm-objdump.cpp
+++ b/llvm/tools/llvm-objdump/llvm-objdump.cpp
@@ -860,7 +860,7 @@ class DisassemblerTarget {
   std::unique_ptr<const MCSubtargetInfo> SubtargetInfo;
   std::shared_ptr<MCContext> Context;
   std::unique_ptr<MCDisassembler> DisAsm;
-  std::shared_ptr<const MCInstrAnalysis> InstrAnalysis;
+  std::shared_ptr<MCInstrAnalysis> InstrAnalysis;
   std::shared_ptr<MCInstPrinter> InstPrinter;
   PrettyPrinter *Printer;
 
@@ -1283,14 +1283,19 @@ collectBBAddrMapLabels(const std::unordered_map<uint64_t, BBAddrMap> &AddrToBBAd
   }
 }
 
-static void collectLocalBranchTargets(
-    ArrayRef<uint8_t> Bytes, const MCInstrAnalysis *MIA, MCDisassembler *DisAsm,
-    MCInstPrinter *IP, const MCSubtargetInfo *STI, uint64_t SectionAddr,
-    uint64_t Start, uint64_t End, std::unordered_map<uint64_t, std::string> &Labels) {
+static void
+collectLocalBranchTargets(ArrayRef<uint8_t> Bytes, MCInstrAnalysis *MIA,
+                          MCDisassembler *DisAsm, MCInstPrinter *IP,
+                          const MCSubtargetInfo *STI, uint64_t SectionAddr,
+                          uint64_t Start, uint64_t End,
+                          std::unordered_map<uint64_t, std::string> &Labels) {
   // So far only supports PowerPC and X86.
   if (!STI->getTargetTriple().isPPC() && !STI->getTargetTriple().isX86())
     return;
 
+  if (MIA)
+    MIA->resetState();
+
   Labels.clear();
   unsigned LabelCount = 0;
   Start += SectionAddr;
@@ -1316,6 +1321,7 @@ static void collectLocalBranchTargets(
           !Labels.count(Target) &&
           !(STI->getTargetTriple().isPPC() && Target == Index))
         Labels[Target] = ("L" + Twine(LabelCount++)).str();
+      MIA->updateState(Inst, Index);
     }
     Index += Size;
   }
@@ -1967,6 +1973,9 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
                                BBAddrMapLabels);
       }
 
+      if (DT->InstrAnalysis)
+        DT->InstrAnalysis->resetState();
+
       while (Index < End) {
         // ARM and AArch64 ELF binaries can interleave data and text in the
         // same section. We rely on the markers introduced to understand what
@@ -2183,6 +2192,8 @@ disassembleObject(ObjectFile &Obj, const ObjectFile &DbgObj,
               if (TargetOS == &CommentStream)
                 *TargetOS << "\n";
             }
+
+            DT->InstrAnalysis->updateState(Inst, SectionAddr + Index);
           }
         }
 


        


More information about the llvm-commits mailing list