[Lldb-commits] [PATCH] D123375: [lldb][intelpt] Reduce trace memory usage by grouping instructions

Mon Apr 11 17:11:59 PDT 2022

wallace requested changes to this revision.
wallace added a comment.
This revision now requires changes to proceed.

I did a first pass on this diff. I'm asking to refactor a bit the InstructionBlock classes so that they are smarter. Besides that, if you use IDs more ubiquitously and stop using instruction indices everywhere, everything becomes much simpler.

================
Comment at: lldb/source/Plugins/Trace/intel-pt/DecodedThread.h:112-135
+  /// \struct InstructionPointer
+  /// Structure that stores a 6 byte prefix a list of 2 byte suffixes for each
+  /// instruction that is appended to the trace, in order. Concatenating the
+  /// prefix and suffix for any instruction ID (which is the concatenation of
+  /// two indices) will give the instruction load address.
+  ///
+  /// Generally most consecutive instructions will be close. Unless you are
----------------
Let's rename it to InstructionsBlock or another similar name. An instruction pointer is actually another way to refer to an instruction address.
Let's also try to remove numbers from the comments, so that if we modify them, we don't need to update the comment. It's hard to update comments because the compiler will never complaint if the comment doesn't make sense anymore.

Let's also make the size of the suffix dynamic, so that we can easily experiment with it later. We can use templates asking for a suffix type that must be unsigned.

If we embed the type of the suffix in the template, then we can do the splitting inside this class instead of outside. If you use my proposal for the constructor, please move its implementation to the cpp file.

Also, as this struct is not just a simple bag of data, let's make it a proper class.

I renamed a few things to create a nicer API.

First of all, I'm renaming the Append method to TryAppend, which receives the full load address and returns true if the append could happen, and false otherwise.

Besides that, I'm asking for a new class called Instructions that contains all the instruction blocks and receives instruction ids, and then it's able to defer the job to the actual instruction block.

================
Comment at: lldb/source/Plugins/Trace/intel-pt/DecodedThread.h:173-201
   /// \return
   ///     The load address of the instruction at the given index, or \a
   ///     LLDB_INVALID_ADDRESS if it is an error.
-  lldb::addr_t GetInstructionLoadAddress(size_t insn_index) const;
+  lldb::addr_t GetInstructionLoadAddress(size_t insn_id) const;

   /// Get the \a lldb::TraceInstructionControlFlowType categories of the
   /// instruction.
----------------
all of these become simpler if you work directly with the id instead of the instruction index

================
Comment at: lldb/source/Plugins/Trace/intel-pt/DecodedThread.h:260-263
   /// The size in bytes of each instruction.
   std::vector<uint8_t> m_instruction_sizes;
   /// The libipt instruction class for each instruction.
   std::vector<pt_insn_class> m_instruction_classes;
----------------
you can move these two to the new InstructionsBlock class

================
Comment at: lldb/source/Plugins/Trace/intel-pt/DecodedThread.h:271
   /// first one. Otherwise, this map will be empty.
   std::map<size_t, uint64_t> m_instruction_timestamps;
   /// This is the chronologically last TSC that has been added.
----------------
if you convert this from `instruction index -> TSC` to `instruction id -> TSC`, then everything becomes simpler

================
Comment at: lldb/source/Plugins/Trace/intel-pt/DecodedThread.h:274-276
+  /// For each \a InstructionPointer block, this contains the net number of
+  /// instructions upto that block
+  std::vector<size_t> m_ipblock_sizes;
----------------
i have the impression that you don't need this anymore

================
Comment at: lldb/source/Plugins/Trace/intel-pt/DecodedThread.h:279
+  /// trace. It maps `instruction index -> error message`.
   llvm::DenseMap<uint64_t, std::string> m_errors;
   /// The size in bytes of the raw buffer before decoding. It might be None if
----------------
you can also make it `instruction id -> error message`

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123375/new/

https://reviews.llvm.org/D123375