[Lldb-commits] [PATCH] D128576: [trace] Make events first class items in the trace cursor and rework errors

Tue Jun 28 20:02:04 PDT 2022

wallace added inline comments.

================
Comment at: lldb/include/lldb/Target/Trace.h:171
+  ///     information failed to load, an \a llvm::Error is returned.
+  virtual llvm::Expected<lldb::TraceCursorUP>
+  CreateNewCursor(Thread &thread) = 0;
----------------
jj10306 wrote:
> Do we want to keep the cursor as a UP or take this refactor before creating the public bindings to make it a SP? IMO a UP might make sense because I can't really think of many cases where you would want multiple users of the same cursor.
> wdyt?
It seems that we can use unique_ptrs! SBMemoryRegionInfoList is an example of a SB wrapper of a unique_ptr. It seems that python reuses the same reference of the SB object with all the variables that use it. On the other hand, if you use c++, the unique_ptr would be enforced strictly

================
Comment at: lldb/include/lldb/Target/TraceCursor.h:44-46
+///     The cursor can also point at events in the trace, which aren't errors
+///     nor instructions. An example of an event could be a context switch in
+///     between two instructions.
----------------
jj10306 wrote:
> do you want to include pointers to the methods you can use to check/get events similar to what you did for errors above?
good idea!

================
Comment at: lldb/include/lldb/Target/TraceCursor.h:232-246
+  virtual const char *GetError() const = 0;

-  /// Get the corresponding error message if the cursor points to an error in
-  /// the trace.
-  ///
   /// \return
-  ///     \b nullptr if the cursor is not pointing to an error in
-  ///     the trace. Otherwise return the actual error message.
-  virtual const char *GetError() = 0;
+  ///     Whether the cursor points to an event or not.
+  virtual bool IsEvent() const = 0;

   /// \return
----------------
jj10306 wrote:
> For the getters, what are your thoughts on returning an optional instead of using designated value/variants (ie nullptr, LLDB_INVALID_INSTRUCTION, eTraceEventNone`) to indicate that the current item does not match what `Get` method is being used?
> 
> Along the same lines, but a slightly bigger change:
> With the introduction of `Events` as first class citizens of the trace cursor API, it may make sense to introduce the notion of of a trace "item" or something similar that encapsulates (instructions, events, errors, etc). I'm thinking of a tagged union type structure (ie a Rust enum)  that enforces the correctness of the access to the underlying item.
> wdyt?
check the new version of the code. I'm now using a tagged union for the low level storage, and then using an enum for the item kinds in the TraceCursor. I think this version introduces more safety and it's easier to handle the low level storage, not to mention that we now use less memory!

================
Comment at: lldb/include/lldb/Target/TraceCursor.h:258
+  virtual llvm::Optional<uint64_t>
+  GetCounter(lldb::TraceCounter counter_type) const = 0;

----------------
jj10306 wrote:
> Now that we are introducing the notion of an `Event`? wdyt about combining Events and Counters since a counter value in a trace really is just a type of event? In this case, `Counter` could just become a variant of `TraceEvent`. This may make more sense because even in the case of TSCs in an IntelPT trace because, strictly speaking, these aren't associated with instructions, correct? Instead the TSC values are emitted with PSBs and then we "attribute" these values to the nearest instruction, right?
I really like your idea, but that's not that trivial, so let's try to come up with a nice design. What about the following?

- We create a new event `TraceEventHWClockTick`. This will help use identify quickly anytime we see a different TSC. That way we don't need to keep asking every instruction for its TSC even when it doesn't change. The new name also makes it more generic.
- In order to access the TSC value, the user could invoke `uint64_t GetEventValueHWClockTick()`. I'd imagine that each event type would have its own data accessor. 
- We create another method `optional<uint64_t> GetEventValueLastHWClockTick()`, which returns the most recent value for this event type that happened on or before the cursor's current point. I wouldn't make this kind of functionality generic for all event kinds because for some it might not make sense at all. Anyway, the reason is the following code

In the dumper, we would like to support this flow:
   cursor->GoTo(starting_id)
   hw_timestamp = cursor->GetEventValueLastHWClockTick();

   for (; cursor->HasValue(); cursor->Next()) {
     if (cursor->GetItemKind() == eTraceItemKindEvent && cursor->GetEventKind() == eTraceEventHWClockTick)
       hw_timestamp = cursor->GetEventValueHWClockTick();

     cout << "current clock (tsc) " << hw_timestamp << endl;
  }

so, basically we need a way to know the starting tsc for a given instruction id, and then to identify changes.

Later, we can implement a new method `nanoseconds GetEstimatedWallClockTime()`, which delegates to the plug-in the job to estimate the wall clock time for a given instruction item. This estimation makes more sense as an attribute of each item.

What do you think?

================
Comment at: lldb/include/lldb/Target/TraceDumper.h:59
+  /// Helper struct that holds all the information we know about a trace item
+  struct TraceItem {
     lldb::user_id_t id;
----------------
jj10306 wrote:
> oh looks like you already introduced a `TraceItem` structure!
> Similar to my comment above related to introducing a trace item type structure for the trace cursor, would it make sense to have a separate type for each of the different "items". With the way the TraceItem struct is currently, a lot of this data is mutually exclusive, hence the many optional fields, correct?
I think it's fine to have a very dumb TraceItem in the Dumper, because of its simplicity. However, when dealing with the API, it's better to avoid returning data structures. Remember that because of python support, we need to create a unique_ptr or shared_ptr for every data structure that we return, which can potentially add a really big overhead if we need to create a data structure for every trace item. On the other hand, if we do most of the queries using functions, we can avoid that overhead and allow fast trace iteration even from python. So being specific, I want to avoid creating a SBTraceItem, or SBTraceInstruction, etc, except for very exceptional cases, like a possible SBTraceContextSwitchEvent, which is something so rare in the trace that would add no real traversal overhead

================
Comment at: lldb/source/Plugins/Trace/intel-pt/DecodedThread.h:145-147
   /// \return
-  ///     The control flow categories, or \b 0 if the instruction is an error.
+  ///     The control flow categories, or an undefined vlaue if the item is not
+  ///     an instruction.
----------------
jj10306 wrote:
> Why not use an optional here instead?
we are going to get rid of this after @persona0220 's work

================
Comment at: lldb/source/Plugins/Trace/intel-pt/DecodedThread.h:236-239
+  enum class ItemKind : uint64_t {
+    Instruction = 0,
+    Error = LLDB_INVALID_ADDRESS - 1,
+    Event = LLDB_INVALID_ADDRESS - 2,
----------------
jj10306 wrote:
> Continuing the theme of my comments related to the notion of a `TraceItem` - would it be possible to unite the notion of a trace item? currently there is an Item enum here, a item struct in the Dumper and I feel there is a need for a Item at the trace cursor level as well.
> wdyt?
i'm reworking this with a new tagged union

================
Comment at: lldb/source/Plugins/Trace/intel-pt/DecodedThread.h:241
+  };
+  ItemKind LowestNonInstructionItemKind = ItemKind::Event;
+
----------------
jj10306 wrote:
> why is this needed?
see above

================
Comment at: lldb/source/Plugins/Trace/intel-pt/DecodedThread.h:253-258
+  /// The low level storage of all instruction addresses, error and events. Each
+  /// trace item has an index in this vector and this index is used throughout
+  /// the code. As an storage optimization, values of \b ItemKind::Error
+  /// indicate that this item is an error, \b ItemKind::Event if the item is an
+  /// event, and otherwise the value is an instruction address.
+  std::vector<uint64_t> m_trace_items;
----------------
jj10306 wrote:
> I found this a bit confusing. Are the values of the vectors the indexes into the DenseMaps below (m_errors and m_events) or are these the actual error/event values?
> How does this vector relate to the two vectors below (instruction_size/classes since now this is items and not just instructions?
> 
see above

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128576/new/

https://reviews.llvm.org/D128576