[Lldb-commits] [PATCH] D126267: [trace][intelpt] Support system-wide tracing [13] - Add context switch decoding

Wed Jun 15 09:26:19 PDT 2022

wallace added inline comments.
Herald added a subscriber: Michael137.

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.cpp:22-42
+struct PerfContextSwitchRecord {
+  struct perf_event_header header;
+  uint32_t next_prev_pid;
+  uint32_t next_prev_tid;
+  uint32_t pid, tid;
+  uint64_t time_in_nanos;
+
----------------
jj10306 wrote:
> should these structures live in `Perf.h`?
I created a new file PerfContextSwitchDecoder. and moved this code there. We can't actually use Perf.h because that's private to lldb-server

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.cpp:59-102
+ThreadContinuousExecution ThreadContinuousExecution::CreateCompleteExecution(
+    lldb::core_id_t core_id, lldb::tid_t tid, uint64_t start, uint64_t end) {
+  ThreadContinuousExecution o(core_id, tid);
+  o.variant = Variant::Complete;
+  o.tscs.complete.start = start;
+  o.tscs.complete.end = end;
+  return o;
----------------
jj10306 wrote:
> nit: if you switch the union for two optional values you can remove a lot of the redundancy between these methods.
in this case I'm willing to pay the cost of being super explicit in initialization to be able to be super safe when using the context switches. I don't want that for any reason we use context switch that we are not completely sure that is correct, so I'm adding very explicit variable names to prevent myself and others from making mistakes

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.cpp:136-141
+  if (record.IsIn() && prev.IsIn()) {
+    // We found two consecutive ins, which means that we didn't capture
+    // the end of the previous execution.
+    on_new_thread_execution(ThreadContinuousExecution::CreateHintedEndExecution(
+        core_id, prev.tid, prev.tsc, record.tsc - 1));
+  } else if (record.IsOut() && prev.IsOut()) {
----------------
jj10306 wrote:
> can you help me understand how these two cases could happen? they seem fundamentally impossible given the nature of context switching - shouldn't all "continuous" execution be disjoint?
> 
> My current understanding is as follows:
> 
> Expected:
> i1 o1 i4 o4 i9 o9
> 
> Impossible:
> and this is not possible:
> i1 i4 o4 o1  9 o9
> 
> Let me know if I'm missing something 🙂
> 
this can happen if, for example, there are contention issues. You can have something like this

  tid 12 in
  tid 12 out
  <now you start reading the context switch trace in the collector and update the tail pointer>
      during this time the following context switches happened but you lost them
      tid 13 in
      tid 13 out
      tid 14 in
  <here the kernel resumes writing context switches because the tail pointer was updated>
  tid 14 out

and now you have <tid 12 out> followed by <tid 14 out>. That means that you can rely on the first execution (tid 12), but you can't trust the second one (tid 14) because you don't really know when it started.

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.cpp:146
+        ThreadContinuousExecution::CreateHintedStartExecution(
+            core_id, record.tid, prev.tsc + 1, record.tsc));
+  } else if (record.IsOut() && prev.IsIn()) {
----------------
jj10306 wrote:
> why the + 1?
let's follow the example above, you have <tid 12 out: tsc A> followed by <tid 14 out: tsc B>

you don't really know when the execution of tid 14 started, but you know it was after tsc A, i.e. tsc A + 1. So that's a hinted start

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.cpp:181
+///   the trace.
+static Error DecodePerfContextSwitchTrace(
+    ArrayRef<uint8_t> data, core_id_t core_id,
----------------
jj10306 wrote:
> Do you think any of the general perf logic related to "decoding" the records should be moved to `Perf.h/cpp`?
see my first response above

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.cpp:196
+    const PerfContextSwitchRecord &perf_record =
+        *reinterpret_cast<const PerfContextSwitchRecord *>(data.data() +
+                                                           offset);
----------------
jj10306 wrote:
> nit: it feels weird casting to `PerfContextSwitchRecord` when we don't yet know if this is actually a context switch event without first looking at the header.
> casting to `perf_event_header` and checking that first before interpreting the record as a context switch record seems like a better approach.
> Given that currently only intelpt is using the LLDB's perf "library" this isn't a big deal, but if we wanted to make it more complete/robust, we should revisit this and improve our record handling design so it could be easily extended to support any record types.
makes sense. this is a code smell

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.cpp:198
+                                                           offset);
+    // A record of 1000 uint64_t's or more should mean that the data is wrong
+    if (perf_record.header.size == 0 ||
----------------
jj10306 wrote:
> can you link the documentation that states this?
sure

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.cpp:199-200
+    // A record of 1000 uint64_t's or more should mean that the data is wrong
+    if (perf_record.header.size == 0 ||
+        perf_record.header.size > sizeof(uint64_t) * 1000)
+      return CreateError(offset, formatv("A record of {0} bytes was found.",
----------------
jj10306 wrote:
> using sizeof on uint64_t feels weird since the typename already implies the name. I think moving this value to a constant and explaining its value would make things cleaner.
+1

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.cpp:204-205
+
+    // We add + 100 to this record because some systems might have custom
+    // records. In any case, we are looking only for abnormal data.
+    if (perf_record.header.type >= PERF_RECORD_MAX + 100)
----------------
jj10306 wrote:
> same as above, can we link the docs. alternatively, link the docs at the top of the function or in the header and then reference that link at the appropriate spots in the code
+1

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.cpp:263-272
+  auto do_decode = [&]() -> Error {
+    // We'll decode all context switch traces, identify continuous executions
+    // and group them by thread.
+    for (core_id_t core_id : m_cores) {
+      Error err = m_trace.OnCoreBinaryDataRead(
+          core_id, IntelPTDataKinds::kPerfContextSwitchTrace,
+          [&](ArrayRef<uint8_t> data) -> Error {
----------------
jj10306 wrote:
> this is lambda inception 😆
lol, i'll see how i can simplify it

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.h:31-33
+    HintedStart,
+    /// The start is known and we have a guess for the end
+    HintedEnd,
----------------
jj10306 wrote:
> Where do these "guesses" come from?
see above

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.h:40-61
+  union {
+    struct {
+      uint64_t start;
+      uint64_t end;
+    } complete;
+    struct {
+      uint64_t start;
----------------
jj10306 wrote:
> what about just having two optionals fields to remove the "dangers" that unions introduce? Then the Variant enum can be used to guide whether the start and end values should be non-null just as it's being used to access the union currently
see above

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.h:105-107
+/// each Intel PT decoded instruction and in which order. It also assumes that
+/// the Intel PT data and context switches might have gaps in their traces due
+/// to contention or race conditions.
----------------
jj10306 wrote:
> what is meant by "contention" here? Is this referring to the ipt aux buffer wrapping?
it means when the trace reader thread couldn't keep up with the data and some context switch records were lost

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.h:111
+  /// \param[in] core_ids
+  ///   The list of cores where the traced programs were running on.
+  ///
----------------
jj10306 wrote:
> is this just the cores that the program ran on or all possible cores the process could have run on?
all cores on the hardware. We don't know which ones our program ran on

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCoreDecoder.h:155
+  /// of the decoder.
+  llvm::Optional<std::string> m_setup_error;
+};
----------------
jj10306 wrote:
> why is this needed?
i've improved the documentation, but basically this variable holds any fatal decoding error we see to prevent multiple failed decoding attempts. If this is not-null, then this means that we tried to decode but it failed badly, and we don't want to try to redecode again because that's expensive.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126267/new/

https://reviews.llvm.org/D126267