[Lldb-commits] [lldb] e17cae0 - [trace][intel pt] Fix per-psb packet decoding

Walter Erquinigo via lldb-commits lldb-commits at lists.llvm.org
Fri Aug 12 15:13:55 PDT 2022


Author: Walter Erquinigo
Date: 2022-08-12T15:13:48-07:00
New Revision: e17cae076c4727b99017927c3e8746db5bec6db7

URL: https://github.com/llvm/llvm-project/commit/e17cae076c4727b99017927c3e8746db5bec6db7
DIFF: https://github.com/llvm/llvm-project/commit/e17cae076c4727b99017927c3e8746db5bec6db7.diff

LOG: [trace][intel pt] Fix per-psb packet decoding

The per-PSB packet decoding logic was wrong because it was assuming that pt_insn_get_sync_offset was being udpated after every PSB. Silly me, that is not true. It returns the offset of the PSB packet after invoking pt_insn_sync_forward regardless of how many PSBs are visited later. Instead, I'm now following the approach described in https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode for parallel decoding, which is basically what we need.

A nasty error that happened because of this is that when we had two PSBs (A and B), the following was happening

1. PSB A was processed all the way up to the end of the trace, which includes PSB B.
2. PSB B was then processed until the end of the trace.

The instructions emitted by step 2. were also emitted as part of step 1. so our trace had duplicated chunks. This problem becomes worse when you many PSBs.

As part of making sure this diff is correct, I added some other features that are very useful.

- Added a "synchronization point" event to the TraceCursor, so we can inspect when PSBs are emitted.
- Removed the single-thread decoder. Now the per-cpu decoder and single-thread decoder use the same code paths.
- Use the query decoder to fetch PSBs and timestamps. It turns out that the pt_insn_sync_forward of the instruction decoder can move past several PSBs (this means that we could skip some TSCs). On the other hand, the pt_query_sync_forward method doesn't skip PSBs, so we can get more accurate sync events and timing information.
- Turned LibiptDecoder into PSBBlockDecoder, which decodes single PSB blocks. It is the fundamental processing unit for decoding.
- Added many comments, asserts and improved error handling for clarity.
- Improved DecodeSystemWideTraceForThread so that a TSC is emitted always before a cpu change event. This was a bug that was annoying me before.
- SplitTraceInContinuousExecutions and FindLowestTSCInTrace are now using the query decoder, which can identify precisely each PSB along with their TSCs.
- Added an "only-events" option to the trace dumper to inspect only events.

I did extensive testing and I think we should have an in-house testing CI. The LLVM buildbots are not capable of supporting testing post-mortem traces of hundreds of megabytes. I'll leave that for later, but at least for now the current tests were able to catch most of the issues I encountered when doing this task.

A sample output of a program that I was single stepping is the following. You can see that only one PSB is emitted even though stepping happened!

```
thread #1: tid = 3578223
    0: (event) trace synchronization point [offset = 0x0xef0]
  a.out`main + 20 at main.cpp:29:20
    1: 0x0000000000402479    leaq   -0x1210(%rbp), %rax
    2: (event) software disabled tracing
    3: 0x0000000000402480    movq   %rax, %rdi
    4: (event) software disabled tracing
    5: (event) software disabled tracing
    6: 0x0000000000402483    callq  0x403bd4                  ; std::vector<int, std::allocator<int>>::vector at stl_vector.h:391:7
    7: (event) software disabled tracing
  a.out`std::vector<int, std::allocator<int>>::vector() at stl_vector.h:391:7
    8: 0x0000000000403bd4    pushq  %rbp
    9: (event) software disabled tracing
    10: 0x0000000000403bd5    movq   %rsp, %rbp
    11: (event) software disabled tracing
```

This is another trace of a long program with a few PSBs.
```
(lldb) thread trace dump instructions -E -f                                                                                                         thread #1: tid = 3603082
    0: (event) trace synchronization point [offset = 0x0x80]
    47417: (event) software disabled tracing
    129231: (event) trace synchronization point [offset = 0x0x800]
    146747: (event) software disabled tracing
    246076: (event) software disabled tracing
    259068: (event) trace synchronization point [offset = 0x0xf78]
    259276: (event) software disabled tracing
    259278: (event) software disabled tracing
    no more data
```

Differential Revision: https://reviews.llvm.org/D131630

Added: 
    

Modified: 
    lldb/include/lldb/Target/TraceCursor.h
    lldb/include/lldb/Target/TraceDumper.h
    lldb/include/lldb/lldb-enumerations.h
    lldb/source/Commands/CommandObjectThread.cpp
    lldb/source/Commands/Options.td
    lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
    lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
    lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.cpp
    lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.h
    lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
    lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.h
    lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
    lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCpuDecoder.cpp
    lldb/source/Target/TraceCursor.cpp
    lldb/source/Target/TraceDumper.cpp
    lldb/test/API/commands/trace/TestTraceDumpInfo.py
    lldb/test/API/commands/trace/TestTraceDumpInstructions.py
    lldb/test/API/commands/trace/TestTraceEvents.py
    lldb/test/API/commands/trace/TestTraceLoad.py
    lldb/test/API/commands/trace/TestTraceStartStop.py

Removed: 
    


################################################################################
diff  --git a/lldb/include/lldb/Target/TraceCursor.h b/lldb/include/lldb/Target/TraceCursor.h
index 63278f315a2d0..3126c9fd6ac39 100644
--- a/lldb/include/lldb/Target/TraceCursor.h
+++ b/lldb/include/lldb/Target/TraceCursor.h
@@ -271,6 +271,15 @@ class TraceCursor {
   ///     The approximate wall clock time for the trace item, or \a llvm::None
   ///     if not available.
   virtual llvm::Optional<double> GetWallClockTime() const = 0;
+
+  /// Get some metadata associated with a synchronization point event. As
+  /// 
diff erent trace technologies might have 
diff erent values for this,
+  /// we return a string for flexibility.
+  ///
+  /// \return
+  ///     A string representing some metadata associated with a
+  ///     \a eTraceEventSyncPoint event. \b None if no metadata is available.
+  virtual llvm::Optional<std::string> GetSyncPointMetadata() const = 0;
   /// \}
 
 protected:

diff  --git a/lldb/include/lldb/Target/TraceDumper.h b/lldb/include/lldb/Target/TraceDumper.h
index ab3f77916751d..2936cdcfd7c54 100644
--- a/lldb/include/lldb/Target/TraceDumper.h
+++ b/lldb/include/lldb/Target/TraceDumper.h
@@ -34,6 +34,8 @@ struct TraceDumperOptions {
   bool show_timestamps = false;
   /// Dump the events that happened between instructions.
   bool show_events = false;
+  /// Dump events and none of the instructions.
+  bool only_events = false;
   /// For each instruction, print the instruction kind.
   bool show_control_flow_kind = false;
   /// Optional custom id to start traversing from.
@@ -63,6 +65,7 @@ class TraceDumper {
     lldb::addr_t load_address;
     llvm::Optional<double> timestamp;
     llvm::Optional<uint64_t> hw_clock;
+    llvm::Optional<std::string> sync_point_metadata;
     llvm::Optional<llvm::StringRef> error;
     llvm::Optional<lldb::TraceEvent> event;
     llvm::Optional<SymbolInfo> symbol_info;

diff  --git a/lldb/include/lldb/lldb-enumerations.h b/lldb/include/lldb/lldb-enumerations.h
index 3b0005c1f0a49..264838e3c33ee 100644
--- a/lldb/include/lldb/lldb-enumerations.h
+++ b/lldb/include/lldb/lldb-enumerations.h
@@ -1161,15 +1161,18 @@ enum SaveCoreStyle {
 
 /// Events that might happen during a trace session.
 enum TraceEvent {
-  /// Tracing was disabled for some time due to a software trigger
+  /// Tracing was disabled for some time due to a software trigger.
   eTraceEventDisabledSW,
-  /// Tracing was disable for some time due to a hardware trigger
+  /// Tracing was disable for some time due to a hardware trigger.
   eTraceEventDisabledHW,
   /// Event due to CPU change for a thread. This event is also fired when
   /// suddenly it's not possible to identify the cpu of a given thread.
   eTraceEventCPUChanged,
-  /// Event due to a CPU HW clock tick
+  /// Event due to a CPU HW clock tick.
   eTraceEventHWClockTick,
+  /// The underlying tracing technology emitted a synchronization event used by
+  /// trace processors.
+  eTraceEventSyncPoint,
 };
 
 // Enum used to identify which kind of item a \a TraceCursor is pointing at

diff  --git a/lldb/source/Commands/CommandObjectThread.cpp b/lldb/source/Commands/CommandObjectThread.cpp
index 76457d14013ea..5d7f3c16f3894 100644
--- a/lldb/source/Commands/CommandObjectThread.cpp
+++ b/lldb/source/Commands/CommandObjectThread.cpp
@@ -2178,6 +2178,11 @@ class CommandObjectTraceDumpInstructions : public CommandObjectParsed {
         m_dumper_options.json = true;
         break;
       }
+      case 'E': {
+        m_dumper_options.only_events = true;
+        m_dumper_options.show_events = true;
+        break;
+      }
       case 'C': {
         m_continue = true;
         break;

diff  --git a/lldb/source/Commands/Options.td b/lldb/source/Commands/Options.td
index cc47db5753062..0b5791d95af5a 100644
--- a/lldb/source/Commands/Options.td
+++ b/lldb/source/Commands/Options.td
@@ -1160,6 +1160,10 @@ let Command = "thread trace dump instructions" in {
   def thread_trace_dump_instructions_show_events : Option<"events", "e">,
     Group<1>,
     Desc<"Dump the events that happened during the execution of the target.">;
+  def thread_trace_dump_instruction_only_events : Option<"only-events", "E">,
+    Group<1>,
+    Desc<"Dump only the events that happened during the execution of the "
+    "target. No instrutions are dumped.">;
   def thread_trace_dump_instructions_continue: Option<"continue", "C">,
     Group<1>,
     Desc<"Continue dumping instructions right where the previous invocation of "

diff  --git a/lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp b/lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
index 89119c68e9d26..38a54ff96828b 100644
--- a/lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
+++ b/lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp
@@ -16,18 +16,6 @@ using namespace lldb_private;
 using namespace lldb_private::trace_intel_pt;
 using namespace llvm;
 
-bool lldb_private::trace_intel_pt::IsLibiptError(int libipt_status) {
-  return libipt_status < 0;
-}
-
-bool lldb_private::trace_intel_pt::IsEndOfStream(int libipt_status) {
-  return libipt_status == -pte_eos;
-}
-
-bool lldb_private::trace_intel_pt::IsTscUnavailable(int libipt_status) {
-  return libipt_status == -pte_no_time;
-}
-
 char IntelPTError::ID;
 
 IntelPTError::IntelPTError(int libipt_error_code, lldb::addr_t address)
@@ -103,6 +91,11 @@ DecodedThread::GetInstructionLoadAddress(uint64_t item_index) const {
   return m_item_data[item_index].load_address;
 }
 
+lldb::addr_t
+DecodedThread::GetSyncPointOffsetByIndex(uint64_t item_index) const {
+  return m_psb_offsets.find(item_index)->second;
+}
+
 ThreadSP DecodedThread::GetThread() { return m_thread_sp; }
 
 DecodedThread::TraceItemStorage &
@@ -116,9 +109,17 @@ DecodedThread::CreateNewTraceItem(lldb::TraceItemKind kind) {
   return m_item_data.back();
 }
 
+void DecodedThread::NotifySyncPoint(lldb::addr_t psb_offset) {
+  m_psb_offsets.try_emplace(GetItemsCount(), psb_offset);
+  AppendEvent(lldb::eTraceEventSyncPoint);
+}
+
 void DecodedThread::NotifyTsc(TSC tsc) {
   if (m_last_tsc && (*m_last_tsc)->second.tsc == tsc)
     return;
+  if (m_last_tsc)
+    assert(tsc >= (*m_last_tsc)->second.tsc &&
+           "We can't have decreasing times");
 
   m_last_tsc =
       m_tscs.emplace(GetItemsCount(), TSCRange{tsc, 0, GetItemsCount()}).first;
@@ -179,9 +180,6 @@ void DecodedThread::AppendInstruction(const pt_insn &insn) {
 }
 
 void DecodedThread::AppendError(const IntelPTError &error) {
-  // End of stream shouldn't be a public error
-  if (IsEndOfStream(error.GetLibiptErrorCode()))
-    return;
   CreateNewTraceItem(lldb::eTraceItemKindError).error =
       ConstString(error.message()).AsCString();
 }
@@ -200,15 +198,6 @@ void DecodedThread::LibiptErrorsStats::RecordError(int libipt_error_code) {
   total_count++;
 }
 
-void DecodedThread::RecordTscError(int libipt_error_code) {
-  m_tsc_errors_stats.RecordError(libipt_error_code);
-}
-
-const DecodedThread::LibiptErrorsStats &
-DecodedThread::GetTscErrorsStats() const {
-  return m_tsc_errors_stats;
-}
-
 const DecodedThread::EventsStats &DecodedThread::GetEventsStats() const {
   return m_events_stats;
 }

diff  --git a/lldb/source/Plugins/Trace/intel-pt/DecodedThread.h b/lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
index 7f112a5fd2d19..4e845403532ca 100644
--- a/lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
+++ b/lldb/source/Plugins/Trace/intel-pt/DecodedThread.h
@@ -20,15 +20,6 @@
 namespace lldb_private {
 namespace trace_intel_pt {
 
-/// libipt status utils
-/// \{
-bool IsLibiptError(int libipt_status);
-
-bool IsEndOfStream(int libipt_status);
-
-bool IsTscUnavailable(int libipt_status);
-/// \}
-
 /// Class for representing a libipt decoding error.
 class IntelPTError : public llvm::ErrorInfo<IntelPTError> {
 public:
@@ -173,6 +164,10 @@ class DecodedThread : public std::enable_shared_from_this<DecodedThread> {
   ///   The requested cpu id, or \a LLDB_INVALID_CPU_ID if not available.
   lldb::cpu_id_t GetCPUByIndex(uint64_t item_index) const;
 
+  /// \return
+  ///   The PSB offset associated with the given item index.
+  lldb::addr_t GetSyncPointOffsetByIndex(uint64_t item_index) const;
+
   /// Get a maximal range of trace items that include the given \p item_index
   /// that have the same TSC value.
   ///
@@ -199,28 +194,12 @@ class DecodedThread : public std::enable_shared_from_this<DecodedThread> {
   ///     The load address of the instruction at the given index.
   lldb::addr_t GetInstructionLoadAddress(uint64_t item_index) const;
 
-  /// Return an object with statistics of the TSC decoding errors that happened.
-  /// A TSC error is not a fatal error and doesn't create gaps in the trace.
-  /// Instead we only keep track of them as statistics.
-  ///
-  /// \return
-  ///   An object with the statistics of TSC decoding errors.
-  const LibiptErrorsStats &GetTscErrorsStats() const;
-
   /// Return an object with statistics of the trace events that happened.
   ///
   /// \return
   ///   The stats object of all the events.
   const EventsStats &GetEventsStats() const;
 
-  /// Record an error decoding a TSC timestamp.
-  ///
-  /// See \a GetTscErrors() for more documentation.
-  ///
-  /// \param[in] libipt_error_code
-  ///   An error returned by the libipt library.
-  void RecordTscError(int libipt_error_code);
-
   /// The approximate size in bytes used by this instance,
   /// including all the already decoded instructions.
   size_t CalculateApproximateMemoryUsage() const;
@@ -235,6 +214,9 @@ class DecodedThread : public std::enable_shared_from_this<DecodedThread> {
   /// If this a new CPU, an event will be created.
   void NotifyCPU(lldb::cpu_id_t cpu_id);
 
+  /// Notify this object that a new PSB has been seen.
+  void NotifySyncPoint(lldb::addr_t psb_offset);
+
   /// Append a decoding error.
   void AppendError(const IntelPTError &error);
 
@@ -294,20 +276,21 @@ class DecodedThread : public std::enable_shared_from_this<DecodedThread> {
   llvm::Optional<std::map<uint64_t, NanosecondsRange>::iterator>
       m_last_nanoseconds = llvm::None;
 
-  // The cpu information is stored as a map. It maps `instruction index -> CPU`
+  // The cpu information is stored as a map. It maps `item index -> CPU`.
   // A CPU is associated with the next instructions that follow until the next
   // cpu is seen.
   std::map<uint64_t, lldb::cpu_id_t> m_cpus;
   /// This is the chronologically last CPU ID.
   llvm::Optional<uint64_t> m_last_cpu = llvm::None;
 
+  // The PSB offsets are stored as a map. It maps `item index -> psb offset`.
+  llvm::DenseMap<uint64_t, lldb::addr_t> m_psb_offsets;
+
   /// TSC -> nanos conversion utility.
   llvm::Optional<LinuxPerfZeroTscConversion> m_tsc_conversion;
 
   /// Statistics of all tracing events.
   EventsStats m_events_stats;
-  /// Statistics of libipt errors when decoding TSCs.
-  LibiptErrorsStats m_tsc_errors_stats;
   /// Total amount of time spent decoding.
   std::chrono::milliseconds m_total_decoding_time{0};
 };

diff  --git a/lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.cpp b/lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.cpp
index 8ac6c73150df5..8d621c1438307 100644
--- a/lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.cpp
+++ b/lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.cpp
@@ -14,8 +14,123 @@ using namespace lldb_private;
 using namespace lldb_private::trace_intel_pt;
 using namespace llvm;
 
-/// Class that decodes a raw buffer for a single thread using the low level
-/// libipt library.
+bool IsLibiptError(int status) { return status < 0; }
+
+bool IsEndOfStream(int status) {
+  assert(status >= 0 && "We can't check if we reached the end of the stream if "
+                        "we got a failed status");
+  return status & pts_eos;
+}
+
+bool HasEvents(int status) {
+  assert(status >= 0 && "We can't check for events if we got a failed status");
+  return status & pts_event_pending;
+}
+
+// RAII deleter for libipt's decoders
+auto InsnDecoderDeleter = [](pt_insn_decoder *decoder) {
+  pt_insn_free_decoder(decoder);
+};
+
+auto QueryDecoderDeleter = [](pt_query_decoder *decoder) {
+  pt_qry_free_decoder(decoder);
+};
+
+using PtInsnDecoderUP =
+    std::unique_ptr<pt_insn_decoder, decltype(InsnDecoderDeleter)>;
+
+using PtQueryDecoderUP =
+    std::unique_ptr<pt_query_decoder, decltype(QueryDecoderDeleter)>;
+
+/// Create a basic configuration object limited to a given buffer that can be
+/// used for many 
diff erent decoders.
+static Expected<pt_config> CreateBasicLibiptConfig(TraceIntelPT &trace_intel_pt,
+                                                   ArrayRef<uint8_t> buffer) {
+  Expected<pt_cpu> cpu_info = trace_intel_pt.GetCPUInfo();
+  if (!cpu_info)
+    return cpu_info.takeError();
+
+  pt_config config;
+  pt_config_init(&config);
+  config.cpu = *cpu_info;
+
+  int status = pt_cpu_errata(&config.errata, &config.cpu);
+  if (IsLibiptError(status))
+    return make_error<IntelPTError>(status);
+
+  // The libipt library does not modify the trace buffer, hence the
+  // following casts are safe.
+  config.begin = const_cast<uint8_t *>(buffer.data());
+  config.end = const_cast<uint8_t *>(buffer.data() + buffer.size());
+  return config;
+}
+
+/// Callback used by libipt for reading the process memory.
+///
+/// More information can be found in
+/// https://github.com/intel/libipt/blob/master/doc/man/pt_image_set_callback.3.md
+static int ReadProcessMemory(uint8_t *buffer, size_t size,
+                             const pt_asid * /* unused */, uint64_t pc,
+                             void *context) {
+  Process *process = static_cast<Process *>(context);
+
+  Status error;
+  int bytes_read = process->ReadMemory(pc, buffer, size, error);
+  if (error.Fail())
+    return -pte_nomap;
+  return bytes_read;
+}
+
+/// Set up the memory image callback for the given decoder.
+static Error SetupMemoryImage(pt_insn_decoder *decoder, Process &process) {
+  pt_image *image = pt_insn_get_image(decoder);
+
+  int status = pt_image_set_callback(image, ReadProcessMemory, &process);
+  if (IsLibiptError(status))
+    return make_error<IntelPTError>(status);
+  return Error::success();
+}
+
+/// Create an instruction decoder for the given buffer and the given process.
+static Expected<PtInsnDecoderUP>
+CreateInstructionDecoder(TraceIntelPT &trace_intel_pt, ArrayRef<uint8_t> buffer,
+                         Process &process) {
+  Expected<pt_config> config = CreateBasicLibiptConfig(trace_intel_pt, buffer);
+  if (!config)
+    return config.takeError();
+
+  pt_insn_decoder *decoder_ptr = pt_insn_alloc_decoder(&*config);
+  if (!decoder_ptr)
+    return make_error<IntelPTError>(-pte_nomem);
+
+  PtInsnDecoderUP decoder_up(decoder_ptr, InsnDecoderDeleter);
+
+  if (Error err = SetupMemoryImage(decoder_ptr, process))
+    return std::move(err);
+
+  return decoder_up;
+}
+
+/// Create a query decoder for the given buffer. The query decoder is the
+/// highest level decoder that operates directly on packets and doesn't perform
+/// actual instruction decoding. That's why it can be useful for inspecting a
+/// raw trace without pinning it to a particular process.
+static Expected<PtQueryDecoderUP>
+CreateQueryDecoder(TraceIntelPT &trace_intel_pt, ArrayRef<uint8_t> buffer) {
+  Expected<pt_config> config = CreateBasicLibiptConfig(trace_intel_pt, buffer);
+  if (!config)
+    return config.takeError();
+
+  pt_query_decoder *decoder_ptr = pt_qry_alloc_decoder(&*config);
+  if (!decoder_ptr)
+    return make_error<IntelPTError>(-pte_nomem);
+
+  return PtQueryDecoderUP(decoder_ptr, QueryDecoderDeleter);
+}
+
+/// Class that decodes a raw buffer for a single PSB block using the low level
+/// libipt library. It assumes that kernel and user mode instructions are not
+/// mixed in the same PSB block.
 ///
 /// Throughout this code, the status of the decoder will be used to identify
 /// events needed to be processed or errors in the decoder. The values can be
@@ -23,137 +138,151 @@ using namespace llvm;
 /// - positive or zero: not an error, but a list of bits signaling the status
 /// of the decoder, e.g. whether there are events that need to be decoded or
 /// not.
-class LibiptDecoder {
+class PSBBlockDecoder {
 public:
   /// \param[in] decoder
-  ///     A well configured decoder. Using the current state of that decoder,
-  ///     decoding will start at its next valid PSB. It's not assumed that the
-  ///     decoder is already pointing at a valid PSB.
+  ///     A decoder configured to start and end within the boundaries of the
+  ///     given \p psb_block.
+  ///
+  /// \param[in] psb_block
+  ///     The PSB block to decode.
+  ///
+  /// \param[in] next_block_ip
+  ///     The starting ip at the next PSB block of the same thread if available.
   ///
   /// \param[in] decoded_thread
   ///     A \a DecodedThread object where the decoded instructions will be
   ///     appended to. It might have already some instructions.
-  LibiptDecoder(pt_insn_decoder &decoder, DecodedThread &decoded_thread)
-      : m_decoder(decoder), m_decoded_thread(decoded_thread) {}
-
-  /// Decode all the instructions until the end of the trace.
-  /// The decoding flow is based on
-  /// https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#the-instruction-flow-decode-loop.
-  void DecodeUntilEndOfTrace() {
-    // Multiple loops indicate gaps in the trace, which are found by the inner
-    // call to DecodeInstructionsAndEvents.
-    while (true) {
-      int status = pt_insn_sync_forward(&m_decoder);
-
-      if (IsLibiptError(status)) {
-        m_decoded_thread.AppendError(IntelPTError(status));
-        break;
-      }
+  PSBBlockDecoder(PtInsnDecoderUP &&decoder_up, const PSBBlock &psb_block,
+                  Optional<lldb::addr_t> next_block_ip,
+                  DecodedThread &decoded_thread)
+      : m_decoder_up(std::move(decoder_up)), m_psb_block(psb_block),
+        m_next_block_ip(next_block_ip), m_decoded_thread(decoded_thread) {}
+
+  /// \param[in] trace_intel_pt
+  ///     The main Trace object that own the PSB block.
+  ///
+  /// \param[in] decoder
+  ///     A decoder configured to start and end within the boundaries of the
+  ///     given \p psb_block.
+  ///
+  /// \param[in] psb_block
+  ///     The PSB block to decode.
+  ///
+  /// \param[in] buffer
+  ///     The raw intel pt trace for this block.
+  ///
+  /// \param[in] process
+  ///     The process to decode. It provides the memory image to use for
+  ///     decoding.
+  ///
+  /// \param[in] next_block_ip
+  ///     The starting ip at the next PSB block of the same thread if available.
+  ///
+  /// \param[in] decoded_thread
+  ///     A \a DecodedThread object where the decoded instructions will be
+  ///     appended to. It might have already some instructions.
+  static Expected<PSBBlockDecoder>
+  Create(TraceIntelPT &trace_intel_pt, const PSBBlock &psb_block,
+         ArrayRef<uint8_t> buffer, Process &process,
+         Optional<lldb::addr_t> next_block_ip, DecodedThread &decoded_thread) {
+    Expected<PtInsnDecoderUP> decoder_up =
+        CreateInstructionDecoder(trace_intel_pt, buffer, process);
+    if (!decoder_up)
+      return decoder_up.takeError();
 
-      DecodeInstructionsAndEvents(status);
-    }
+    return PSBBlockDecoder(std::move(*decoder_up), psb_block, next_block_ip,
+                           decoded_thread);
   }
 
-  /// Decode all the instructions that belong to the same PSB packet given its
-  /// offset.
-  void DecodePSB(uint64_t psb_offset) {
-    int status = pt_insn_sync_set(&m_decoder, psb_offset);
-    if (IsLibiptError(status)) {
-      m_decoded_thread.AppendError(IntelPTError(status));
-      return;
-    }
-    DecodeInstructionsAndEvents(status, /*stop_on_psb_change=*/true);
+  void DecodePSBBlock() {
+    int status = pt_insn_sync_forward(m_decoder_up.get());
+    assert(status >= 0 &&
+           "Synchronization shouldn't fail because this PSB was previously "
+           "decoded correctly.");
+
+    // We emit a TSC before a sync event to more easily associate a timestamp to
+    // the sync event. If present, the current block's TSC would be the first
+    // TSC we'll see when processing events.
+    if (m_psb_block.tsc)
+      m_decoded_thread.NotifyTsc(*m_psb_block.tsc);
+
+    m_decoded_thread.NotifySyncPoint(m_psb_block.psb_offset);
+
+    DecodeInstructionsAndEvents(status);
   }
 
 private:
-  /// Decode all the instructions and events until an error is found, the end
-  /// of the trace is reached, or optionally a new PSB is reached.
+  /// Decode all the instructions and events of the given PSB block.
   ///
   /// \param[in] status
   ///   The status that was result of synchronizing to the most recent PSB.
-  ///
-  /// \param[in] stop_on_psb_change
-  ///   If \b true, decoding stops if a 
diff erent PSB is reached.
-  void DecodeInstructionsAndEvents(int status,
-                                   bool stop_on_psb_change = false) {
-    uint64_t psb_offset;
-    pt_insn_get_sync_offset(&m_decoder,
-                            &psb_offset); // this can't fail because we got here
-
-    while (ProcessPTEvents(status)) {
-      if (stop_on_psb_change) {
-        uint64_t cur_psb_offset;
-        // this can't fail because we got here
-        pt_insn_get_sync_offset(&m_decoder, &cur_psb_offset);
-        if (cur_psb_offset != psb_offset)
-          break;
-      }
+  void DecodeInstructionsAndEvents(int status) {
+    pt_insn insn;
+    while (true) {
+      status = ProcessPTEvents(status);
+
+      if (IsLibiptError(status))
+        return;
+      else if (IsEndOfStream(status))
+        break;
 
       // The status returned by pt_insn_next will need to be processed
       // by ProcessPTEvents in the next loop if it is not an error.
-      pt_insn insn;
       std::memset(&insn, 0, sizeof insn);
-      if (IsLibiptError(status =
-                            pt_insn_next(&m_decoder, &insn, sizeof(insn)))) {
+      status = pt_insn_next(m_decoder_up.get(), &insn, sizeof(insn));
+
+      if (IsLibiptError(status)) {
         m_decoded_thread.AppendError(IntelPTError(status, insn.ip));
+        return;
+      } else if (IsEndOfStream(status)) {
         break;
       }
       m_decoded_thread.AppendInstruction(insn);
     }
-  }
 
-  /// Move the decoder forward to the next synchronization point (i.e. next PSB
-  /// packet).
-  ///
-  /// Once the decoder is at that synchronization point, it can start decoding
-  /// instructions.
-  ///
-  /// If errors are found, they will be appended to the trace.
-  ///
-  /// \return
-  ///   The libipt decoder status after moving to the next PSB. Negative if
-  ///   no PSB was found.
-  int FindNextSynchronizationPoint() {
-    // Try to sync the decoder. If it fails, then get the decoder_offset and
-    // try to sync again from the next synchronization point. If the
-    // new_decoder_offset is same as decoder_offset then we can't move to the
-    // next synchronization point. Otherwise, keep resyncing until either end
-    // of trace stream (eos) is reached or pt_insn_sync_forward() passes.
-    int status = pt_insn_sync_forward(&m_decoder);
-
-    // We make this call to record any synchronization errors.
-    if (IsLibiptError(status))
-      m_decoded_thread.AppendError(IntelPTError(status));
+    // We need to keep querying non-branching instructions until we hit the
+    // starting point of the next PSB. We won't see events at this point. This
+    // is based on
+    // https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode
+    if (m_next_block_ip && insn.ip != 0) {
+      while (insn.ip != *m_next_block_ip) {
+        m_decoded_thread.AppendInstruction(insn);
 
-    return status;
+        status = pt_insn_next(m_decoder_up.get(), &insn, sizeof(insn));
+
+        if (IsLibiptError(status)) {
+          m_decoded_thread.AppendError(IntelPTError(status, insn.ip));
+          return;
+        }
+      }
+    }
   }
 
-  /// Before querying instructions, we need to query the events associated that
-  /// instruction e.g. timing events like ptev_tick, or paging events like
-  /// ptev_paging.
+  /// Before querying instructions, we need to query the events associated with
+  /// that instruction, e.g. timing and trace disablement events.
   ///
   /// \param[in] status
   ///   The status gotten from the previous instruction decoding or PSB
   ///   synchronization.
   ///
   /// \return
-  ///   \b true if no errors were found processing the events.
-  bool ProcessPTEvents(int status) {
-    while (status & pts_event_pending) {
+  ///     The pte_status after decoding events.
+  int ProcessPTEvents(int status) {
+    while (HasEvents(status)) {
       pt_event event;
-      status = pt_insn_event(&m_decoder, &event, sizeof(event));
+      std::memset(&event, 0, sizeof event);
+      status = pt_insn_event(m_decoder_up.get(), &event, sizeof(event));
+
       if (IsLibiptError(status)) {
         m_decoded_thread.AppendError(IntelPTError(status));
-        return false;
+        return status;
       }
 
       if (event.has_tsc)
         m_decoded_thread.NotifyTsc(event.tsc);
 
       switch (event.type) {
-      case ptev_enabled:
-        // The kernel started or resumed tracing the program.
-        break;
       case ptev_disabled:
         // The CPU paused tracing the program, e.g. due to ip filtering.
         m_decoded_thread.AppendEvent(lldb::eTraceEventDisabledHW);
@@ -174,89 +303,38 @@ class LibiptDecoder {
       }
     }
 
-    return true;
+    return status;
   }
 
 private:
-  pt_insn_decoder &m_decoder;
+  PtInsnDecoderUP m_decoder_up;
+  PSBBlock m_psb_block;
+  Optional<lldb::addr_t> m_next_block_ip;
   DecodedThread &m_decoded_thread;
 };
 
-/// Callback used by libipt for reading the process memory.
-///
-/// More information can be found in
-/// https://github.com/intel/libipt/blob/master/doc/man/pt_image_set_callback.3.md
-static int ReadProcessMemory(uint8_t *buffer, size_t size,
-                             const pt_asid * /* unused */, uint64_t pc,
-                             void *context) {
-  Process *process = static_cast<Process *>(context);
-
-  Status error;
-  int bytes_read = process->ReadMemory(pc, buffer, size, error);
-  if (error.Fail())
-    return -pte_nomap;
-  return bytes_read;
-}
-
-// RAII deleter for libipt's decoder
-auto DecoderDeleter = [](pt_insn_decoder *decoder) {
-  pt_insn_free_decoder(decoder);
-};
-
-using PtInsnDecoderUP =
-    std::unique_ptr<pt_insn_decoder, decltype(DecoderDeleter)>;
-
-static Expected<PtInsnDecoderUP>
-CreateInstructionDecoder(TraceIntelPT &trace_intel_pt,
-                         ArrayRef<uint8_t> buffer) {
-  Expected<pt_cpu> cpu_info = trace_intel_pt.GetCPUInfo();
-  if (!cpu_info)
-    return cpu_info.takeError();
-
-  pt_config config;
-  pt_config_init(&config);
-  config.cpu = *cpu_info;
-  int status = pte_ok;
-
-  if (IsLibiptError(status = pt_cpu_errata(&config.errata, &config.cpu)))
-    return make_error<IntelPTError>(status);
-
-  // The libipt library does not modify the trace buffer, hence the
-  // following casts are safe.
-  config.begin = const_cast<uint8_t *>(buffer.data());
-  config.end = const_cast<uint8_t *>(buffer.data() + buffer.size());
-
-  pt_insn_decoder *decoder_ptr = pt_insn_alloc_decoder(&config);
-  if (!decoder_ptr)
-    return make_error<IntelPTError>(-pte_nomem);
-
-  return PtInsnDecoderUP(decoder_ptr, DecoderDeleter);
-}
-
-static Error SetupMemoryImage(PtInsnDecoderUP &decoder_up, Process &process) {
-  pt_image *image = pt_insn_get_image(decoder_up.get());
-
-  int status = pte_ok;
-  if (IsLibiptError(
-          status = pt_image_set_callback(image, ReadProcessMemory, &process)))
-    return make_error<IntelPTError>(status);
-  return Error::success();
-}
-
 Error lldb_private::trace_intel_pt::DecodeSingleTraceForThread(
     DecodedThread &decoded_thread, TraceIntelPT &trace_intel_pt,
     ArrayRef<uint8_t> buffer) {
-  Expected<PtInsnDecoderUP> decoder_up =
-      CreateInstructionDecoder(trace_intel_pt, buffer);
-  if (!decoder_up)
-    return decoder_up.takeError();
-
-  if (Error err = SetupMemoryImage(*decoder_up,
-                                   *decoded_thread.GetThread()->GetProcess()))
-    return err;
+  Expected<std::vector<PSBBlock>> blocks =
+      SplitTraceIntoPSBBlock(trace_intel_pt, buffer, /*expect_tscs=*/false);
+  if (!blocks)
+    return blocks.takeError();
+
+  for (size_t i = 0; i < blocks->size(); i++) {
+    PSBBlock &block = blocks->at(i);
+
+    Expected<PSBBlockDecoder> decoder = PSBBlockDecoder::Create(
+        trace_intel_pt, block, buffer.slice(block.psb_offset, block.size),
+        *decoded_thread.GetThread()->GetProcess(),
+        i + 1 < blocks->size() ? blocks->at(i + 1).starting_ip : None,
+        decoded_thread);
+    if (!decoder)
+      return decoder.takeError();
+
+    decoder->DecodePSBBlock();
+  }
 
-  LibiptDecoder libipt_decoder(*decoder_up.get(), decoded_thread);
-  libipt_decoder.DecodeUntilEndOfTrace();
   return Error::success();
 }
 
@@ -264,92 +342,79 @@ Error lldb_private::trace_intel_pt::DecodeSystemWideTraceForThread(
     DecodedThread &decoded_thread, TraceIntelPT &trace_intel_pt,
     const DenseMap<lldb::cpu_id_t, llvm::ArrayRef<uint8_t>> &buffers,
     const std::vector<IntelPTThreadContinousExecution> &executions) {
-  DenseMap<lldb::cpu_id_t, LibiptDecoder> decoders;
-  for (auto &cpu_id_buffer : buffers) {
-    Expected<PtInsnDecoderUP> decoder_up =
-        CreateInstructionDecoder(trace_intel_pt, cpu_id_buffer.second);
-    if (!decoder_up)
-      return decoder_up.takeError();
-
-    if (Error err = SetupMemoryImage(*decoder_up,
-                                     *decoded_thread.GetThread()->GetProcess()))
-      return err;
-
-    decoders.try_emplace(cpu_id_buffer.first,
-                         LibiptDecoder(*decoder_up->release(), decoded_thread));
-  }
-
   bool has_seen_psbs = false;
   for (size_t i = 0; i < executions.size(); i++) {
     const IntelPTThreadContinousExecution &execution = executions[i];
 
     auto variant = execution.thread_execution.variant;
-    // We report the TSCs we are sure of
-    switch (variant) {
-    case ThreadContinuousExecution::Variant::Complete:
-      decoded_thread.NotifyTsc(execution.thread_execution.tscs.complete.start);
-      break;
-    case ThreadContinuousExecution::Variant::OnlyStart:
+
+    // We emit the first valid tsc
+    if (execution.psb_blocks.empty()) {
+      decoded_thread.NotifyTsc(execution.thread_execution.GetLowestKnownTSC());
+    } else {
+      assert(execution.psb_blocks.front().tsc &&
+             "per cpu decoding expects TSCs");
       decoded_thread.NotifyTsc(
-          execution.thread_execution.tscs.only_start.start);
-      break;
-    default:
-      break;
+          std::min(execution.thread_execution.GetLowestKnownTSC(),
+                   *execution.psb_blocks.front().tsc));
     }
 
+    // We then emit the CPU, which will be correctly associated with a tsc.
     decoded_thread.NotifyCPU(execution.thread_execution.cpu_id);
 
     // If we haven't seen a PSB yet, then it's fine not to show errors
     if (has_seen_psbs) {
-      if (execution.intelpt_subtraces.empty()) {
+      if (execution.psb_blocks.empty()) {
         decoded_thread.AppendCustomError(
-            formatv("Unable to find intel pt data for thread "
+            formatv("Unable to find intel pt data a thread "
                     "execution on cpu id = {0}",
                     execution.thread_execution.cpu_id)
                 .str());
       }
 
-      // If the first execution is incomplete because it doesn't have a previous
-      // context switch in its cpu, all good, otherwise we report the error.
-      if (variant == ThreadContinuousExecution::Variant::OnlyEnd ||
-          variant == ThreadContinuousExecution::Variant::HintedStart) {
+      // A hinted start is a non-initial execution that doesn't have a switch
+      // in. An only end is an initial execution that doesn't have a switch in.
+      // Any of those cases represent a gap because we have seen a PSB before.
+      if (variant == ThreadContinuousExecution::Variant::HintedStart ||
+          variant == ThreadContinuousExecution::Variant::OnlyEnd) {
         decoded_thread.AppendCustomError(
-            formatv("Unable to find the context switch in for the thread "
-                    "execution starting on cpu id = {0}",
+            formatv("Unable to find the context switch in for a thread "
+                    "execution on cpu id = {0}",
                     execution.thread_execution.cpu_id)
                 .str());
       }
     }
 
-    LibiptDecoder &decoder =
-        decoders.find(execution.thread_execution.cpu_id)->second;
-    for (const IntelPTThreadSubtrace &intel_pt_execution :
-         execution.intelpt_subtraces) {
-      has_seen_psbs = true;
-      decoder.DecodePSB(intel_pt_execution.psb_offset);
-    }
+    for (size_t j = 0; j < execution.psb_blocks.size(); j++) {
+      const PSBBlock &psb_block = execution.psb_blocks[j];
+
+      Expected<PSBBlockDecoder> decoder = PSBBlockDecoder::Create(
+          trace_intel_pt, psb_block,
+          buffers.lookup(executions[i].thread_execution.cpu_id)
+              .slice(psb_block.psb_offset, psb_block.size),
+          *decoded_thread.GetThread()->GetProcess(),
+          j + 1 < execution.psb_blocks.size()
+              ? execution.psb_blocks[j + 1].starting_ip
+              : None,
+          decoded_thread);
+      if (!decoder)
+        return decoder.takeError();
 
-    // We report the TSCs we are sure of
-    switch (variant) {
-    case ThreadContinuousExecution::Variant::Complete:
-      decoded_thread.NotifyTsc(execution.thread_execution.tscs.complete.end);
-      break;
-    case ThreadContinuousExecution::Variant::OnlyEnd:
-      decoded_thread.NotifyTsc(execution.thread_execution.tscs.only_end.end);
-      break;
-    default:
-      break;
+      has_seen_psbs = true;
+      decoder->DecodePSBBlock();
     }
 
     // If we haven't seen a PSB yet, then it's fine not to show errors
     if (has_seen_psbs) {
-      // If the last execution is incomplete because it doesn't have a following
-      // context switch in its cpu, all good.
-      if ((variant == ThreadContinuousExecution::Variant::OnlyStart &&
-           i + 1 != executions.size()) ||
-          variant == ThreadContinuousExecution::Variant::HintedEnd) {
+      // A hinted end is a non-ending execution that doesn't have a switch out.
+      // An only start is an ending execution that doesn't have a switch out.
+      // Any of those cases represent a gap if we still have executions to
+      // process and we have seen a PSB before.
+      if (i + 1 != executions.size() &&
+          (variant == ThreadContinuousExecution::Variant::OnlyStart ||
+           variant == ThreadContinuousExecution::Variant::HintedEnd)) {
         decoded_thread.AppendCustomError(
-            formatv("Unable to find the context switch out for the thread "
+            formatv("Unable to find the context switch out for a thread "
                     "execution on cpu id = {0}",
                     execution.thread_execution.cpu_id)
                 .str());
@@ -365,60 +430,109 @@ bool IntelPTThreadContinousExecution::operator<(
   // PSB packet, which is a valid TSC. Otherwise, We query the thread execution
   // itself for some tsc.
   auto get_tsc = [](const IntelPTThreadContinousExecution &exec) {
-    return exec.intelpt_subtraces.empty()
-               ? exec.thread_execution.GetLowestKnownTSC()
-               : exec.intelpt_subtraces.front().tsc;
+    return exec.psb_blocks.empty() ? exec.thread_execution.GetLowestKnownTSC()
+                                   : exec.psb_blocks.front().tsc;
   };
 
   return get_tsc(*this) < get_tsc(o);
 }
 
-Expected<std::vector<IntelPTThreadSubtrace>>
-lldb_private::trace_intel_pt::SplitTraceInContinuousExecutions(
-    TraceIntelPT &trace_intel_pt, llvm::ArrayRef<uint8_t> buffer) {
-  Expected<PtInsnDecoderUP> decoder_up =
-      CreateInstructionDecoder(trace_intel_pt, buffer);
+Expected<std::vector<PSBBlock>>
+lldb_private::trace_intel_pt::SplitTraceIntoPSBBlock(
+    TraceIntelPT &trace_intel_pt, llvm::ArrayRef<uint8_t> buffer,
+    bool expect_tscs) {
+  // This follows
+  // https://github.com/intel/libipt/blob/master/doc/howto_libipt.md#parallel-decode
+
+  Expected<PtQueryDecoderUP> decoder_up =
+      CreateQueryDecoder(trace_intel_pt, buffer);
   if (!decoder_up)
     return decoder_up.takeError();
 
-  pt_insn_decoder *decoder = decoder_up.get().get();
+  pt_query_decoder *decoder = decoder_up.get().get();
 
-  std::vector<IntelPTThreadSubtrace> executions;
+  std::vector<PSBBlock> executions;
 
-  int status = pte_ok;
-  while (!IsLibiptError(status = pt_insn_sync_forward(decoder))) {
-    uint64_t tsc;
-    if (IsLibiptError(pt_insn_time(decoder, &tsc, nullptr, nullptr)))
-      return createStringError(inconvertibleErrorCode(),
-                               "intel pt trace doesn't have TSC timestamps");
+  while (true) {
+    uint64_t maybe_ip = LLDB_INVALID_ADDRESS;
+    int decoding_status = pt_qry_sync_forward(decoder, &maybe_ip);
+    if (IsLibiptError(decoding_status))
+      break;
 
     uint64_t psb_offset;
-    pt_insn_get_sync_offset(decoder,
-                            &psb_offset); // this can't fail because we got here
+    int offset_status = pt_qry_get_sync_offset(decoder, &psb_offset);
+    assert(offset_status >= 0 &&
+           "This can't fail because we were able to synchronize");
+
+    Optional<uint64_t> ip;
+    if (!(pts_ip_suppressed & decoding_status))
+      ip = maybe_ip;
+
+    Optional<uint64_t> tsc;
+    // Now we fetch the first TSC that comes after the PSB.
+    while (HasEvents(decoding_status)) {
+      pt_event event;
+      decoding_status = pt_qry_event(decoder, &event, sizeof(event));
+      if (IsLibiptError(decoding_status))
+        break;
+      if (event.has_tsc) {
+        tsc = event.tsc;
+        break;
+      }
+    }
+    if (IsLibiptError(decoding_status)) {
+      // We continue to the next PSB. This effectively merges this PSB with the
+      // previous one, and that should be fine because this PSB might be the
+      // direct continuation of the previous thread and it's better to show an
+      // error in the decoded thread than to hide it. If this is the first PSB,
+      // we are okay losing it. Besides that, an error at processing events
+      // means that we wouldn't be able to get any instruction out of it.
+      continue;
+    }
+
+    if (expect_tscs && !tsc)
+      return createStringError(inconvertibleErrorCode(),
+                               "Found a PSB without TSC.");
 
     executions.push_back({
         psb_offset,
         tsc,
+        0,
+        ip,
     });
   }
+  if (!executions.empty()) {
+    // We now adjust the sizes of each block
+    executions.back().size = buffer.size() - executions.back().psb_offset;
+    for (int i = (int)executions.size() - 2; i >= 0; i--) {
+      executions[i].size =
+          executions[i + 1].psb_offset - executions[i].psb_offset;
+    }
+  }
   return executions;
 }
 
 Expected<Optional<uint64_t>>
 lldb_private::trace_intel_pt::FindLowestTSCInTrace(TraceIntelPT &trace_intel_pt,
                                                    ArrayRef<uint8_t> buffer) {
-  Expected<PtInsnDecoderUP> decoder_up =
-      CreateInstructionDecoder(trace_intel_pt, buffer);
+  Expected<PtQueryDecoderUP> decoder_up =
+      CreateQueryDecoder(trace_intel_pt, buffer);
   if (!decoder_up)
     return decoder_up.takeError();
 
-  pt_insn_decoder *decoder = decoder_up.get().get();
-  int status = pte_ok;
-  if (IsLibiptError(status = pt_insn_sync_forward(decoder)))
+  pt_query_decoder *decoder = decoder_up.get().get();
+  uint64_t ip = LLDB_INVALID_ADDRESS;
+  int status = pt_qry_sync_forward(decoder, &ip);
+  if (IsLibiptError(status))
     return None;
 
-  uint64_t tsc;
-  if (IsLibiptError(pt_insn_time(decoder, &tsc, nullptr, nullptr)))
-    return None;
-  return tsc;
+  while (HasEvents(status)) {
+    pt_event event;
+    status = pt_qry_event(decoder, &event, sizeof(event));
+    if (IsLibiptError(status))
+      return None;
+    if (event.has_tsc)
+      return event.tsc;
+  }
+  return None;
 }

diff  --git a/lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.h b/lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.h
index 8f1091ea04d9a..8676a0b6d29e1 100644
--- a/lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.h
+++ b/lldb/source/Plugins/Trace/intel-pt/LibiptDecoder.h
@@ -17,15 +17,22 @@
 namespace lldb_private {
 namespace trace_intel_pt {
 
-/// This struct represents a point in the intel pt trace that the decoder can
-/// start decoding from without errors.
-struct IntelPTThreadSubtrace {
+/// This struct represents a contiguous section of a trace that starts at a PSB
+/// and ends right before the next PSB or the end of the trace.
+struct PSBBlock {
   /// The memory offset of a PSB packet that is a synchronization point for the
   /// decoder. A decoder normally looks first for a PSB packet and then it
   /// starts decoding.
   uint64_t psb_offset;
   /// The timestamp associated with the PSB packet above.
-  uint64_t tsc;
+  llvm::Optional<uint64_t> tsc;
+  /// Size in bytes of this block
+  uint64_t size;
+  /// The first ip for this PSB block.
+  /// This is \a None if tracing was disabled when the PSB block was emitted.
+  /// This means that eventually there's be an enablement event that will come
+  /// with an ip.
+  llvm::Optional<lldb::addr_t> starting_ip;
 };
 
 /// This struct represents a continuous execution of a thread in a cpu,
@@ -33,7 +40,7 @@ struct IntelPTThreadSubtrace {
 /// that belong to this execution.
 struct IntelPTThreadContinousExecution {
   ThreadContinuousExecution thread_execution;
-  std::vector<IntelPTThreadSubtrace> intelpt_subtraces;
+  std::vector<PSBBlock> psb_blocks;
 
   IntelPTThreadContinousExecution(
       const ThreadContinuousExecution &thread_execution)
@@ -91,12 +98,15 @@ llvm::Error DecodeSystemWideTraceForThread(
 ///   The intel pt buffer that belongs to a single thread or to a single cpu
 ///   core.
 ///
+/// \param[in] expect_tscs
+///   If \b true, an error is return if a packet without TSC is found.
+///
 /// \return
 ///   A list of continuous executions sorted by time, or an \a llvm::Error in
 ///   case of failures.
-llvm::Expected<std::vector<IntelPTThreadSubtrace>>
-SplitTraceInContinuousExecutions(TraceIntelPT &trace_intel_pt,
-                                 llvm::ArrayRef<uint8_t> buffer);
+llvm::Expected<std::vector<PSBBlock>>
+SplitTraceIntoPSBBlock(TraceIntelPT &trace_intel_pt,
+                       llvm::ArrayRef<uint8_t> buffer, bool expect_tscs);
 
 /// Find the lowest TSC in the given trace.
 ///

diff  --git a/lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp b/lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
index 3969881bc0eb0..fe734c0375dfe 100644
--- a/lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
+++ b/lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp
@@ -137,3 +137,9 @@ bool TraceCursorIntelPT::HasId(lldb::user_id_t id) const {
 }
 
 user_id_t TraceCursorIntelPT::GetId() const { return m_pos; }
+
+Optional<std::string> TraceCursorIntelPT::GetSyncPointMetadata() const {
+  return formatv("offset = 0x{0:x}",
+                 m_decoded_thread_sp->GetSyncPointOffsetByIndex(m_pos))
+      .str();
+}

diff  --git a/lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.h b/lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.h
index 2e002bbfd22c6..e1159a06c54d7 100644
--- a/lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.h
+++ b/lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.h
@@ -47,6 +47,8 @@ class TraceCursorIntelPT : public TraceCursor {
 
   llvm::Optional<double> GetWallClockTime() const override;
 
+  llvm::Optional<std::string> GetSyncPointMetadata() const override;
+
 private:
   /// Clear the current TSC and nanoseconds ranges if after moving they are not
   /// valid anymore.

diff  --git a/lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp b/lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
index 5f57480a04da5..0a3bc64098b50 100644
--- a/lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
+++ b/lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp
@@ -284,20 +284,6 @@ void TraceIntelPT::DumpTraceInfo(Thread &thread, Stream &s, bool verbose,
     s.Format("    Total number of unattributed PSB blocks found: {0}\n",
              storage.multicpu_decoder->GetUnattributedPSBBlocksCount());
   }
-
-  // Errors
-  {
-    s << "\n  Errors:\n";
-    const DecodedThread::LibiptErrorsStats &tsc_errors_stats =
-        decoded_thread_sp->GetTscErrorsStats();
-    s.Format("    Number of TSC decoding errors: {0}\n",
-             tsc_errors_stats.total_count);
-    for (const auto &error_message_to_count :
-         tsc_errors_stats.libipt_errors_counts) {
-      s.Format("      {0}: {1}\n", error_message_to_count.first,
-               error_message_to_count.second);
-    }
-  }
 }
 
 void TraceIntelPT::DumpTraceInfoAsJson(Thread &thread, Stream &s,
@@ -372,21 +358,8 @@ void TraceIntelPT::DumpTraceInfoAsJson(Thread &thread, Stream &s,
             "PSBBlocks",
             storage.multicpu_decoder->GePSBBlocksCountForThread(tid));
       }
-
-      // Errors
-      const DecodedThread::LibiptErrorsStats &tsc_errors_stats =
-          decoded_thread_sp->GetTscErrorsStats();
-      json_str.attributeObject("errorItems", [&] {
-        json_str.attribute("total", tsc_errors_stats.total_count);
-        json_str.attributeObject("individualErrors", [&] {
-          for (const auto &error_message_to_count :
-               tsc_errors_stats.libipt_errors_counts) {
-            json_str.attribute(error_message_to_count.first,
-                               error_message_to_count.second);
-          }
-        });
-      });
     });
+
     json_str.attributeObject("globalStats", [&] {
       json_str.attributeObject("timingInSeconds", [&] {
         GetTimer().ForGlobal().ForEachTimedTask(

diff  --git a/lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCpuDecoder.cpp b/lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCpuDecoder.cpp
index b52d5c0e9c566..0ed7c3ac8efdd 100644
--- a/lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCpuDecoder.cpp
+++ b/lldb/source/Plugins/Trace/intel-pt/TraceIntelPTMultiCpuDecoder.cpp
@@ -89,23 +89,23 @@ Expected<DecodedThreadSP> TraceIntelPTMultiCpuDecoder::Decode(Thread &thread) {
       });
 }
 
-static Expected<std::vector<IntelPTThreadSubtrace>>
-GetIntelPTSubtracesForCpu(TraceIntelPT &trace, cpu_id_t cpu_id) {
-  std::vector<IntelPTThreadSubtrace> intel_pt_subtraces;
+static Expected<std::vector<PSBBlock>> GetPSBBlocksForCPU(TraceIntelPT &trace,
+                                                          cpu_id_t cpu_id) {
+  std::vector<PSBBlock> psb_blocks;
   Error err = trace.OnCpuBinaryDataRead(
       cpu_id, IntelPTDataKinds::kIptTrace,
       [&](ArrayRef<uint8_t> data) -> Error {
-        Expected<std::vector<IntelPTThreadSubtrace>> split_trace =
-            SplitTraceInContinuousExecutions(trace, data);
+        Expected<std::vector<PSBBlock>> split_trace =
+            SplitTraceIntoPSBBlock(trace, data, /*expect_tscs=*/true);
         if (!split_trace)
           return split_trace.takeError();
 
-        intel_pt_subtraces = std::move(*split_trace);
+        psb_blocks = std::move(*split_trace);
         return Error::success();
       });
   if (err)
     return std::move(err);
-  return intel_pt_subtraces;
+  return psb_blocks;
 }
 
 Expected<DenseMap<lldb::tid_t, std::vector<IntelPTThreadContinousExecution>>>
@@ -124,24 +124,24 @@ TraceIntelPTMultiCpuDecoder::DoCorrelateContextSwitchesAndIntelPtTraces() {
   LinuxPerfZeroTscConversion tsc_conversion = *conv_opt;
 
   for (cpu_id_t cpu_id : trace_sp->GetTracedCpus()) {
-    Expected<std::vector<IntelPTThreadSubtrace>> intel_pt_subtraces =
-        GetIntelPTSubtracesForCpu(*trace_sp, cpu_id);
-    if (!intel_pt_subtraces)
-      return intel_pt_subtraces.takeError();
+    Expected<std::vector<PSBBlock>> psb_blocks =
+        GetPSBBlocksForCPU(*trace_sp, cpu_id);
+    if (!psb_blocks)
+      return psb_blocks.takeError();
 
-    m_total_psb_blocks += intel_pt_subtraces->size();
+    m_total_psb_blocks += psb_blocks->size();
     // We'll be iterating through the thread continuous executions and the intel
     // pt subtraces sorted by time.
-    auto it = intel_pt_subtraces->begin();
+    auto it = psb_blocks->begin();
     auto on_new_thread_execution =
         [&](const ThreadContinuousExecution &thread_execution) {
           IntelPTThreadContinousExecution execution(thread_execution);
 
-          for (; it != intel_pt_subtraces->end() &&
-                 it->tsc < thread_execution.GetEndTSC();
+          for (; it != psb_blocks->end() &&
+                 *it->tsc < thread_execution.GetEndTSC();
                it++) {
-            if (it->tsc > thread_execution.GetStartTSC()) {
-              execution.intelpt_subtraces.push_back(*it);
+            if (*it->tsc > thread_execution.GetStartTSC()) {
+              execution.psb_blocks.push_back(*it);
             } else {
               m_unattributed_psb_blocks++;
             }
@@ -163,7 +163,7 @@ TraceIntelPTMultiCpuDecoder::DoCorrelateContextSwitchesAndIntelPtTraces() {
     if (err)
       return std::move(err);
 
-    m_unattributed_psb_blocks += intel_pt_subtraces->end() - it;
+    m_unattributed_psb_blocks += psb_blocks->end() - it;
   }
   // We now sort the executions of each thread to have them ready for
   // instruction decoding
@@ -224,7 +224,7 @@ TraceIntelPTMultiCpuDecoder::GePSBBlocksCountForThread(lldb::tid_t tid) const {
   if (it == m_continuous_executions_per_thread->end())
     return 0;
   for (const IntelPTThreadContinousExecution &execution : it->second)
-    count += execution.intelpt_subtraces.size();
+    count += execution.psb_blocks.size();
   return count;
 }
 

diff  --git a/lldb/source/Target/TraceCursor.cpp b/lldb/source/Target/TraceCursor.cpp
index de3f9bf1b33d5..b85cc750a8a41 100644
--- a/lldb/source/Target/TraceCursor.cpp
+++ b/lldb/source/Target/TraceCursor.cpp
@@ -52,6 +52,8 @@ const char *TraceCursor::EventKindToString(lldb::TraceEvent event_kind) {
     return "CPU core changed";
   case lldb::eTraceEventHWClockTick:
     return "HW clock tick";
+  case lldb::eTraceEventSyncPoint:
+    return "trace synchronization point";
   }
   llvm_unreachable("Fully covered switch above");
 }

diff  --git a/lldb/source/Target/TraceDumper.cpp b/lldb/source/Target/TraceDumper.cpp
index f25b5d5939d5b..152feb65a7971 100644
--- a/lldb/source/Target/TraceDumper.cpp
+++ b/lldb/source/Target/TraceDumper.cpp
@@ -148,6 +148,9 @@ class OutputWriterCLI : public TraceDumper::OutputWriter {
       case eTraceEventDisabledHW:
       case eTraceEventDisabledSW:
         break;
+      case eTraceEventSyncPoint:
+        m_s.Format(" [{0}]", item.sync_point_metadata);
+        break;
       }
     } else if (item.error) {
       m_s << "(error) " << *item.error;
@@ -192,6 +195,7 @@ class OutputWriterJSON : public TraceDumper::OutputWriter {
       "loadAddress": string decimal,
       "id": decimal,
       "hwClock"?: string decimal,
+      "syncPointMetadata"?: string,
       "timestamp_ns"?: string decimal,
       "module"?: string,
       "symbol"?: string,
@@ -224,6 +228,9 @@ class OutputWriterJSON : public TraceDumper::OutputWriter {
     case eTraceEventDisabledHW:
     case eTraceEventDisabledSW:
       break;
+    case eTraceEventSyncPoint:
+      m_j.attribute("syncPointMetadata", item.sync_point_metadata);
+      break;
     }
   }
 
@@ -392,9 +399,7 @@ Optional<lldb::user_id_t> TraceDumper::DumpInstructions(size_t count) {
     last_id = m_cursor_sp->GetId();
     TraceItem item = CreatRawTraceItem();
 
-    if (m_cursor_sp->IsEvent()) {
-      if (!m_options.show_events)
-        continue;
+    if (m_cursor_sp->IsEvent() && m_options.show_events) {
       item.event = m_cursor_sp->GetEventType();
       switch (*item.event) {
       case eTraceEventCPUChanged:
@@ -406,10 +411,15 @@ Optional<lldb::user_id_t> TraceDumper::DumpInstructions(size_t count) {
       case eTraceEventDisabledHW:
       case eTraceEventDisabledSW:
         break;
+      case eTraceEventSyncPoint:
+        item.sync_point_metadata = m_cursor_sp->GetSyncPointMetadata();
+        break;
       }
+      m_writer_up->TraceItem(item);
     } else if (m_cursor_sp->IsError()) {
       item.error = m_cursor_sp->GetError();
-    } else {
+      m_writer_up->TraceItem(item);
+    } else if (m_cursor_sp->IsInstruction() && !m_options.only_events) {
       insn_seen++;
       item.load_address = m_cursor_sp->GetLoadAddress();
 
@@ -426,8 +436,8 @@ Optional<lldb::user_id_t> TraceDumper::DumpInstructions(size_t count) {
         item.symbol_info = symbol_info;
         prev_symbol_info = symbol_info;
       }
+      m_writer_up->TraceItem(item);
     }
-    m_writer_up->TraceItem(item);
   }
   if (!m_cursor_sp->HasValue())
     m_writer_up->NoMoreData();

diff  --git a/lldb/test/API/commands/trace/TestTraceDumpInfo.py b/lldb/test/API/commands/trace/TestTraceDumpInfo.py
index b2f9092a3ab4d..1407cb3a234a0 100644
--- a/lldb/test/API/commands/trace/TestTraceDumpInfo.py
+++ b/lldb/test/API/commands/trace/TestTraceDumpInfo.py
@@ -38,22 +38,20 @@ def testDumpRawTraceSize(self):
 
   Trace technology: intel-pt
 
-  Total number of trace items: 23
+  Total number of trace items: 24
 
   Memory usage:
     Raw trace size: 4 KiB
-    Total approximate memory usage (excluding raw trace): 0.20 KiB
+    Total approximate memory usage (excluding raw trace): 0.21 KiB
     Average memory usage per item (excluding raw trace): 9.00 bytes
 
   Timing for this thread:
     Decoding instructions: ''', '''
 
   Events:
-    Number of individual events: 2
+    Number of individual events: 3
       software disabled tracing: 2
-
-  Errors:
-    Number of TSC decoding errors: 0'''],
+      trace synchronization point: 1'''],
             patterns=["Decoding instructions: \d.\d\ds"])
 
     def testDumpRawTraceSizeJSON(self):
@@ -66,23 +64,20 @@ def testDumpRawTraceSizeJSON(self):
   "traceTechnology": "intel-pt",
   "threadStats": {
     "tid": 3842849,
-    "traceItemsCount": 23,
+    "traceItemsCount": 24,
     "memoryUsage": {
-      "totalInBytes": "207",
+      "totalInBytes": "216",
       "avgPerItemInBytes": 9
     },
     "timingInSeconds": {
       "Decoding instructions": 0''', '''
     },
     "events": {
-      "totalCount": 2,
+      "totalCount": 3,
       "individualCounts": {
-        "software disabled tracing": 2
+        "software disabled tracing": 2,
+        "trace synchronization point": 1
       }
-    },
-    "errorItems": {
-      "total": 0,
-      "individualErrors": {}
     }
   },
   "globalStats": {

diff  --git a/lldb/test/API/commands/trace/TestTraceDumpInstructions.py b/lldb/test/API/commands/trace/TestTraceDumpInstructions.py
index d7cb77aafb4a2..d166f5dbd4aa2 100644
--- a/lldb/test/API/commands/trace/TestTraceDumpInstructions.py
+++ b/lldb/test/API/commands/trace/TestTraceDumpInstructions.py
@@ -34,32 +34,32 @@ def testRawDumpInstructionsInJSON(self):
             substrs=["intel-pt"])
 
         self.expect("thread trace dump instructions --raw --count 5 --forwards --json",
-            substrs=['''[{"id":0,"loadAddress":"0x400511"}''',
-                     '''{"id":2,"loadAddress":"0x400518"}''',
-                     '''{"id":3,"loadAddress":"0x40051f"}''',
-                     '''{"id":4,"loadAddress":"0x400529"}''',
-                     '''{"id":5,"loadAddress":"0x40052d"}'''])
+            substrs=['''[{"id":1,"loadAddress":"0x400511"}''',
+                     '''{"id":3,"loadAddress":"0x400518"}''',
+                     '''{"id":4,"loadAddress":"0x40051f"}''',
+                     '''{"id":5,"loadAddress":"0x400529"}''',
+                     '''{"id":6,"loadAddress":"0x40052d"}'''])
 
         self.expect("thread trace dump instructions --raw --count 5 --forwards --pretty-json",
             substrs=['''[
   {
-    "id": 0,
+    "id": 1,
     "loadAddress": "0x400511"
   },
   {
-    "id": 2,
+    "id": 3,
     "loadAddress": "0x400518"
   },
   {
-    "id": 3,
+    "id": 4,
     "loadAddress": "0x40051f"
   },
   {
-    "id": 4,
+    "id": 5,
     "loadAddress": "0x400529"
   },
   {
-    "id": 5,
+    "id": 6,
     "loadAddress": "0x40052d"
   }
 ]'''])
@@ -76,23 +76,23 @@ def testRawDumpInstructionsInJSONToFile(self):
         with open(outfile, "r") as out:
             self.assertEqual(out.read(), '''[
   {
-    "id": 0,
+    "id": 1,
     "loadAddress": "0x400511"
   },
   {
-    "id": 2,
+    "id": 3,
     "loadAddress": "0x400518"
   },
   {
-    "id": 3,
+    "id": 4,
     "loadAddress": "0x40051f"
   },
   {
-    "id": 4,
+    "id": 5,
     "loadAddress": "0x400529"
   },
   {
-    "id": 5,
+    "id": 6,
     "loadAddress": "0x40052d"
   }
 ]''')
@@ -104,67 +104,65 @@ def testRawDumpInstructions(self):
 
         self.expect("thread trace dump instructions --raw --count 21 --forwards",
             substrs=['''thread #1: tid = 3842849
-    0: 0x0000000000400511
-    2: 0x0000000000400518
-    3: 0x000000000040051f
-    4: 0x0000000000400529
-    5: 0x000000000040052d
-    6: 0x0000000000400521
-    7: 0x0000000000400525
-    8: 0x0000000000400529
-    9: 0x000000000040052d
-    10: 0x0000000000400521
-    11: 0x0000000000400525
-    12: 0x0000000000400529
-    13: 0x000000000040052d
-    14: 0x0000000000400521
-    15: 0x0000000000400525
-    16: 0x0000000000400529
-    17: 0x000000000040052d
-    18: 0x0000000000400521
-    19: 0x0000000000400525
-    20: 0x0000000000400529
-    21: 0x000000000040052d'''])
+    1: 0x0000000000400511
+    3: 0x0000000000400518
+    4: 0x000000000040051f
+    5: 0x0000000000400529
+    6: 0x000000000040052d
+    7: 0x0000000000400521
+    8: 0x0000000000400525
+    9: 0x0000000000400529
+    10: 0x000000000040052d
+    11: 0x0000000000400521
+    12: 0x0000000000400525
+    13: 0x0000000000400529
+    14: 0x000000000040052d
+    15: 0x0000000000400521
+    16: 0x0000000000400525
+    17: 0x0000000000400529
+    18: 0x000000000040052d
+    19: 0x0000000000400521
+    20: 0x0000000000400525
+    21: 0x0000000000400529
+    22: 0x000000000040052d'''])
 
         # We check if we can pass count and skip
         self.expect("thread trace dump instructions --count 5 --skip 6 --raw --forwards",
             substrs=['''thread #1: tid = 3842849
-    6: 0x0000000000400521
-    7: 0x0000000000400525
-    8: 0x0000000000400529
-    9: 0x000000000040052d
-    10: 0x0000000000400521'''])
+    6: 0x000000000040052d
+    7: 0x0000000000400521
+    8: 0x0000000000400525
+    9: 0x0000000000400529
+    10: 0x000000000040052d'''])
 
         self.expect("thread trace dump instructions --count 5 --skip 6 --raw",
             substrs=['''thread #1: tid = 3842849
-    16: 0x0000000000400529
-    15: 0x0000000000400525
-    14: 0x0000000000400521
-    13: 0x000000000040052d
-    12: 0x0000000000400529'''])
+    17: 0x0000000000400529
+    16: 0x0000000000400525
+    15: 0x0000000000400521
+    14: 0x000000000040052d
+    13: 0x0000000000400529'''])
 
         # We check if we can pass count and skip and instruction id in hex
         self.expect("thread trace dump instructions --count 5 --skip 6 --raw --id 0xA",
             substrs=['''thread #1: tid = 3842849
-    4: 0x0000000000400529
-    3: 0x000000000040051f
-    2: 0x0000000000400518
-    0: 0x0000000000400511
+    4: 0x000000000040051f
+    3: 0x0000000000400518
+    1: 0x0000000000400511
     no more data'''])
 
         # We check if we can pass count and skip and instruction id in decimal
         self.expect("thread trace dump instructions --count 5 --skip 6 --raw --id 10",
             substrs=['''thread #1: tid = 3842849
-    4: 0x0000000000400529
-    3: 0x000000000040051f
-    2: 0x0000000000400518
-    0: 0x0000000000400511
+    4: 0x000000000040051f
+    3: 0x0000000000400518
+    1: 0x0000000000400511
     no more data'''])
 
         # We check if we can access the thread by index id
         self.expect("thread trace dump instructions 1 --raw",
             substrs=['''thread #1: tid = 3842849
-    21: 0x000000000040052d'''])
+    22: 0x000000000040052d'''])
 
         # We check that we get an error when using an invalid thread index id
         self.expect("thread trace dump instructions 10", error=True,
@@ -179,30 +177,30 @@ def testDumpFullInstructionsWithMultipleThreads(self):
         self.expect("thread trace dump instructions 2 --count 2",
             substrs=['''thread #2: tid = 3842850
   a.out`main + 32 at main.cpp:4
-    21: 0x000000000040052d    jle    0x400521                  ; <+20> at main.cpp:5
-    20: 0x0000000000400529    cmpl   $0x3, -0x8(%rbp)'''])
+    22: 0x000000000040052d    jle    0x400521                  ; <+20> at main.cpp:5
+    21: 0x0000000000400529    cmpl   $0x3, -0x8(%rbp)'''])
 
         # We use custom --count and --skip, saving the command to history for later
         self.expect("thread trace dump instructions 2 --count 2 --skip 2", inHistory=True,
             substrs=['''thread #2: tid = 3842850
   a.out`main + 28 at main.cpp:4
-    20: 0x0000000000400529    cmpl   $0x3, -0x8(%rbp)
-    19: 0x0000000000400525    addl   $0x1, -0x8(%rbp)'''])
+    21: 0x0000000000400529    cmpl   $0x3, -0x8(%rbp)
+    20: 0x0000000000400525    addl   $0x1, -0x8(%rbp)'''])
 
         # We use a repeat command twice and ensure the previous count is used and the
         # start position moves with each command.
         self.expect("", inHistory=True,
             substrs=['''thread #2: tid = 3842850
   a.out`main + 20 at main.cpp:5
-    18: 0x0000000000400521    xorl   $0x1, -0x4(%rbp)
+    19: 0x0000000000400521    xorl   $0x1, -0x4(%rbp)
   a.out`main + 32 at main.cpp:4
-    17: 0x000000000040052d    jle    0x400521                  ; <+20> at main.cpp:5'''])
+    18: 0x000000000040052d    jle    0x400521                  ; <+20> at main.cpp:5'''])
 
         self.expect("", inHistory=True,
             substrs=['''thread #2: tid = 3842850
   a.out`main + 28 at main.cpp:4
-    16: 0x0000000000400529    cmpl   $0x3, -0x8(%rbp)
-    15: 0x0000000000400525    addl   $0x1, -0x8(%rbp'''])
+    17: 0x0000000000400529    cmpl   $0x3, -0x8(%rbp)
+    16: 0x0000000000400525    addl   $0x1, -0x8(%rbp'''])
 
     def testInvalidBounds(self):
         self.expect("trace load -v " +
@@ -212,10 +210,10 @@ def testInvalidBounds(self):
         self.expect("thread trace dump instructions --count 20 --forwards",
             substrs=['''thread #1: tid = 3842849
   a.out`main + 4 at main.cpp:2
-    0: 0x0000000000400511    movl   $0x0, -0x4(%rbp)
+    1: 0x0000000000400511    movl   $0x0, -0x4(%rbp)
   a.out`main + 11 at main.cpp:4
-    2: 0x0000000000400518    movl   $0x0, -0x8(%rbp)
-    3: 0x000000000040051f    jmp    0x400529                  ; <+28> at main.cpp:4'''])
+    3: 0x0000000000400518    movl   $0x0, -0x8(%rbp)
+    4: 0x000000000040051f    jmp    0x400529                  ; <+28> at main.cpp:4'''])
 
         # Should print no instructions if the position is out of bounds
         self.expect("thread trace dump instructions --skip 23",
@@ -231,8 +229,7 @@ def testWrongImage(self):
         self.expect("thread trace dump instructions --forwards",
             substrs=['''thread #1: tid = 3842849
     ...missing instructions
-    0: (error) no memory mapped at this address: 0x0000000000400511
-    1: (error) no memory mapped at this address: 0x0000000000400518'''])
+    1: (error) no memory mapped at this address: 0x0000000000400511'''])
 
     def testWrongCPU(self):
         self.expect("trace load " +
@@ -244,35 +241,35 @@ def testMultiFileTraceWithMissingModuleInJSON(self):
         self.expect("trace load " +
             os.path.join(self.getSourceDir(), "intelpt-trace-multi-file", "multi-file-no-ld.json"))
 
-        self.expect("thread trace dump instructions --count 4 --id 4 --forwards --pretty-json",
+        self.expect("thread trace dump instructions --count 4 --id 5 --forwards --pretty-json",
             substrs=['''[
   {
-    "id": 4,
+    "id": 5,
     "loadAddress": "0x40054b",
     "module": "a.out",
     "symbol": "foo()",
     "mnemonic": "jmp"
   },
   {
-    "id": 5,
+    "id": 6,
     "loadAddress": "0x400510",
     "module": "a.out",
     "symbol": null,
     "mnemonic": "pushq"
   },
   {
-    "id": 6,
+    "id": 7,
     "loadAddress": "0x400516",
     "module": "a.out",
     "symbol": null,
     "mnemonic": "jmpq"
   },
   {
-    "id": 7,
+    "id": 8,
     "error": "no memory mapped at this address: 0x00007ffff7df1950"
   },
   {
-    "id": 8,
+    "id": 10,
     "loadAddress": "0x400674",
     "module": "a.out",
     "symbol": "main",
@@ -287,17 +284,17 @@ def testMultiFileTraceWithMissingModuleInJSON(self):
                 substrs=['''[
   {
     "id": 20,
-    "loadAddress": "0x40069a",
+    "loadAddress": "0x400694",
     "module": "a.out",
-    "symbol": "main",
-    "mnemonic": "addl",
+    "symbol": "inline_function()",
+    "mnemonic": "movl",
     "source": "/home/wallace/llvm-sand/external/llvm-project/lldb/test/API/commands/trace/intelpt-trace-multi-file/main.cpp",
-    "line": 14,
+    "line": 6,
     "column": 0
   },
   {
     "id": 21,
-    "loadAddress": "0x40069c",
+    "loadAddress": "0x400697",
     "module": "a.out",
     "symbol": "main",
     "mnemonic": "movl",
@@ -306,21 +303,24 @@ def testMultiFileTraceWithMissingModuleInJSON(self):
     "column": 0
   },
   {
-    "id": 23,
-    "loadAddress": "0x40069f",
+    "id": 22,
+    "loadAddress": "0x40069a",
     "module": "a.out",
     "symbol": "main",
-    "mnemonic": "callq",
+    "mnemonic": "addl",
     "source": "/home/wallace/llvm-sand/external/llvm-project/lldb/test/API/commands/trace/intelpt-trace-multi-file/main.cpp",
-    "line": 16,
+    "line": 14,
     "column": 0
   },
   {
-    "id": 24,
-    "loadAddress": "0x400540",
+    "id": 23,
+    "loadAddress": "0x40069c",
     "module": "a.out",
-    "symbol": "foo()",
-    "mnemonic": "jmpq"
+    "symbol": "main",
+    "mnemonic": "movl",
+    "source": "/home/wallace/llvm-sand/external/llvm-project/lldb/test/API/commands/trace/intelpt-trace-multi-file/main.cpp",
+    "line": 14,
+    "column": 0
   }
 ]'''])
 
@@ -345,148 +345,148 @@ def testMultiFileTraceWithMissingModule(self):
         self.expect("thread trace dump instructions --count 50 --forwards",
             substrs=['''thread #1: tid = 815455
   a.out`main + 15 at main.cpp:10
-    0: 0x000000000040066f    callq  0x400540                  ; symbol stub for: foo()
+    1: 0x000000000040066f    callq  0x400540                  ; symbol stub for: foo()
   a.out`symbol stub for: foo()
-    2: 0x0000000000400540    jmpq   *0x200ae2(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 40
-    3: 0x0000000000400546    pushq  $0x2
-    4: 0x000000000040054b    jmp    0x400510
+    3: 0x0000000000400540    jmpq   *0x200ae2(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 40
+    4: 0x0000000000400546    pushq  $0x2
+    5: 0x000000000040054b    jmp    0x400510
   a.out`(none)
-    5: 0x0000000000400510    pushq  0x200af2(%rip)            ; _GLOBAL_OFFSET_TABLE_ + 8
-    6: 0x0000000000400516    jmpq   *0x200af4(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 16
+    6: 0x0000000000400510    pushq  0x200af2(%rip)            ; _GLOBAL_OFFSET_TABLE_ + 8
+    7: 0x0000000000400516    jmpq   *0x200af4(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 16
     ...missing instructions
-    7: (error) no memory mapped at this address: 0x00007ffff7df1950
+    8: (error) no memory mapped at this address: 0x00007ffff7df1950
   a.out`main + 20 at main.cpp:10
-    8: 0x0000000000400674    movl   %eax, -0xc(%rbp)
+    10: 0x0000000000400674    movl   %eax, -0xc(%rbp)
   a.out`main + 23 at main.cpp:12
-    10: 0x0000000000400677    movl   -0xc(%rbp), %eax
-    11: 0x000000000040067a    addl   $0x1, %eax
-    12: 0x000000000040067f    movl   %eax, -0xc(%rbp)
+    12: 0x0000000000400677    movl   -0xc(%rbp), %eax
+    13: 0x000000000040067a    addl   $0x1, %eax
+    14: 0x000000000040067f    movl   %eax, -0xc(%rbp)
   a.out`main + 34 [inlined] inline_function() at main.cpp:4
-    14: 0x0000000000400682    movl   $0x0, -0x4(%rbp)
+    16: 0x0000000000400682    movl   $0x0, -0x4(%rbp)
   a.out`main + 41 [inlined] inline_function() + 7 at main.cpp:5
-    15: 0x0000000000400689    movl   -0x4(%rbp), %eax
-    16: 0x000000000040068c    addl   $0x1, %eax
-    17: 0x0000000000400691    movl   %eax, -0x4(%rbp)
+    17: 0x0000000000400689    movl   -0x4(%rbp), %eax
+    18: 0x000000000040068c    addl   $0x1, %eax
+    19: 0x0000000000400691    movl   %eax, -0x4(%rbp)
   a.out`main + 52 [inlined] inline_function() + 18 at main.cpp:6
-    18: 0x0000000000400694    movl   -0x4(%rbp), %eax
+    20: 0x0000000000400694    movl   -0x4(%rbp), %eax
   a.out`main + 55 at main.cpp:14
-    19: 0x0000000000400697    movl   -0xc(%rbp), %ecx
-    20: 0x000000000040069a    addl   %eax, %ecx
-    21: 0x000000000040069c    movl   %ecx, -0xc(%rbp)
+    21: 0x0000000000400697    movl   -0xc(%rbp), %ecx
+    22: 0x000000000040069a    addl   %eax, %ecx
+    23: 0x000000000040069c    movl   %ecx, -0xc(%rbp)
   a.out`main + 63 at main.cpp:16
-    23: 0x000000000040069f    callq  0x400540                  ; symbol stub for: foo()
+    25: 0x000000000040069f    callq  0x400540                  ; symbol stub for: foo()
   a.out`symbol stub for: foo()
-    24: 0x0000000000400540    jmpq   *0x200ae2(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 40
+    26: 0x0000000000400540    jmpq   *0x200ae2(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 40
   libfoo.so`foo() at foo.cpp:3
-    25: 0x00007ffff7bd96e0    pushq  %rbp
-    26: 0x00007ffff7bd96e1    movq   %rsp, %rbp
+    27: 0x00007ffff7bd96e0    pushq  %rbp
+    28: 0x00007ffff7bd96e1    movq   %rsp, %rbp
   libfoo.so`foo() + 4 at foo.cpp:4
-    27: 0x00007ffff7bd96e4    subq   $0x10, %rsp
-    28: 0x00007ffff7bd96e8    callq  0x7ffff7bd95d0            ; symbol stub for: bar()
+    29: 0x00007ffff7bd96e4    subq   $0x10, %rsp
+    30: 0x00007ffff7bd96e8    callq  0x7ffff7bd95d0            ; symbol stub for: bar()
   libfoo.so`symbol stub for: bar()
-    29: 0x00007ffff7bd95d0    jmpq   *0x200a4a(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 32
+    31: 0x00007ffff7bd95d0    jmpq   *0x200a4a(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 32
   libbar.so`bar() at bar.cpp:1
-    30: 0x00007ffff79d7690    pushq  %rbp
-    31: 0x00007ffff79d7691    movq   %rsp, %rbp
+    32: 0x00007ffff79d7690    pushq  %rbp
+    33: 0x00007ffff79d7691    movq   %rsp, %rbp
   libbar.so`bar() + 4 at bar.cpp:2
-    32: 0x00007ffff79d7694    movl   $0x1, -0x4(%rbp)
+    34: 0x00007ffff79d7694    movl   $0x1, -0x4(%rbp)
   libbar.so`bar() + 11 at bar.cpp:3
-    33: 0x00007ffff79d769b    movl   -0x4(%rbp), %eax
-    34: 0x00007ffff79d769e    addl   $0x1, %eax
-    35: 0x00007ffff79d76a3    movl   %eax, -0x4(%rbp)
+    35: 0x00007ffff79d769b    movl   -0x4(%rbp), %eax
+    36: 0x00007ffff79d769e    addl   $0x1, %eax
+    37: 0x00007ffff79d76a3    movl   %eax, -0x4(%rbp)
   libbar.so`bar() + 22 at bar.cpp:4
-    36: 0x00007ffff79d76a6    movl   -0x4(%rbp), %eax
-    37: 0x00007ffff79d76a9    popq   %rbp
-    38: 0x00007ffff79d76aa    retq''',
+    38: 0x00007ffff79d76a6    movl   -0x4(%rbp), %eax
+    39: 0x00007ffff79d76a9    popq   %rbp
+    40: 0x00007ffff79d76aa    retq''',
   '''libfoo.so`foo() + 13 at foo.cpp:4
-    39: 0x00007ffff7bd96ed    movl   %eax, -0x4(%rbp)
+    41: 0x00007ffff7bd96ed    movl   %eax, -0x4(%rbp)
   libfoo.so`foo() + 16 at foo.cpp:5
-    40: 0x00007ffff7bd96f0    movl   -0x4(%rbp), %eax
-    41: 0x00007ffff7bd96f3    addl   $0x1, %eax
-    42: 0x00007ffff7bd96f8    movl   %eax, -0x4(%rbp)
+    42: 0x00007ffff7bd96f0    movl   -0x4(%rbp), %eax
+    43: 0x00007ffff7bd96f3    addl   $0x1, %eax
+    44: 0x00007ffff7bd96f8    movl   %eax, -0x4(%rbp)
   libfoo.so`foo() + 27 at foo.cpp:6
-    43: 0x00007ffff7bd96fb    movl   -0x4(%rbp), %eax
-    44: 0x00007ffff7bd96fe    addq   $0x10, %rsp
-    45: 0x00007ffff7bd9702    popq   %rbp
-    46: 0x00007ffff7bd9703    retq''',
+    45: 0x00007ffff7bd96fb    movl   -0x4(%rbp), %eax
+    46: 0x00007ffff7bd96fe    addq   $0x10, %rsp
+    47: 0x00007ffff7bd9702    popq   %rbp
+    48: 0x00007ffff7bd9703    retq''',
   '''a.out`main + 68 at main.cpp:16
-    47: 0x00000000004006a4    movl   -0xc(%rbp), %ecx
-    48: 0x00000000004006a7    addl   %eax, %ecx
-    49: 0x00000000004006a9    movl   %ecx, -0xc(%rbp)
+    49: 0x00000000004006a4    movl   -0xc(%rbp), %ecx
+    50: 0x00000000004006a7    addl   %eax, %ecx
+    51: 0x00000000004006a9    movl   %ecx, -0xc(%rbp)
     no more data'''])
 
 
         self.expect("thread trace dump instructions --count 50",
             substrs=['''thread #1: tid = 815455
   a.out`main + 73 at main.cpp:16
-    49: 0x00000000004006a9    movl   %ecx, -0xc(%rbp)
-    48: 0x00000000004006a7    addl   %eax, %ecx
-    47: 0x00000000004006a4    movl   -0xc(%rbp), %ecx
+    51: 0x00000000004006a9    movl   %ecx, -0xc(%rbp)
+    50: 0x00000000004006a7    addl   %eax, %ecx
+    49: 0x00000000004006a4    movl   -0xc(%rbp), %ecx
   libfoo.so`foo() + 35 at foo.cpp:6
-    46: 0x00007ffff7bd9703    retq''',
-    '''45: 0x00007ffff7bd9702    popq   %rbp
-    44: 0x00007ffff7bd96fe    addq   $0x10, %rsp
-    43: 0x00007ffff7bd96fb    movl   -0x4(%rbp), %eax
+    48: 0x00007ffff7bd9703    retq''',
+    '''47: 0x00007ffff7bd9702    popq   %rbp
+    46: 0x00007ffff7bd96fe    addq   $0x10, %rsp
+    45: 0x00007ffff7bd96fb    movl   -0x4(%rbp), %eax
   libfoo.so`foo() + 24 at foo.cpp:5
-    42: 0x00007ffff7bd96f8    movl   %eax, -0x4(%rbp)
-    41: 0x00007ffff7bd96f3    addl   $0x1, %eax
-    40: 0x00007ffff7bd96f0    movl   -0x4(%rbp), %eax
+    44: 0x00007ffff7bd96f8    movl   %eax, -0x4(%rbp)
+    43: 0x00007ffff7bd96f3    addl   $0x1, %eax
+    42: 0x00007ffff7bd96f0    movl   -0x4(%rbp), %eax
   libfoo.so`foo() + 13 at foo.cpp:4
-    39: 0x00007ffff7bd96ed    movl   %eax, -0x4(%rbp)
+    41: 0x00007ffff7bd96ed    movl   %eax, -0x4(%rbp)
   libbar.so`bar() + 26 at bar.cpp:4
-    38: 0x00007ffff79d76aa    retq''',
-    '''37: 0x00007ffff79d76a9    popq   %rbp
-    36: 0x00007ffff79d76a6    movl   -0x4(%rbp), %eax
+    40: 0x00007ffff79d76aa    retq''',
+    '''39: 0x00007ffff79d76a9    popq   %rbp
+    38: 0x00007ffff79d76a6    movl   -0x4(%rbp), %eax
   libbar.so`bar() + 19 at bar.cpp:3
-    35: 0x00007ffff79d76a3    movl   %eax, -0x4(%rbp)
-    34: 0x00007ffff79d769e    addl   $0x1, %eax
-    33: 0x00007ffff79d769b    movl   -0x4(%rbp), %eax
+    37: 0x00007ffff79d76a3    movl   %eax, -0x4(%rbp)
+    36: 0x00007ffff79d769e    addl   $0x1, %eax
+    35: 0x00007ffff79d769b    movl   -0x4(%rbp), %eax
   libbar.so`bar() + 4 at bar.cpp:2
-    32: 0x00007ffff79d7694    movl   $0x1, -0x4(%rbp)
+    34: 0x00007ffff79d7694    movl   $0x1, -0x4(%rbp)
   libbar.so`bar() + 1 at bar.cpp:1
-    31: 0x00007ffff79d7691    movq   %rsp, %rbp
-    30: 0x00007ffff79d7690    pushq  %rbp
+    33: 0x00007ffff79d7691    movq   %rsp, %rbp
+    32: 0x00007ffff79d7690    pushq  %rbp
   libfoo.so`symbol stub for: bar()
-    29: 0x00007ffff7bd95d0    jmpq   *0x200a4a(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 32
+    31: 0x00007ffff7bd95d0    jmpq   *0x200a4a(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 32
   libfoo.so`foo() + 8 at foo.cpp:4
-    28: 0x00007ffff7bd96e8    callq  0x7ffff7bd95d0            ; symbol stub for: bar()
-    27: 0x00007ffff7bd96e4    subq   $0x10, %rsp
+    30: 0x00007ffff7bd96e8    callq  0x7ffff7bd95d0            ; symbol stub for: bar()
+    29: 0x00007ffff7bd96e4    subq   $0x10, %rsp
   libfoo.so`foo() + 1 at foo.cpp:3
-    26: 0x00007ffff7bd96e1    movq   %rsp, %rbp
-    25: 0x00007ffff7bd96e0    pushq  %rbp
+    28: 0x00007ffff7bd96e1    movq   %rsp, %rbp
+    27: 0x00007ffff7bd96e0    pushq  %rbp
   a.out`symbol stub for: foo()
-    24: 0x0000000000400540    jmpq   *0x200ae2(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 40
+    26: 0x0000000000400540    jmpq   *0x200ae2(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 40
   a.out`main + 63 at main.cpp:16
-    23: 0x000000000040069f    callq  0x400540                  ; symbol stub for: foo()
+    25: 0x000000000040069f    callq  0x400540                  ; symbol stub for: foo()
   a.out`main + 60 at main.cpp:14
-    21: 0x000000000040069c    movl   %ecx, -0xc(%rbp)
-    20: 0x000000000040069a    addl   %eax, %ecx
-    19: 0x0000000000400697    movl   -0xc(%rbp), %ecx
+    23: 0x000000000040069c    movl   %ecx, -0xc(%rbp)
+    22: 0x000000000040069a    addl   %eax, %ecx
+    21: 0x0000000000400697    movl   -0xc(%rbp), %ecx
   a.out`main + 52 [inlined] inline_function() + 18 at main.cpp:6
-    18: 0x0000000000400694    movl   -0x4(%rbp), %eax
+    20: 0x0000000000400694    movl   -0x4(%rbp), %eax
   a.out`main + 49 [inlined] inline_function() + 15 at main.cpp:5
-    17: 0x0000000000400691    movl   %eax, -0x4(%rbp)
-    16: 0x000000000040068c    addl   $0x1, %eax
-    15: 0x0000000000400689    movl   -0x4(%rbp), %eax
+    19: 0x0000000000400691    movl   %eax, -0x4(%rbp)
+    18: 0x000000000040068c    addl   $0x1, %eax
+    17: 0x0000000000400689    movl   -0x4(%rbp), %eax
   a.out`main + 34 [inlined] inline_function() at main.cpp:4
-    14: 0x0000000000400682    movl   $0x0, -0x4(%rbp)
+    16: 0x0000000000400682    movl   $0x0, -0x4(%rbp)
   a.out`main + 31 at main.cpp:12
-    12: 0x000000000040067f    movl   %eax, -0xc(%rbp)
-    11: 0x000000000040067a    addl   $0x1, %eax
-    10: 0x0000000000400677    movl   -0xc(%rbp), %eax
+    14: 0x000000000040067f    movl   %eax, -0xc(%rbp)
+    13: 0x000000000040067a    addl   $0x1, %eax
+    12: 0x0000000000400677    movl   -0xc(%rbp), %eax
   a.out`main + 20 at main.cpp:10
-    8: 0x0000000000400674    movl   %eax, -0xc(%rbp)
+    10: 0x0000000000400674    movl   %eax, -0xc(%rbp)
     ...missing instructions
-    7: (error) no memory mapped at this address: 0x00007ffff7df1950
+    8: (error) no memory mapped at this address: 0x00007ffff7df1950
   a.out`(none)
-    6: 0x0000000000400516    jmpq   *0x200af4(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 16
-    5: 0x0000000000400510    pushq  0x200af2(%rip)            ; _GLOBAL_OFFSET_TABLE_ + 8
+    7: 0x0000000000400516    jmpq   *0x200af4(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 16
+    6: 0x0000000000400510    pushq  0x200af2(%rip)            ; _GLOBAL_OFFSET_TABLE_ + 8
   a.out`symbol stub for: foo() + 11
-    4: 0x000000000040054b    jmp    0x400510
-    3: 0x0000000000400546    pushq  $0x2
-    2: 0x0000000000400540    jmpq   *0x200ae2(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 40
+    5: 0x000000000040054b    jmp    0x400510
+    4: 0x0000000000400546    pushq  $0x2
+    3: 0x0000000000400540    jmpq   *0x200ae2(%rip)           ; _GLOBAL_OFFSET_TABLE_ + 40
   a.out`main + 15 at main.cpp:10
-    0: 0x000000000040066f    callq  0x400540                  ; symbol stub for: foo()
+    1: 0x000000000040066f    callq  0x400540                  ; symbol stub for: foo()
     no more data'''])
 
         self.expect("thread trace dump instructions --skip 100 --forwards", inHistory=True,
@@ -499,13 +499,13 @@ def testMultiFileTraceWithMissingModule(self):
 
         self.expect("thread trace dump instructions --raw --all --forwards",
             substrs=['''thread #1: tid = 815455
-    0: 0x000000000040066f
-    2: 0x0000000000400540''',
-    '''6: 0x0000000000400516
+    1: 0x000000000040066f
+    3: 0x0000000000400540''',
+    '''7: 0x0000000000400516
     ...missing instructions
-    7: (error) no memory mapped at this address: 0x00007ffff7df1950
-    8: 0x0000000000400674''',
-    '''47: 0x00000000004006a4
-    48: 0x00000000004006a7
-    49: 0x00000000004006a9
+    8: (error) no memory mapped at this address: 0x00007ffff7df1950
+    10: 0x0000000000400674''',
+    '''49: 0x00000000004006a4
+    50: 0x00000000004006a7
+    51: 0x00000000004006a9
     no more data'''])

diff  --git a/lldb/test/API/commands/trace/TestTraceEvents.py b/lldb/test/API/commands/trace/TestTraceEvents.py
index 11d00f1d6309e..2267cf6f8f00e 100644
--- a/lldb/test/API/commands/trace/TestTraceEvents.py
+++ b/lldb/test/API/commands/trace/TestTraceEvents.py
@@ -16,6 +16,7 @@ def testCPUEvents(self):
     0: (event) HW clock tick [40450075477621505]
     1: (event) CPU core changed [new CPU=51]
     2: (event) HW clock tick [40450075477657246]
+    3: (event) trace synchronization point [offset = 0x0x1331]
   m.out`foo() + 65 at multi_thread.cpp:12:21'''])
 
       self.expect("thread trace dump instructions 3 -e --forward -c 5 -J",
@@ -50,55 +51,57 @@ def testPauseEvents(self):
       # We ensure that the paused events are printed correctly forward
       self.expect("thread trace dump instructions -e -f",
         patterns=[f'''thread #1: tid = .*
+    0: \(event\) trace synchronization point \[offset \= 0x0xec0\]
   a.out`main \+ 23 at main.cpp:12
-    0: {ADDRESS_REGEX}    movl .*
-    1: \(event\) software disabled tracing
-    2: {ADDRESS_REGEX}    addl .*
-    3: {ADDRESS_REGEX}    movl .*
-    4: \(event\) software disabled tracing
+    1: {ADDRESS_REGEX}    movl .*
+    2: \(event\) software disabled tracing
+    3: {ADDRESS_REGEX}    addl .*
+    4: {ADDRESS_REGEX}    movl .*
+    5: \(event\) software disabled tracing
   a.out`main \+ 34 \[inlined\] inline_function\(\) at main.cpp:4
-    5: {ADDRESS_REGEX}    movl .*
-  a.out`main \+ 41 \[inlined\] inline_function\(\) \+ 7 at main.cpp:5
     6: {ADDRESS_REGEX}    movl .*
-    7: {ADDRESS_REGEX}    addl .*
-    8: {ADDRESS_REGEX}    movl .*
-  a.out`main \+ 52 \[inlined\] inline_function\(\) \+ 18 at main.cpp:6
+  a.out`main \+ 41 \[inlined\] inline_function\(\) \+ 7 at main.cpp:5
+    7: {ADDRESS_REGEX}    movl .*
+    8: {ADDRESS_REGEX}    addl .*
     9: {ADDRESS_REGEX}    movl .*
-  a.out`main \+ 55 at main.cpp:14
+  a.out`main \+ 52 \[inlined\] inline_function\(\) \+ 18 at main.cpp:6
     10: {ADDRESS_REGEX}    movl .*
-    11: {ADDRESS_REGEX}    addl .*
-    12: {ADDRESS_REGEX}    movl .*
-    13: \(event\) software disabled tracing
+  a.out`main \+ 55 at main.cpp:14
+    11: {ADDRESS_REGEX}    movl .*
+    12: {ADDRESS_REGEX}    addl .*
+    13: {ADDRESS_REGEX}    movl .*
+    14: \(event\) software disabled tracing
   a.out`main \+ 63 at main.cpp:16
-    14: {ADDRESS_REGEX}    callq  .* ; symbol stub for: foo\(\)
-    15: \(event\) software disabled tracing
+    15: {ADDRESS_REGEX}    callq  .* ; symbol stub for: foo\(\)
+    16: \(event\) software disabled tracing
   a.out`symbol stub for: foo\(\)
-    16: {ADDRESS_REGEX}    jmpq'''])
+    17: {ADDRESS_REGEX}    jmpq'''])
 
       # We ensure that the paused events are printed correctly backward
-      self.expect("thread trace dump instructions -e --id 16",
+      self.expect("thread trace dump instructions -e --id 17",
         patterns=[f'''thread #1: tid = .*
   a.out`symbol stub for: foo\(\)
-    16: {ADDRESS_REGEX}    jmpq .*
-    15: \(event\) software disabled tracing
+    17: {ADDRESS_REGEX}    jmpq .*
+    16: \(event\) software disabled tracing
   a.out`main \+ 63 at main.cpp:16
-    14: {ADDRESS_REGEX}    callq  .* ; symbol stub for: foo\(\)
-    13: \(event\) software disabled tracing
+    15: {ADDRESS_REGEX}    callq  .* ; symbol stub for: foo\(\)
+    14: \(event\) software disabled tracing
   a.out`main \+ 60 at main.cpp:14
-    12: {ADDRESS_REGEX}    movl .*
-    11: {ADDRESS_REGEX}    addl .*
-    10: {ADDRESS_REGEX}    movl .*
+    13: {ADDRESS_REGEX}    movl .*
+    12: {ADDRESS_REGEX}    addl .*
+    11: {ADDRESS_REGEX}    movl .*
   a.out`main \+ 52 \[inlined\] inline_function\(\) \+ 18 at main.cpp:6
-    9: {ADDRESS_REGEX}    movl .*
+    10: {ADDRESS_REGEX}    movl .*
   a.out`main \+ 49 \[inlined\] inline_function\(\) \+ 15 at main.cpp:5
-    8: {ADDRESS_REGEX}    movl .*
-    7: {ADDRESS_REGEX}    addl .*
-    6: {ADDRESS_REGEX}    movl .*
+    9: {ADDRESS_REGEX}    movl .*
+    8: {ADDRESS_REGEX}    addl .*
+    7: {ADDRESS_REGEX}    movl .*
   a.out`main \+ 34 \[inlined\] inline_function\(\) at main.cpp:4
-    5: {ADDRESS_REGEX}    movl .*
-    4: \(event\) software disabled tracing
+    6: {ADDRESS_REGEX}    movl .*
+    5: \(event\) software disabled tracing
   a.out`main \+ 31 at main.cpp:12
-    3: {ADDRESS_REGEX}    movl .*
-    2: {ADDRESS_REGEX}    addl .*
-    1: \(event\) software disabled tracing
-    0: {ADDRESS_REGEX}    movl .*'''])
+    4: {ADDRESS_REGEX}    movl .*
+    3: {ADDRESS_REGEX}    addl .*
+    2: \(event\) software disabled tracing
+    1: {ADDRESS_REGEX}    movl .*
+    0: \(event\) trace synchronization point \[offset \= 0x0xec0\]'''])

diff  --git a/lldb/test/API/commands/trace/TestTraceLoad.py b/lldb/test/API/commands/trace/TestTraceLoad.py
index cd2f80bfe32a2..a793dd6a888ed 100644
--- a/lldb/test/API/commands/trace/TestTraceLoad.py
+++ b/lldb/test/API/commands/trace/TestTraceLoad.py
@@ -13,11 +13,11 @@ def testLoadMultiCoreTrace(self):
         trace_description_file_path = os.path.join(src_dir, "intelpt-multi-core-trace", "trace.json")
         self.traceLoad(traceDescriptionFilePath=trace_description_file_path, substrs=["intel-pt"])
         self.expect("thread trace dump instructions 2 -t",
-          substrs=["19531: [20456511.000 ns] (error) expected tracing enabled event",
+          substrs=["19532: [20456513.000 ns] (error) expected tracing enabled event",
                    "m.out`foo() + 65 at multi_thread.cpp:12:21",
-                   "19523: [19691630.216 ns] 0x0000000000400ba7    jg     0x400bb3"])
+                   "9524: [19691630.226 ns] 0x0000000000400ba7    jg     0x400bb3"])
         self.expect("thread trace dump instructions 3 -t",
-          substrs=["67919: [19736130.084 ns] 0x0000000000400bd7    addl   $0x1, -0x4(%rbp)",
+          substrs=["61831: [19736134.073 ns] 0x0000000000400bd7    addl   $0x1, -0x4(%rbp)",
                    "m.out`bar() + 26 at multi_thread.cpp:20:6"])
 
         self.expect("thread trace dump info --json",
@@ -38,11 +38,7 @@ def testLoadMultiCoreTrace(self):
       "individualCounts": {}
     },
     "continuousExecutions": 0,
-    "PSBBlocks": 0,
-    "errorItems": {
-      "total": 0,
-      "individualErrors": {}
-    }
+    "PSBBlocks": 0
   },
   "globalStats": {
     "timingInSeconds": {
@@ -62,7 +58,7 @@ def testLoadMultiCoreTrace(self):
     "tid": 3497496,
     "traceItemsCount": 19533,
     "memoryUsage": {
-      "totalInBytes": "176097",
+      "totalInBytes": "176065",
       "avgPerItemInBytes": 9.''', '''},
     "timingInSeconds": {
       "Decoding instructions": ''', '''
@@ -71,16 +67,13 @@ def testLoadMultiCoreTrace(self):
       "totalCount": 11,
       "individualCounts": {
         "software disabled tracing": 1,
-        "HW clock tick": 9,
+        "trace synchronization point": 1,
+        "HW clock tick": 8,
         "CPU core changed": 1
       }
     },
     "continuousExecutions": 1,
-    "PSBBlocks": 1,
-    "errorItems": {
-      "total": 0,
-      "individualErrors": {}
-    }
+    "PSBBlocks": 1
   },
   "globalStats": {
     "timingInSeconds": {
@@ -111,11 +104,11 @@ def testLoadCompactMultiCoreTrace(self):
         # we'll load the compact trace and make sure it works
         self.traceLoad(os.path.join(compact_trace_bundle_dir, "trace.json"), substrs=["intel-pt"])
         self.expect("thread trace dump instructions 2 -t",
-          substrs=["19531: [20456511.000 ns] (error) expected tracing enabled event",
+          substrs=["19532: [20456513.000 ns] (error) expected tracing enabled event",
                    "m.out`foo() + 65 at multi_thread.cpp:12:21",
-                   "19523: [19691630.216 ns] 0x0000000000400ba7    jg     0x400bb3"])
+                   "19524: [19691630.226 ns] 0x0000000000400ba7    jg     0x400bb3"])
         self.expect("thread trace dump instructions 3 -t",
-          substrs=["67919: [19736130.084 ns] 0x0000000000400bd7    addl   $0x1, -0x4(%rbp)",
+          substrs=["61831: [19736134.073 ns] 0x0000000000400bd7    addl   $0x1, -0x4(%rbp)",
                    "m.out`bar() + 26 at multi_thread.cpp:20:6"])
 
         # This reduced the number of continuous executions to look at
@@ -130,11 +123,11 @@ def testLoadMultiCoreTraceWithStringNumbers(self):
         trace_description_file_path = os.path.join(src_dir, "intelpt-multi-core-trace", "trace_with_string_numbers.json")
         self.traceLoad(traceDescriptionFilePath=trace_description_file_path, substrs=["intel-pt"])
         self.expect("thread trace dump instructions 2 -t",
-          substrs=["19531: [20456511.000 ns] (error) expected tracing enabled event",
+          substrs=["19532: [20456513.000 ns] (error) expected tracing enabled event",
                    "m.out`foo() + 65 at multi_thread.cpp:12:21",
-                   "19523: [19691630.216 ns] 0x0000000000400ba7    jg     0x400bb3"])
+                   "19524: [19691630.226 ns] 0x0000000000400ba7    jg     0x400bb3"])
         self.expect("thread trace dump instructions 3 -t",
-          substrs=["67919: [19736130.084 ns] 0x0000000000400bd7    addl   $0x1, -0x4(%rbp)",
+          substrs=["61831: [19736134.073 ns] 0x0000000000400bd7    addl   $0x1, -0x4(%rbp)",
                    "m.out`bar() + 26 at multi_thread.cpp:20:6"])
 
     @testSBAPIAndCommands
@@ -143,11 +136,11 @@ def testLoadMultiCoreTraceWithMissingThreads(self):
         trace_description_file_path = os.path.join(src_dir, "intelpt-multi-core-trace", "trace_missing_threads.json")
         self.traceLoad(traceDescriptionFilePath=trace_description_file_path, substrs=["intel-pt"])
         self.expect("thread trace dump instructions 3 -t",
-          substrs=["19531: [20456511.000 ns] (error) expected tracing enabled event",
+          substrs=["19532: [20456513.000 ns] (error) expected tracing enabled event",
                    "m.out`foo() + 65 at multi_thread.cpp:12:21",
-                   "19523: [19691630.216 ns] 0x0000000000400ba7    jg     0x400bb3"])
+                   "19524: [19691630.226 ns] 0x0000000000400ba7    jg     0x400bb3"])
         self.expect("thread trace dump instructions 2 -t",
-          substrs=["67919: [19736130.084 ns] 0x0000000000400bd7    addl   $0x1, -0x4(%rbp)",
+          substrs=["61831: [19736134.073 ns] 0x0000000000400bd7    addl   $0x1, -0x4(%rbp)",
                    "m.out`bar() + 26 at multi_thread.cpp:20:6"])
 
     @testSBAPIAndCommands
@@ -179,22 +172,20 @@ def testLoadTrace(self):
 
   Trace technology: intel-pt
 
-  Total number of trace items: 23
+  Total number of trace items: 24
 
   Memory usage:
     Raw trace size: 4 KiB
-    Total approximate memory usage (excluding raw trace): 0.20 KiB
+    Total approximate memory usage (excluding raw trace): 0.21 KiB
     Average memory usage per item (excluding raw trace): 9.00 bytes
 
   Timing for this thread:
     Decoding instructions: ''', '''
 
   Events:
-    Number of individual events: 2
+    Number of individual events: 3
       software disabled tracing: 2
-
-  Errors:
-    Number of TSC decoding errors: 0'''])
+      trace synchronization point: 1'''])
 
     @testSBAPIAndCommands
     def testLoadInvalidTraces(self):
@@ -280,7 +271,7 @@ def testLoadTraceCursor(self):
 
         # 1. Test some expected items of thread 1's trace cursor.
         thread1 = process.threads[1]
-        cursor = trace.CreateNewCursor(error, thread1) 
+        cursor = trace.CreateNewCursor(error, thread1)
         self.assertTrue(cursor)
         self.assertTrue(cursor.HasValue())
         cursor.Seek(0, lldb.eTraceCursorSeekTypeBeginning)
@@ -296,15 +287,15 @@ def testLoadTraceCursor(self):
         self.assertEqual(cursor.GetEventTypeAsString(), "CPU core changed")
         self.assertEqual(cursor.GetCPU(), 51)
 
-        cursor.GoToId(19531)
+        cursor.GoToId(19532)
 
         self.assertTrue(cursor.IsError())
         self.assertEqual(cursor.GetError(), "expected tracing enabled event")
 
-        cursor.GoToId(19523)
+        cursor.GoToId(19524)
 
         self.assertTrue(cursor.IsInstruction())
-        self.assertEqual(cursor.GetLoadAddress(), 4197287)
+        self.assertEqual(cursor.GetLoadAddress(), 0x400BA7)
 
 
 
@@ -325,13 +316,13 @@ def assertCurrentTraceCursorItemEqual(lhs, rhs):
                 self.fail("Unknown trace item kind")
 
         for thread in process.threads:
-            sequentialTraversalCursor = trace.CreateNewCursor(error, thread) 
+            sequentialTraversalCursor = trace.CreateNewCursor(error, thread)
             self.assertSBError(error)
             # Skip threads with no trace items
             if not sequentialTraversalCursor.HasValue():
                 continue
 
-            # 2. Test "End" boundary of the trace by advancing past the trace's last item. 
+            # 2. Test "End" boundary of the trace by advancing past the trace's last item.
             sequentialTraversalCursor.Seek(0, lldb.eTraceCursorSeekTypeEnd)
             self.assertTrue(sequentialTraversalCursor.HasValue())
             sequentialTraversalCursor.SetForwards(True)
@@ -342,9 +333,9 @@ def assertCurrentTraceCursorItemEqual(lhs, rhs):
 
             # 3. Test sequential traversal using sequential access API (ie Next())
             # and random access API (ie GoToId()) simultaneously.
-            randomAccessCursor = trace.CreateNewCursor(error, thread) 
+            randomAccessCursor = trace.CreateNewCursor(error, thread)
             self.assertSBError(error)
-            # Reset the sequential cursor 
+            # Reset the sequential cursor
             sequentialTraversalCursor.Seek(0, lldb.eTraceCursorSeekTypeBeginning)
             sequentialTraversalCursor.SetForwards(True)
             self.assertTrue(sequentialTraversalCursor.IsForwards())
@@ -361,7 +352,7 @@ def assertCurrentTraceCursorItemEqual(lhs, rhs):
             # sequential access API (ie consecutive calls to Next()).
             TEST_SEEK_ID = 3
             randomAccessCursor.GoToId(TEST_SEEK_ID )
-            # Reset the sequential cursor 
+            # Reset the sequential cursor
             sequentialTraversalCursor.Seek(0, lldb.eTraceCursorSeekTypeBeginning)
             sequentialTraversalCursor.SetForwards(True)
             for _ in range(TEST_SEEK_ID): sequentialTraversalCursor.Next()

diff  --git a/lldb/test/API/commands/trace/TestTraceStartStop.py b/lldb/test/API/commands/trace/TestTraceStartStop.py
index afc7ee6ec91ba..0f81816c58714 100644
--- a/lldb/test/API/commands/trace/TestTraceStartStop.py
+++ b/lldb/test/API/commands/trace/TestTraceStartStop.py
@@ -168,29 +168,29 @@ def testStartStopLiveThreads(self):
         self.expect("thread trace dump instructions -f",
             patterns=[f'''thread #1: tid = .*
   a.out`main \+ 4 at main.cpp:2
-    0: {ADDRESS_REGEX}    movl'''])
+    1: {ADDRESS_REGEX}    movl'''])
 
         # We can reconstruct the instructions up to the second line
         self.expect("n")
         self.expect("thread trace dump instructions -f",
             patterns=[f'''thread #1: tid = .*
   a.out`main \+ 4 at main.cpp:2
-    0: {ADDRESS_REGEX}    movl .*
+    1: {ADDRESS_REGEX}    movl .*
   a.out`main \+ 11 at main.cpp:4
-    2: {ADDRESS_REGEX}    movl .*
-    4: {ADDRESS_REGEX}    jmp  .* ; <\+28> at main.cpp:4
-    6: {ADDRESS_REGEX}    cmpl .*
-    8: {ADDRESS_REGEX}    jle  .* ; <\+20> at main.cpp:5'''])
+    3: {ADDRESS_REGEX}    movl .*
+    5: {ADDRESS_REGEX}    jmp  .* ; <\+28> at main.cpp:4
+    7: {ADDRESS_REGEX}    cmpl .*
+    9: {ADDRESS_REGEX}    jle  .* ; <\+20> at main.cpp:5'''])
 
         self.expect("thread trace dump instructions",
             patterns=[f'''thread #1: tid = .*
   a.out`main \+ 32 at main.cpp:4
-    8: {ADDRESS_REGEX}    jle  .* ; <\+20> at main.cpp:5
-    6: {ADDRESS_REGEX}    cmpl .*
-    4: {ADDRESS_REGEX}    jmp  .* ; <\+28> at main.cpp:4
-    2: {ADDRESS_REGEX}    movl .*
+    9: {ADDRESS_REGEX}    jle  .* ; <\+20> at main.cpp:5
+    7: {ADDRESS_REGEX}    cmpl .*
+    5: {ADDRESS_REGEX}    jmp  .* ; <\+28> at main.cpp:4
+    3: {ADDRESS_REGEX}    movl .*
   a.out`main \+ 4 at main.cpp:2
-    0: {ADDRESS_REGEX}    movl .* '''])
+    1: {ADDRESS_REGEX}    movl .* '''])
 
         # We stop tracing
         self.expect("thread trace stop")
@@ -206,12 +206,12 @@ def testStartStopLiveThreads(self):
         self.expect("thread trace dump instructions -f",
             patterns=[f'''thread #1: tid = .*
   a.out`main \+ 20 at main.cpp:5
-    0: {ADDRESS_REGEX}    xorl'''])
+    1: {ADDRESS_REGEX}    xorl'''])
 
         self.expect("thread trace dump instructions",
             patterns=[f'''thread #1: tid = .*
   a.out`main \+ 20 at main.cpp:5
-    0: {ADDRESS_REGEX}    xorl'''])
+    1: {ADDRESS_REGEX}    xorl'''])
 
         self.expect("c")
         # Now the process has finished, so the commands should fail


        


More information about the lldb-commits mailing list