[Lldb-commits] [PATCH] D120595: [WIP][trace][intelpt] Add TSC to Nanosecond conversion for IntelPT traces

Wed Mar 9 21:39:40 PST 2022

wallace added a comment.

The main change is to make the counter conversion generic, so that we don't need to add more APIs in the future. Besides that, you don't need to make any drastic change. Good job so far!

================
Comment at: lldb/docs/lldb-gdb-remote.txt:455
+//    }],
+//    "tsc_conversion"?: Optional<{
+//      "kind": <string>,
----------------
wallace wrote:
> wallace wrote:
> > The ? After the key name makes Optional<> redundant. Just keep the ?
> 
let's make this more generic for any kind of counter we might want to include in a trace, including ones that could come from PT_WRITE, so let's call it `counter_conversion`

================
Comment at: lldb/docs/lldb-gdb-remote.txt:455-463
+//    "tsc_conversion"?: Optional<{
+//      "kind": <string>,
+//           Timestamp counter conversion rate kind, e.g. perf.
+//           The kind strings can be found in the overriden toJSON method
+//           of each TimestampCounterRate subclass. The kind determines
+//           what fields are available in the remainder of the response.
+//
----------------
wallace wrote:
> The ? After the key name makes Optional<> redundant. Just keep the ?

================
Comment at: lldb/docs/lldb-gdb-remote.txt:476-502
+//  Additional params in the tsc_conversion output schema when kind == "perf":
+//   {
+//     "time_mult": <decimal integer>,
+//         time_mult field of the Linux perf_event_mmap_page struct available when
+//         cap_usr_time is set. Used in TSC to nanosecond conversion calculation defined
+//         by the Linux perf_event API.
+//
----------------
someone who wants to know more about these fields can simply open the perf_event documentation, so let's just point to it to reduce the amount of text here

================
Comment at: lldb/include/lldb/Target/Trace.h:312
+  virtual llvm::Optional<TraceGetStateResponse>
+  DoRefreshLiveProcessState(llvm::Expected<std::string> json_string) = 0;

----------------
`json_string` is not a good name. Let's call it `json_response`

================
Comment at: lldb/include/lldb/Target/TraceCursor.h:183-190
   /// Get the timestamp counter associated with the current instruction.
   /// Modern Intel, ARM and AMD processors support this counter. However, a
   /// trace plugin might decide to use a different time unit instead of an
   /// actual TSC.
   ///
   /// \return
   ///     The timestamp or \b llvm::None if not available.
----------------
refactor this into a new function generic for all possible counters

you will need to create a new enum in `lldb-enumerations.h`

enum TraceCounter {
  // Timestamp counter (TSC), like the one offered by Intel CPUs.
  eTraceCounterTSC,
};

================
Comment at: lldb/include/lldb/Target/TraceCursor.h:193
+  // TODO: add docs and rename once naming convention is agreed upon
+  virtual llvm::Optional<uint64_t> GetNanos() = 0;
+
----------------
jj10306 wrote:
> wallace wrote:
> > What about just naming it GetTimestamp and return std::chrono::nanoseconds? That way we use c++ time conversion libraries and we don't include the time granularity in the name
> sgtm - do you think the `GetTimestampCounter` API's name should remain the same or change it to something like `GetTsc`? This file is trace agnostic so tying it to TSC could be confusing since, afaiu, TSC is intel specific, but I noticed the docs of that method mention "TSC", so curious to hear your thoughts.
just replied above about having a generic method for all kinds of counter. Regarding this one, let's refactor it into

  // Get the timestamp associated with the current instruction.
  //
  // Each trace plug-in might use a different 0 timestamp, e.g. time since last clock reset, time since the program started, etc.
  // Also, it might happen that only some instructions of the entire trace have a timestamp associated with them.
  //
  //   \return
  //      The timestamp associated with the current instruction, or llvm::None if not available.
  virtual llvm::Optional<std::chrono::nanoseconds> GetTimestamp() = 0;

================
Comment at: lldb/include/lldb/Utility/TraceIntelPTGDBRemotePackets.h:46
+// TODO: update after naming convention is agreed upon
+class TimestampCounterRate {
+  public:
----------------
jj10306 wrote:
> wallace wrote:
> > This class is generic, so better not enforce the meaning of the zero tsc. Each trace plugin might be opinionated regarding this timestamp 
> Good point. Given that this class is generic, would it make more sense to move it from this trace specific file `TraceIntelPTGDBRemotePackets.h` to `TraceGDBRemotePackets.h` since that file is trace agnostic?
I imagine that with generic counter conversions we can't have a simple class that works for everything, but we might be able to do this:

  template<typename ToType>
  class TraceCounterConversion {
    virtual ~TraceCounterConversion() = default;
    virtual ToType ToNanos(uint64_t raw_counter) = 0;
    virtual llvm::json::Value toJSON() = 0;
  }

also move it to TraceGDBRemotePackets as you mentioned.

Then the using statement from below could be specific for TSCs

  using TraceTSCCounterRateSP = std::shared_ptr<TraceCounterConversion<uint64_t>> ;

That should work, as the intel pt plugin might explicit want a conversion type to uint64_t for its time conversions.  Other hypothetical counters might be converted to double or int64_t, for example.

================
Comment at: lldb/include/lldb/Utility/TraceIntelPTGDBRemotePackets.h:60
+/// `time_zero` in https://man7.org/linux/man-pages/man2/perf_event_open.2.html for more information.
+class PerfTimestampCounterRate : public TimestampCounterRate {
+  public:
----------------
  class LinuxPerfTscConversion: public TraceCounterConversion<uint64_t> {

================
Comment at: lldb/source/Plugins/Process/Linux/IntelPTManager.cpp:1
 //===-- IntelPTManager.cpp ------------------------------------------------===//
 //
----------------
rename this file to IntelPTCollector

================
Comment at: lldb/source/Plugins/Process/Linux/IntelPTManager.cpp:699
+
+llvm::Expected<TimestampCounterRateSP> PerfGetTscRate(lldb::pid_t pid) {
+    Log *log = GetLog(POSIXLog::Ptrace);
----------------
rename this to FetchTscRate and move it to a new file lldb/source/Plugins/Process/Linux/Perf.h
you might want to move all the resource_handle functions to that file as well

================
Comment at: lldb/source/Plugins/Process/Linux/IntelPTManager.cpp:700
+llvm::Expected<TimestampCounterRateSP> PerfGetTscRate(lldb::pid_t pid) {
+    Log *log = GetLog(POSIXLog::Ptrace);
+    perf_event_attr attr;
----------------
This is the incorrect logging type. Create a new logging type `POSIXLog::Trace`. Ptrace is the name of the low-level linux tool used for debugging programs. Also change the logging type of the TraceStart method to match this new one

================
Comment at: lldb/source/Plugins/Process/Linux/IntelPTManager.cpp:714
+      LLDB_LOG(log, "syscall error {0}", errno);
+      return createStringError(inconvertibleErrorCode(), "perf event syscall failed");
+    }
----------------
append the message of the errno with `std::strerror(errno)`

now that i think of it, you don't need to log anything here. Just return descriptive error messages and the caller should decide what to do with the errors. 

================
Comment at: lldb/source/Plugins/Process/Linux/IntelPTManager.cpp:724-725
+    if (base == MAP_FAILED) {
+      // TODO: should logging be done in this function or should the error just be returned here and let
+      // the caller log/consume the error?
+      LLDB_LOG(log, "mmap base error {0}", errno);
----------------
return the error. The caller should decide whether to fail or to ignore the error

================
Comment at: lldb/source/Plugins/Process/Linux/IntelPTManager.cpp:727
+      LLDB_LOG(log, "mmap base error {0}", errno);
+      return createStringError(inconvertibleErrorCode(), "perf mmap meta allocation failed");
+    }
----------------
include the strerror

================
Comment at: lldb/source/Plugins/Process/Linux/IntelPTManager.cpp:738-739
+      auto err_cap = !perf_mmap_meta_handle->cap_user_time ? "cap_user_time" : "cap_user_time_zero";
+      LLDB_LOG(log, "{0} not supported. TSC cannot be converted to time unit", err_cap);
+      return createStringError(inconvertibleErrorCode(), "TSC cannot be converted to time unit due to unsupported capabilities");
+    }
----------------
don't log, just return the error

================
Comment at: lldb/source/Plugins/Process/Linux/IntelPTManager.cpp:747-754
+  if (!g_tsc_rate) {
+    g_tsc_rate = nullptr;
+    // Try to get a TSC rate from each different source until one succeeds
+    // Add additional sources of TSC rate, as necessary
+    if (llvm::Expected<TimestampCounterRateSP> perf_tsc_rate = PerfGetTscRate(pid))
+      g_tsc_rate = *perf_tsc_rate;
+  }
----------------
we don't have caching, so the code should be like

================
Comment at: lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp:315
     : NativeProcessELF(pid, terminal_fd, delegate), m_arch(arch),
-      m_main_loop(mainloop), m_intel_pt_manager(pid) {
+      m_main_loop(mainloop), m_intel_pt_collector(pid) {
   if (m_terminal_fd != -1) {
----------------
nice

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceCursorIntelPT.cpp:94-99
+Optional<uint64_t> TraceCursorIntelPT::GetNanos() {
+  if (Optional<uint64_t> tsc = GetTimestampCounter()) {
+    if (TimestampCounterRateSP conversion_params_sp = m_decoded_thread_sp->GetTscConversionParams())
+      return conversion_params_sp->ToNanos(*tsc);
+  }
+  return llvm::None;
----------------
instead of having the decoded thread hand the tsc conversion params all the way up to here, you can create a new method `llvm::Optional<std::chrono::nanoseconds> GetTimestamp(uint64_t insn_index)`

let's try to have this method as simple as possible and put the implementation details at a lower level when possible

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPTSessionFileParser.cpp:34-39
+    "tsc_conversion_params"?: {
+        "kind": "perf",
+        "time_mult": integer,
+        "time_shift": integer,
+        "time_zero": integer
     }
----------------

================
Comment at: lldb/source/Target/TraceInstructionDumper.cpp:188-191
+      // TODO: temporary hacky way to see new timestamp functionality
+      // How do we want to incorporate the nanosecon values into the dump output?
+      if (Optional<uint64_t> timestamp_nanos = m_cursor_up->GetNanos())
+        s.Printf(" NANOS 0x%016" PRIx64, *timestamp_nanos);
----------------
just like the m_show_tsc boolean there, you should add a m_show_realtime boolean. We can print the times in milliseconds, just like
  [123.23ms]

printing in nanoseconds might be too much granularity for being readable

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120595/new/

https://reviews.llvm.org/D120595