[Lldb-commits] [PATCH] D91679: [trace][intel-pt] Implement trace start and trace stop

Walter Erquinigo via Phabricator via lldb-commits <lldb-commits at lists.llvm.org>
Tue Jan 26 18:33:33 PST 2021


wallace added inline comments.


================
Comment at: lldb/docs/lldb-gdb-remote.txt:278
+//     "tids": [<thread ids in decimal>],
+//     "variant": "specificThreads" | "currentAndFutureThreads" | "process",
+//     "params": {
----------------
clayborg wrote:
> Seems like we don't need the "variant" here. If we have "tids" then we are enabling tracing for only that list of threads, and if it is missing, then we enable it for all threads.
> 
> See my questions in the non-inlined comment section about the "process" variant.
After thinking a lot about it, I believe that providing whole-process tracing instead of per-thread tracing is going to be impractical for LLDB, which means only two variants are necessary and your suggestion is correct.

I'll elaborate: whole-process tracing means using a single buffer per logical core, where each buffer contains the instructions of all the threads that ran on that core. As Intel PT doesn't trace thread switches, a secondary context switch trace is needed to produce per-thread traces, which is a requirement for analyzing the instructions of a thread. This splitting is done by correlating the timestamps of the context switches with the timestamps of the packets in the Intel PT trace. There are two problems here:

- Tracing context switches requires kernel capabilities that are not available on all systems, and some of them are more accurate than others. One option is PERF_RECORD_SWITCH (available since Linux 4.3; see the sketch after this list), but it's not present everywhere.
- Even if you can trace context switches, you then depend on the granularity of the timestamps in the Intel PT packets. Old Intel hardware provides a timestamp only every few KB of packets, while newer hardware can emit a timestamp per CPU cycle. Either way, you can end up with invalid splits: timestamps have to be interpolated for the packets that don't carry one, so a packet can be assigned to the wrong thread.
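
To make the first point concrete, here is a minimal sketch (not part of this patch) of how a context switch trace could be requested via perf_event_open on Linux >= 4.3; the function name and simplifications are mine:

  #include <linux/perf_event.h>
  #include <sys/syscall.h>
  #include <unistd.h>
  #include <cstring>

  // Open a dummy perf event that emits PERF_RECORD_SWITCH records for the
  // given thread (requires Linux >= 4.3). The caller still has to enable
  // the event and mmap the fd's ring buffer to read the records.
  int OpenContextSwitchTrace(pid_t tid) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_SOFTWARE;
    attr.config = PERF_COUNT_SW_DUMMY; // no samples, only side-band records
    attr.sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_TIME;
    attr.sample_id_all = 1;  // attach tid/time to the switch records
    attr.context_switch = 1; // request PERF_RECORD_SWITCH records
    attr.disabled = 1;
    // perf_event_open has no glibc wrapper, so go through syscall(2).
    return static_cast<int>(syscall(SYS_perf_event_open, &attr, tid,
                                    /*cpu=*/-1, /*group_fd=*/-1,
                                    /*flags=*/0));
  }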

I was chatting about this with some infra folks at FB, and they think they will only be able to support this kind of tracing by implementing a special collector on top of a kernel feature they are working on, and it would only work on the most recent Intel hardware with high timestamp accuracy.

That being said, it doesn't seem worth building that into LLDB. Each company that wants that level of tracing should implement its own approach to accurate thread splitting, and it would be misleading to ship something in LLDB that can produce wrong results. For LLDB we can use per-thread tracing, which will ensure 100% accuracy.


================
Comment at: lldb/docs/lldb-gdb-remote.txt:310
+//  {
+//    "bufferSizeInKB": <trace buffer size in KB in decimal>
+//    "perfConfig": <custom low-level perf event intel-pt config in decimal>
----------------
clayborg wrote:
> Is this the trace buffer size that will be used for each thread? For all threads? Is this size divided up between the threads? What happens when new threads get created? Can we run out of memory in the kernel if a process creates too many threads if this buffer is for each thread? Need clarification on what this number means.
This is KB per thread. See my comment below about adding a limit on the total buffer memory used.
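
For illustration, a start request following the schema quoted above might look like this (the tids and values are made up, and this assumes the simplified scheme where omitting "tids" means tracing all threads):

  {
    "tids": [1001, 1002],
    "params": {
      "bufferSizeInKB": 16,
      "perfConfig": 0
    }
  }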


================
Comment at: lldb/docs/lldb-gdb-remote.txt:403
+//    "type": <tracing technology name, e.g. intel-pt, arm-coresight>
+//    "label": <identifier for the data>
+//    "tid": <tid in decimal if the data is related to a thread, 0 otherwise>
----------------
clayborg wrote:
> Are these custom for each plug-in? Any details on what intel-pt can specify here? You supplied intel PT specific info for other settings above in the docs.
Yes, it's custom for each plug-in. I added an entry for Intel PT at line 411.


================
Comment at: lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp:1680
+            thread_id,
+            m_intel_pt_trace_new_threads_params->params.buffer_size_in_kb,
+            m_intel_pt_trace_new_threads_params->params.perf_config)) {
----------------
clayborg wrote:
> Having a fixed size of memory for each thread can be dangerous. What if the user specifies a really large value, and then creates thousands of threads? What will happen to the system? Will it fail gracefully? Can you deadlock your computer?
AFAIK, the buffer lives in a pinned region of memory that can't be swapped out, as the CPU needs to write to it directly without interrupting the traced process. So if a process has too many threads, you might end up consuming all your RAM and bringing down the machine.

In any case, your point is valid. Given that we don't know beforehand how many threads will be created, what about adding a setting for the maximum total amount of buffer memory to use, e.g. 500MB, which the user could override if needed? At least with this, the first threads, which are commonly the long-lived ones, will be traced, and later short-lived threads won't be, but those matter much less. A sketch of such a check follows.
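
To make the proposal concrete, a sketch of the cap check, with a hypothetical setting value and helper name that are not part of this patch:

  #include <cstddef>

  // Hypothetical cap on the total Intel PT buffer memory, e.g. 500 MB.
  static constexpr size_t kMaxTotalBufferSizeInKB = 500 * 1024;

  // Returns true if tracing one more thread stays within the cap, assuming
  // every traced thread gets the same per-thread buffer size.
  static bool CanTraceNewThread(size_t num_traced_threads,
                                size_t buffer_size_in_kb) {
    return (num_traced_threads + 1) * buffer_size_in_kb <=
           kMaxTotalBufferSizeInKB;
  }

With this, threads created after the budget is exhausted simply aren't traced, instead of the kernel pinning ever more memory.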


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91679/new/

https://reviews.llvm.org/D91679
