[Lldb-commits] [PATCH] D103588: [trace] Create a top-level instruction class

Vedant Kumar via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Wed Jun 9 15:34:48 PDT 2021


vsk added a comment.

In D103588#2806894 <https://reviews.llvm.org/D103588#2806894>, @wallace wrote:

> I've been thinking about what you said and I'm having second thoughts on my implementation. I'll share more context:
>
> - I want to work in the short term on reverse debugging and reconstruction of stack traces, for that i'll need to know the instruction type of each instruction in the trace, which will be used as part of some heuristics to identify calls and returns between functions.
> - A future application that we plan to work on is adding profiling information to the instructions
> - Right now the intel-pt plugin is storing the decoded instructions in a vector, which works for small traces but wont' for gigantic traces. I imagine that I'll end up removing that vector and make the TraverseInstruction API decode instructions in place keeping one instruction in memory at a time within the intel pt plugin for a given traversal. For that I'll need accessors that can provide information of the current Instruction. As there could be two or more concurrent traversals happening at the same time (I'm trying to be generic here), it might make sense to create an abstract class TraceInstruction that can be extended by each plug-in and implement its getters.
>
> I'm thinking about something like this
>
>   class TraceInstruction {
>     virtual lldb::addr_t GetLoadAddress() = 0;
>     virtual TraceInstructionType() GetInstructionType() = 0;
>     virtual uint64_t GetTimestamp() = 0;
>     ... anything that can appear in the future 
>   };
>
> and have no members, leaving to each plug-in the decision of which of those methods to implement and how.
>
> What do you think of this? I think this incorporates your feedback.

Thanks for the context.

I'm not convinced that retaining only 1 decoded instruction in memory at a time (under a TraverseInstructions callback) will be sufficient for the use cases you've outlined. Let's say the user sets a few breakpoints, then repeatedly does "rev-continue" followed by "bt". If lldb only holds a max of 1 decoded instruction in memory, it would need to repeatedly decode parts of the trace to evaluate those user commands, which would be slow as you pointed out earlier. OTOH there's not enough memory to retain all of the decoded instructions.

It seems like we need a way for generic trace analyses to request _batches_ of decoded trace data at a time, lazily demanding more from the trace plugin only if the analysis requires it to proceed, while freeing any stale decoded data it's not using.

Moreover, there doesn't seem to be a hard requirement that each trace plugin specialize a TraceInstruction class. For example, to support that breakpoint/rev-continue/bt workflow, you really only need a list of the places where _control flow changed_. So the generic layer could request that data from a trace plugin ('plugin->decodeControlFlowChanges(startPos, stopPos)'), then query more specific APIs that answer questions about how many control flow changes there were, and what the inst type + load address was at each control flow change point. Each plugin implementation can pick the most efficient way to implement those narrower APIs, instead of trying to cram all potentially useful information into a single TraceInst class.

The reason why I believe this to be such a crucial part of the design is because of the scaling problem I mentioned earlier. A single 1.5GHz core running at IPC=2 yields 3*10^9 instructions/second, so we hit "gigantic" all too quickly!



================
Comment at: lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp:117
+  case ptic_far_call:
+    return lldb::eTraceInstructionFarCall;
+  case ptic_far_return:
----------------
Are all of these instruction types both necessary and generic? E.g. does having FarCall in addition to Call assist with backtrace reconstruction, and if so, what does FarCall mean precisely?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103588/new/

https://reviews.llvm.org/D103588



More information about the lldb-commits mailing list