[Lldb-commits] [PATCH] D103588: [trace] Create a top-level instruction class

Tue Jun 8 14:15:48 PDT 2021

wallace requested review of this revision.
wallace added a comment.

> At Apple, we use lldb to navigate instruction traces that contain billions of instructions. Allocating 16 bytes per instruction simply will not scale for our workflows. We require the in-memory overhead to be approximately 1-2 bits per instruction. I'm not familiar with how large IntelPT traces can get, but presumably (for long enough traces) you will encounter the same scaling problem.

I have thought about that and I'm taking it into account. We implemented a TraverseInstructions API in Trace.h that receives a callback (see https://lldb.llvm.org/cpp_reference/classlldb__private_1_1Trace.html#a02e346b117c15cef1cb0568c343f7e1c). The idea is that the intel-pt plugin can decode the instructions on the fly as part of the iteration.
In terms of memory, there are two kinds of trace objects. One is the raw, undecoded instruction trace, which uses about 1 bit per instruction on average. It is decoded in https://github.com/llvm/llvm-project/blob/main/lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp#L163 into TraceInstructions, which effectively use 16 bytes each for the duration of their lifetimes. Right now we haven't implemented lazy decoding; we are decoding the entire trace, but that's only because so far we have been working with small traces. As we progress in this work, we'll start working with larger traces and we'll have to implement lazy decoding, for which the TraverseInstructions API will come in handy.
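
To illustrate the idea, here is a minimal sketch of callback-based traversal with on-the-fly decoding. It is not the actual LLDB or intel-pt code: RawTrace, DecodedInstruction, the one-bit-per-instruction packing, the fixed 4-byte address model, and this TraverseInstructions signature are all assumptions made up for the example. The point is only that the compact buffer is the sole thing kept in memory, and each 16-byte decoded object lives just for the duration of one callback invocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical compact trace: one "branch taken" bit per instruction,
// packed 8 per byte. This is a toy encoding, not the Intel PT format.
struct RawTrace {
  std::vector<uint8_t> bits;
  size_t count = 0;
  uint64_t start_address = 0;

  void Append(bool branch_taken) {
    if (count % 8 == 0)
      bits.push_back(0);
    if (branch_taken)
      bits.back() |= static_cast<uint8_t>(1u << (count % 8));
    ++count;
  }
};

// Hypothetical decoded form that occupies 16 bytes.
struct DecodedInstruction {
  uint64_t load_address;
  uint64_t flags; // bit 0 set when the branch bit was set
};
static_assert(sizeof(DecodedInstruction) == 16, "expected a 16-byte object");

// Decodes instructions one at a time, starting at index `from`, and hands
// each one to `callback`. The callback returns false to stop early.
// Returns the number of instructions passed to the callback.
size_t TraverseInstructions(
    const RawTrace &trace, size_t from,
    const std::function<bool(size_t, const DecodedInstruction &)> &callback) {
  assert(from <= trace.count);
  size_t visited = 0;
  for (size_t i = from; i < trace.count; ++i) {
    const bool taken = ((trace.bits[i / 8] >> (i % 8)) & 1u) != 0;
    // Toy address model (assumption): every instruction is 4 bytes long.
    DecodedInstruction insn{trace.start_address + 4 * static_cast<uint64_t>(i),
                            static_cast<uint64_t>(taken ? 1 : 0)};
    ++visited;
    if (!callback(i, insn))
      break;
  }
  return visited;
}
```

With this shape, a consumer that only counts or filters instructions never needs a vector of decoded objects, and one that does need random access can still materialize a window of them inside its callback.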

> What alternatives to the vector<TraceInstruction> representation have you considered? One idea might be to implement your program analyses on top of a generic interface for navigating forward/backward through a trace and extracting info about the instruction via a set of API calls; this leaves the actual in-memory representation of "TraceInstruction" unspecified.

That's exactly what I described as the TraverseInstructions method :) The intel-pt plugin or any other trace plugin could have an internal representation for traces, but we still need a generic TraceInstruction object for consumers that want to be agnostic of the trace technology.
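
As a rough sketch of that separation, the example below shows two backends with different internal representations behind one traversal interface, plus a consumer that only sees the generic instruction type. All names here (GenericInstruction, TraceBackend, AddressListBackend, DeltaBackend, CountInstructionsInRange) are hypothetical and chosen for the example; they are not LLDB classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical technology-agnostic view of one traced instruction.
struct GenericInstruction {
  uint64_t load_address;
};

// Hypothetical plugin interface: each backend keeps its own representation
// and only materializes GenericInstruction objects during traversal.
class TraceBackend {
public:
  using Callback = std::function<bool(const GenericInstruction &)>;
  virtual ~TraceBackend() = default;
  // Invokes `callback` for each instruction in order until it returns false.
  virtual void Traverse(const Callback &callback) const = 0;
};

// Backend A: stores every address in full (8 bytes per instruction).
class AddressListBackend : public TraceBackend {
public:
  explicit AddressListBackend(std::vector<uint64_t> addresses)
      : m_addresses(std::move(addresses)) {}
  void Traverse(const Callback &callback) const override {
    for (uint64_t address : m_addresses)
      if (!callback(GenericInstruction{address}))
        return;
  }

private:
  std::vector<uint64_t> m_addresses;
};

// Backend B: stores a base address plus one signed byte delta per step.
class DeltaBackend : public TraceBackend {
public:
  DeltaBackend(uint64_t base, std::vector<int8_t> deltas)
      : m_base(base), m_deltas(std::move(deltas)) {}
  void Traverse(const Callback &callback) const override {
    uint64_t address = m_base;
    if (!callback(GenericInstruction{address}))
      return;
    for (int8_t delta : m_deltas) {
      // Unsigned wrap-around makes negative deltas move the address backward.
      address += static_cast<uint64_t>(static_cast<int64_t>(delta));
      if (!callback(GenericInstruction{address}))
        return;
    }
  }

private:
  uint64_t m_base;
  std::vector<int8_t> m_deltas;
};

// A consumer that works with any backend: counts the instructions whose
// address falls in the half-open range [begin, end).
size_t CountInstructionsInRange(const TraceBackend &trace, uint64_t begin,
                                uint64_t end) {
  size_t count = 0;
  trace.Traverse([&](const GenericInstruction &insn) {
    if (insn.load_address >= begin && insn.load_address < end)
      ++count;
    return true;
  });
  return count;
}
```

The consumer never learns how either backend stores its data, which is the property a generic instruction object is meant to provide.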

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103588/new/

https://reviews.llvm.org/D103588