[Lldb-commits] [PATCH] D89283: [trace][intel-pt] Implement the basic decoding functionality

Tue Oct 20 07:23:05 PDT 2020

labath added a comment.

In D89283#2340586 <https://reviews.llvm.org/D89283#2340586>, @wallace wrote:

> What's left to discuss:
>
> - I tried to use obj2yaml, but it creates files much larger than the binaries, so in terms of space, I'd rather keep the binaries. Besides, I removed ld.so, and the binaries that are left are tiny. I imagine that the binaries that will be committed in the future will also be tiny ones depicting specific edge cases.

Well, yaml is text so it's not surprising that it'd be larger (though sometimes it can actually be smaller, because it e.g. omits padding, or because you can reduce by deleting irrelevant stuff; and git can store text diffs efficiently). But there's the other aspect that Vedant mentioned -- their opaqueness/reviewability. With a yaml file, one can see (directly in the review window or in his text editor) what kind input is the program being fed and correlate that with the expected output. This is not perfect because a lot of the details (e.g. the disassembly, and most importantly the trace file) is still obscured, but it's better than nothing. So if it works, I'd still go for the yaml option (and I have to send a big thank you to whoever implemented the program header support).

================
Comment at: lldb/source/Plugins/Trace/intel-pt/DecodedThread.cpp:11
+
+#include <sstream>
+
----------------
I guess this is not needed anymore (?)

================
Comment at: lldb/source/Plugins/Trace/intel-pt/DecodedThread.h:66-68
+  pt_insn m_pt_insn;
+  int m_libipt_error_code;
+  std::string m_custom_error;
----------------
Maybe better left for a separate patch, but if we're going to have loads of these objects, then the we'd better optimize its size.

================
Comment at: lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp:191
+  pt_image *image = pt_insn_get_image(decoder);
+  assert(pt_image_set_callback(image, ReadProcessMemory, &process) == 0);
+
----------------
Can't have side-effects inside assertions. 

================
Comment at: lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.cpp:32
+///   returned.
+int FindNextSynchronizationPoint(pt_insn_decoder &decoder) {
+  // Try to sync the decoder. If it fails, then get
----------------
clayborg wrote:
> labath wrote:
> > static
> Make static or add an anonymous namespace around all of these functions so you don't have to mark them all as static.
Actually, the [[ http://llvm.org/docs/CodingStandards.html#anonymous-namespaces | coding standards ]] say anonymous namespaces should not be used for functions.

================
Comment at: lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.h:21-44
+class IntelPTDecoder {
+public:
+  /// \param[in] process
+  ///     The process associated with the traces to decode.
+  ///
+  /// \param[in] pt_cpu
+  ///     The libipt \a pt_cpu information of the CPU where the traces were
----------------
I don't see the value of this class -- it's only used once, and in a very ephemeral way. `ThreadTraceDecoder::Decode` could just call `CreateDecoderAndDecode` directly (and it'd be shorter for it).

================
Comment at: lldb/source/Plugins/Trace/intel-pt/IntelPTDecoder.h:66
+private:
+  ThreadTraceDecoder(const ThreadTraceDecoder &other) = delete;
+
----------------
please delete the copy assignment as well

================
Comment at: lldb/source/Plugins/Trace/intel-pt/TraceIntelPT.cpp:109
+  else {
+    consumeError(decoded_thread.takeError());
+    return 0;
----------------
wallace wrote:
> labath wrote:
> > I'm having doubts about this "I have an thread but was not able to decode _anything_ about it" state is worth it. Having many different ways to report errors just increases the chance of something going wrong, and in `TraverseInstructions` you're already treating this state as a pseudo-instruction.
> > 
> > Maybe that representation should be used all the way down? Or (and this may be even better) we avoid creating such Threads in the first place (print an error/warning when the target is created).
> > Maybe that representation should be used all the way down? 
> 
> I'll follow that path. This will create consistency through the code
> 
> 
> > Or (and this may be even better) we avoid creating such Threads in the first place (print an error/warning when the target is created).
>  
> I wish I could do that, but decoding is very expensive and should be done lazily. According to Intel and my tests, if a thread was traced during T seconds, then decoding takes around 10T, which is a big amount of time if you were tracing 10 threads for 5 seconds, which would take 500 seconds to decode. At least for know we are not doing parallel decoding. I imagine at some point we'll have to work on that.
If you can pull that off -- great. However, I have doubts (and judging by the other comments, you're starting to have some too) regarding how long will you be able to postpone the decoding. For example, lldb very much likes to know the PC value of each thread (can you really blame it?) -- so much that we've added a special field to the gdb-remote stop-reply packet which lists the pc values of all threads.

That leads me to believe that you'll need to decode at least the last block of each thread's trace very early on

================
Comment at: lldb/test/API/commands/trace/TestTraceDumpInstructions.py:48
+  [ 4] 0x40052d
+  [ 5] 0x400529
+  [ 6] 0x400525
----------------
wallace wrote:
> clayborg wrote:
> > If we reverse the direction, then hitting "enter" after doing one command won't flow as nicely as it does now. That being said I agree with Pavel that we should figure out what is expected. I generally think that earlier text is older.
> > 
> > I would not switch the indexes so that they change with any options that are specified. We currently have --start-position, but maybe this should be just --position? Or we specify:
> > ```
> > --from-end <offset> 
> > ```
> > <offset> would be the index offset from the end (newest) of the data?
> > ```
> > --from-start <offset>
> > ```
> > <offset> would be the index offset from the start (oldest) of the data?
> > 
> > I would be fine with:
> > ```
> > [--forwards | -f] [--backwards | -b]
> > ```
> > 
> > but I think it would make sense to show the indexes in a consistent way regardless of what options are displayed. Maybe it makes sense to always show the true index where zero is the oldest and N is the newest?
> > 
> > We do need to make sure the auto repeat command looks good though which will be hard with oldest to newest ordering.
> What about this:
> 
> We expose the indices in a chronologically increasing way, where [0] is the oldest instruction and [N] is the newest. Then we have the two options suggested by Greg
> 
>   --from-end <offset>
> 
> Where offset is an index or the string "end", meaning the last instruction of the trace, in case the user doesn't know the index of it. Then the instructions are printed
> 
>     [offset]
>     [offset - 1]
>     ...
>     [offset - K]
> 
> And if there's a repeat command, this would be printed
>     
>     [offset - K - 1]
>     [offset - K - 2]
>     ...
> 
> Which would look nicely as a contiguous list of instructions if concatenated.
> 
> The other option would be
> 
>   --from-start<offset>
> 
> Where offset is an index.  Then the instructions are printed
> 
>     [offset]
>     [offset + 1]
>     [offset + 2]
>     ...
>     [offset + K]
>     
> And after a repeat command, you'd get
> 
>     [offset + K + 1]
>     [offset + K + 2]
>     ...
> 
> 
> I think this would serve all purposes.
Ah... this is tricky...

The concatenation aspect is nice, but I'm not sure it trumps the "earlier/higher text is older" intuition. Even if I'm analyzing backwards, I think I'd prefer seeing a discontinuous set of lists which go the "right way" instead of a single continuous list which goes "backwards". I.e. I think I'd find this:
```
6
7
8
9
10
(lldb)
1
2
3
4
5
```
easier to read than this:
```
10
9
8
7
6
(lldb)
5
4
3
2
1
```

However, I don't see myself using this anytime soon, so if you think the latter is the best way to represent this, then fine. The thing we choose here is not set in stone anyway, and we can re-examine this later...

================
Comment at: lldb/test/API/commands/trace/TestTraceDumpInstructions.py:159-160
+            substrs=['''thread #1: tid = 3842849, total instructions = 2
+  [0] no memory mapped at this address: 0x400518
+  [1] no memory mapped at this address: 0x400511'''])
+
----------------
wallace wrote:
> clayborg wrote:
> > These lines should start with the address like all other lines. Then the question is what the output should look like. Do we really need to tell the user that there is no memory mapped here? Can we just print "<???>" or nothing if we have no information like:
> > 
> > ```
> > [0] 0x400518: <???>
> > [1] 0x400511: <???>
> > ```
> > 
> I think it's highly important to tell the user that this is a very important error and not make it apparently inoffensive with the formatting. Let me elaborate why this is not an inoffensive error.
> 
> First of all, the encoded trace is composed of packets, composed of two main packets:
> 
> - PSB: synchronization packet that contains the current PC. These packets are sporadic (often one for each 4KB of data), as they are big in size.
> - TNT: taken/not taken packet that contains one bit per branch executed by the processor. These packets are probably the most frequent and they don't include any PC.
> 
> When decoding, the decoder finds first a PSB packet, gaining the knowledge of the current PC, then it starts traversing the binary instruction by instruction until it finds a branch, in which case it finds the next TNT packet and learns if that branch was taken or not, then continuing the traversal in the correct direction. 
> 
> This means that when the decoder can't read a memory address, then it won't be able to decode any TNT packets until the next PSB synchronization point. In fact, in this diff, when there's an instruction decoding error, we skip to the next PSB and resume decoding from there. This problem implies that we are skipping potentially thousands of instructions. In other words, if you see
> 
> [0]: 0x400518
> [1]: 0x400511
> [2]: no memory mapped at this address: 0x400502
> [3]: 0x400500
> 
> Then that means that between instructions [3] and [1] there were an unknown number of instructions that couldn't be decoded, the first one of them being at 0x400502. We won't be able to do anything useful with those instructions, and the user would need to provide the missing module and redecode to reconstruct the full trace.
This does beg the question of whether we shouldn't make the distinction even more obvious by breaking the sequence numbers in some way. I don't really have an answer to that question, though...

Some two level namespacing?
```
sequence 1:
[0]: 0x47
[1]: 0x48
error: no memory mapped at 0x42

sequence 2:
[0]: 0x147
...

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89283/new/

https://reviews.llvm.org/D89283