[Lldb-commits] [PATCH] D128477: [trace] Add a flag to the decoder to output the instruction type

Tue Jul 5 00:15:40 PDT 2022

wallace requested changes to this revision.
wallace added a comment.
This revision now requires changes to proceed.

much better! Thanks for doing this.

There are two main things to do.

1. The algorithm you are using to classify the instructions uses too many acronyms and it's very well documented. Try to make it understandable enough so that anyone with little familiarity can have an idea of what is happening
2. write a unit test. For this, you can follow any of the tests in `lldb/unittests/Disassembler/`. In order to run it, you'll need to do `ninja DisassemblerTests` and then run the output path printed by that command. It uses gtest. You can see the tests in that folder for inspiration.

================
Comment at: lldb/include/lldb/Core/Disassembler.h:82

+  /// Return InstructionControlFlowType of this instruction.
+  lldb::InstructionControlFlowType
----------------
better this

================
Comment at: lldb/include/lldb/Core/Disassembler.h:84
+  lldb::InstructionControlFlowType
+  GetControlFlowInstructionKind(const ExecutionContext *exe_ctx);
+
----------------
Rename this to
  GetControlFlowKind
because we are already inside the Instruction 

================
Comment at: lldb/include/lldb/lldb-enumerations.h:976
+  eInstructionControlFlowTypeUnknown = 0,
+  /// The instruction is something not listed below.
+  eInstructionControlFlowTypeOther,
----------------

================
Comment at: lldb/source/API/SBInstruction.cpp:244
     FormatEntity::Parse("${addr}: ", format);
-    inst_sp->Dump(&s.ref(), 0, true, false, nullptr, &sc, nullptr, &format, 0);
+    inst_sp->Dump(&s.ref(), 0, true, false, /*show_control_flow_type*/ false,
+                  nullptr, &sc, nullptr, &format, 0);
----------------
the llvm style requires a =

================
Comment at: lldb/source/API/SBInstruction.cpp:279
     FormatEntity::Parse("${addr}: ", format);
-    inst_sp->Dump(&out_stream, 0, true, false, nullptr, &sc, nullptr, &format,
-                  0);
+    inst_sp->Dump(&out_stream, 0, true, false, /*show_control_flow_type*/ false,
+                  nullptr, &sc, nullptr, &format, 0);
----------------
same here

================
Comment at: lldb/source/API/SBInstructionList.cpp:169
+        inst->Dump(&sref, max_opcode_byte_size, true, false,
+                   /*show_control_flow_type*/ false, nullptr, &sc, &prev_sc,
+                   &format, 0);
----------------
ditto

================
Comment at: lldb/source/Commands/Options.td:306
+    "`InstructionControlFlowType` for a list of control flow kind. "
+    "far jump and far return often indicate syscall and return.">;
   def disassemble_options_context : Option<"context", "C">, Arg<"NumLines">,
----------------

================
Comment at: lldb/source/Commands/Options.td:1160
+    "`InstructionControlFlowType` enum has a list of control flow kind. "
+    "far jump and far return often indicate syscall and return.">;
   def thread_trace_dump_instructions_show_tsc : Option<"tsc", "t">, Group<1>,
----------------
copy the changes from above

================
Comment at: lldb/source/Core/Disassembler.cpp:575
+namespace x86 {
+enum PTI_MAP {
+  PTI_MAP_0,        /* 1-byte opcodes.           may have modrm */
----------------
add documentation mentioning what a PTI_MAP is. Try to make the documentation clear enough for anyone without knowledge of this matter. Also, mention what modrm is why it's important here.
Also, it's better if don't use the PTI acronym but instead use the full name. It'd make it easier to read.
Also, this is an enum and not a map, so better use an identifier different than a map

================
Comment at: lldb/source/Core/Disassembler.cpp:584-586
+/// Decide InstructionControlFlowType based on bytes of instruction.
+/// The insruction bytes are parsed beforehand into opcode, modrm and map byte
+/// to be passed as parameters.
----------------
let's be more specific

================
Comment at: lldb/source/Core/Disassembler.cpp:695-696
+                              Opcode m_opcode) {
+  // inst_bytes will be parsed into opcode, modrm and map bytes.
+  // These are the three values deciding instruction control flow type.
+  uint8_t inst_bytes[16];
----------------
another reference to opcode, modrm, and map bytes. I recommend writing extensive documentation in PTI_MAP and then mention here that the full documentation can be found in the PTI_MAP definition

================
Comment at: lldb/source/Core/Disassembler.cpp:703-704
+  if (m_opcode.GetOpcodeBytes() == nullptr || m_opcode.GetByteSize() <= 0) {
+    // x86_64 and i386 are the only ones that use bytes right now
+    // Else, we have ARM or MIPS, not yet implemented
+    return lldb::eInstructionControlFlowTypeUnknown;
----------------
this comment is not useful because this method is in the x86 namespace

================
Comment at: lldb/source/Core/Disassembler.cpp:710-713
+  // In most cases, opcode is the first byte of instruction (and modrm is the
+  // second byte), but some instructions has prefix to be skipped.
+  // The mapping is inspired from libipt's instruction decode logic
+  // in `src/pt_ild.c`
----------------

================
Comment at: lldb/source/Core/Disassembler.cpp:789-790
+    case 0x62:
+      if (exe_ctx->GetTargetRef().GetArchitecture().GetMachine() !=
+              llvm::Triple::ArchType::x86_64 &&
+          (inst_bytes[op_idx + 1] & 0xc0) != 0xc0) {
----------------
you use `exe_ctx->GetTargetRef().GetArchitecture().GetMachine() != llvm::Triple::ArchType::x86_64` a lot. What about creating a variable that holds this comparison?

================
Comment at: lldb/source/Core/Disassembler.cpp:847-848
+    return x86::GetControlFlowInstructionKind(exe_ctx, m_opcode);
+  } else
+    return eInstructionControlFlowTypeUnknown; // not implemented
+}
----------------
use { } in this else, because the if already used it

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128477/new/

https://reviews.llvm.org/D128477