[Lldb-commits] [PATCH] D140358: [lldb-vscode] Add support for disassembly view

Thu Jan 5 02:32:44 PST 2023

eloparco added inline comments.

================
Comment at: lldb/tools/lldb-vscode/lldb-vscode.cpp:2177
+  const auto bytes_offset = -instruction_offset * bytes_per_instruction;
+  auto start_addr = base_addr - bytes_offset;
+  const auto disassemble_bytes = instruction_count * bytes_per_instruction;
----------------
clayborg wrote:
> This will work if the min and max opcode byte size are the same, like for arm64, the min and max are 4. This won't work for x86 or arm32 in thumb mode. So when backing up, we need to do an address lookup on the address we think we want to go to, and then adjust the starting address accordingly. Something like:
> 
> ```
> SBAddress start_sbaddr = (base_addr - bytes_offset, g_vsc.target);
> ```
> now we have a section offset address that can tell us more about what it is. We can find the SBFunction or SBSymbol for this address and use those to find the right instructions. This will allow us to correctly disassemble code bytes. 
> 
> We can also look at the section that the memory comes from and see what the section contains. If the section is data, then emit something like:
> ```
> 0x00001000 .byte 0x23
> 0x00001001 .byte 0x34
> ...
> ```
> To find the section type we can do:
> ```
> SBSection section = start_sbaddr.GetSection();
> if (section.IsValid() && section.GetSectionType() == lldb::eSectionTypeCode) {
>  // Disassemble from a valid boundary
> } else {
>   // Emit a byte or long at a time with ".byte 0xXX" or other ASM directive for binary data
> }
> ```
> We need to ensure we start disassembling on the correct instruction boundary as well as our math for "start_addr" might be in between actual opcode boundaries. If we are in a lldb::eSectionTypeCode, then we know we have instructions, and if we are not, then we can emit ".byte" or other binary data directives. So if we do have lldb::eSectionTypeCode as our section type, then we should have a function or symbol, and we can get instructions from those objects easily:
> 
> ```
> if (section.IsValid() && section.GetSectionType() == lldb::eSectionTypeCode) {
>  lldb::SBInstructionList instructions;
>  lldb::SBFunction function = start_sbaddr.GetFunction();
>  if (function.IsValid()) {
>     instructions = function.GetInstructions(g_vsc.target);
>  } else {
>     symbol = start_sbaddr.GetSymbol();
>     if (symbol.IsValid())
>       instructions = symbol.GetInstructions(g_vsc.target);
> }
> const size_t num_instrs = instructions.GetSize();
> if (num_instrs > 0) {
>   // we found instructions from a function or symbol and we need to 
>   // find the matching instruction that we want to start from by iterating
>   // over the instructions and finding the right address
>   size_t matching_idx = num_instrs; // Invalid index
>   for (size_t i=0; i<num_instrs; ++i) {
>     lldb::SBInstruction inst = instructions.GetInstructionAtIndex(i);
>     if (inst.GetAddress().GetLoadAddress(g_vsc.target) >= start_addr) {
>       matching_idx = i;
>       break;
>     }
>   }
>   if (matching_idx < num_instrs) {
>     // Now we can print the instructions from [matching_idx, num_instrs)
>     // then we need to repeat the search for the next function or symbol. 
>     // note there may be bytes between functions or symbols which we can disassemble
>     // by calling _get_instructions_from_memory(...) but we must find the next
>     // symbol or function boundary and get back on track
>   }
>   
> ```
Sorry, I should have provided a proper explanation.

I use the maximum instruction size as the "worst case". Basically, I need to read a portion of memory but I do not know the start address and the size. For the start address, if I want to read N instructions before `base_addr` I need to read at least starting from `base_addr - N * max_instruction_size`: if all instructions are of size `max_instruction_size` I will read exactly N instructions; otherwise I will read more than N instructions and prune the additional ones afterwards. Same for applies for the size.

Since `start_addr` is based on a "worst case", it may be an address in the middle of an instruction. In that case, that first instruction will be misinterpreted, but I think that is negligible.

The logic is similar to what is implemented in other VS Code extensions, like https://github.com/vadimcn/vscode-lldb/blob/master/adapter/src/debug_session.rs#L1134.

Does it make sense?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140358/new/

https://reviews.llvm.org/D140358