[Lldb-commits] [PATCH] D140358: [lldb-vscode] Add support for disassembly view

Greg Clayton via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Thu Jan 5 15:37:31 PST 2023


clayborg added inline comments.


================
Comment at: lldb/tools/lldb-vscode/lldb-vscode.cpp:2177
+  const auto bytes_offset = -instruction_offset * bytes_per_instruction;
+  auto start_addr = base_addr - bytes_offset;
+  const auto disassemble_bytes = instruction_count * bytes_per_instruction;
----------------
eloparco wrote:
> clayborg wrote:
> > This will work if the min and max opcode byte size are the same, like for arm64, the min and max are 4. This won't work for x86 or arm32 in thumb mode. So when backing up, we need to do an address lookup on the address we think we want to go to, and then adjust the starting address accordingly. Something like:
> > 
> > ```
> > SBAddress start_sbaddr = (base_addr - bytes_offset, g_vsc.target);
> > ```
> > now we have a section offset address that can tell us more about what it is. We can find the SBFunction or SBSymbol for this address and use those to find the right instructions. This will allow us to correctly disassemble code bytes. 
> > 
> > We can also look at the section that the memory comes from and see what the section contains. If the section is data, then emit something like:
> > ```
> > 0x00001000 .byte 0x23
> > 0x00001001 .byte 0x34
> > ...
> > ```
> > To find the section type we can do:
> > ```
> > SBSection section = start_sbaddr.GetSection();
> > if (section.IsValid() && section.GetSectionType() == lldb::eSectionTypeCode) {
> >  // Disassemble from a valid boundary
> > } else {
> >   // Emit a byte or long at a time with ".byte 0xXX" or other ASM directive for binary data
> > }
> > ```
> > We need to ensure we start disassembling on the correct instruction boundary as well as our math for "start_addr" might be in between actual opcode boundaries. If we are in a lldb::eSectionTypeCode, then we know we have instructions, and if we are not, then we can emit ".byte" or other binary data directives. So if we do have lldb::eSectionTypeCode as our section type, then we should have a function or symbol, and we can get instructions from those objects easily:
> > 
> > ```
> > if (section.IsValid() && section.GetSectionType() == lldb::eSectionTypeCode) {
> >  lldb::SBInstructionList instructions;
> >  lldb::SBFunction function = start_sbaddr.GetFunction();
> >  if (function.IsValid()) {
> >     instructions = function.GetInstructions(g_vsc.target);
> >  } else {
> >     symbol = start_sbaddr.GetSymbol();
> >     if (symbol.IsValid())
> >       instructions = symbol.GetInstructions(g_vsc.target);
> > }
> > const size_t num_instrs = instructions.GetSize();
> > if (num_instrs > 0) {
> >   // we found instructions from a function or symbol and we need to 
> >   // find the matching instruction that we want to start from by iterating
> >   // over the instructions and finding the right address
> >   size_t matching_idx = num_instrs; // Invalid index
> >   for (size_t i=0; i<num_instrs; ++i) {
> >     lldb::SBInstruction inst = instructions.GetInstructionAtIndex(i);
> >     if (inst.GetAddress().GetLoadAddress(g_vsc.target) >= start_addr) {
> >       matching_idx = i;
> >       break;
> >     }
> >   }
> >   if (matching_idx < num_instrs) {
> >     // Now we can print the instructions from [matching_idx, num_instrs)
> >     // then we need to repeat the search for the next function or symbol. 
> >     // note there may be bytes between functions or symbols which we can disassemble
> >     // by calling _get_instructions_from_memory(...) but we must find the next
> >     // symbol or function boundary and get back on track
> >   }
> >   
> > ```
> Sorry, I should have provided a proper explanation.
> 
> I use the maximum instruction size as the "worst case". Basically, I need to read a portion of memory but I do not know the start address and the size. For the start address, if I want to read N instructions before `base_addr` I need to read at least starting from `base_addr - N * max_instruction_size`: if all instructions are of size `max_instruction_size` I will read exactly N instructions; otherwise I will read more than N instructions and prune the additional ones afterwards. Same for applies for the size.
> 
> Since `start_addr` is based on a "worst case", it may be an address in the middle of an instruction. In that case, that first instruction will be misinterpreted, but I think that is negligible.
> 
> The logic is similar to what is implemented in other VS Code extensions, like https://github.com/vadimcn/vscode-lldb/blob/master/adapter/src/debug_session.rs#L1134.
> 
> Does it make sense?
The issue is, you might end up backing up by N bytes, and you might not end up on an opcode boundary. Lets say you have x86 disassembly like:

```
    0x100002e37 <+183>: 48 8b 4d f8              movq   -0x8(%rbp), %rcx
    0x100002e3b <+187>: 48 39 c8                 cmpq   %rcx, %rax
    0x100002e3e <+190>: 0f 85 09 00 00 00        jne    0x100002e4d               ; <+205> at main.cpp
    0x100002e44 <+196>: 8b 45 94                 movl   -0x6c(%rbp), %eax
    0x100002e47 <+199>: 48 83 c4 70              addq   $0x70, %rsp
    0x100002e4b <+203>: 5d                       popq   %rbp
    0x100002e4c <+204>: c3                       retq   
    0x100002e4d <+205>: e8 7e 0f 00 00           callq  0x100003dd0               ; symbol stub for: __stack_chk_fail
    0x100002e52 <+210>: 0f 0b                    ud2    
```

Let's say you started with the address 0x100002e4c, and backed up by the max opcode size of 15 for x86_64, that would be 0x100002e3d. You would start disassembling on a non opcode boundary as this is the last byte of the 3 byte opcode at 0x100002e3b (0xc8). And this would disassembly incorrectly. So we need  to make use of the functions or symbol boundaries to ensure we are disassembling correctly. If we have no function or symbol, we can do our best. But as you can see we would get things completely wrong in this case and we need to fix this as detailed.


================
Comment at: lldb/tools/lldb-vscode/lldb-vscode.cpp:2194
+  if (index == sb_instructions.size() + 1) {
+    fprintf(stderr, "current line not found in disassembled instructions\n");
+    return response_instructions;
----------------
eloparco wrote:
> clayborg wrote:
> > Remove any and all printf, or fprintf statements. You can't print anything to stderr or stdout as this is where the DAP packets are get emitted to. We do make it so this won't affect lldb-vscode by doing some magic with the STDOUT/STDERR file handles, but this output will be sent to /dev/null most likely. You can print something to a console (using "g_vsc.SendOutput(...)" is one way).
> I suppose I have to replace `llvm::errs()` too, right?
yes! the main issue is, will the user expect to see this output in the console and will it make sense to the user. I don't know what the user will think if they see "current line not found in disassembled instructions" in the debug console. That goes for all output to the console. It will have to make sense to the user. I don't know if the user will care and or be able to do anything about this message. It also isn't prefixed with a "warning:" or "error:". I would vote to remove it.



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140358/new/

https://reviews.llvm.org/D140358



More information about the lldb-commits mailing list