[Lldb-commits] [PATCH] D140358: [lldb-vscode] Add support for disassembly view

Sun Jan 15 16:20:53 PST 2023

eloparco added inline comments.

================
Comment at: lldb/tools/lldb-vscode/lldb-vscode.cpp:2177
+  const auto max_instruction_size = g_vsc.target.GetMaximumOpcodeByteSize();
+  const auto bytes_offset = -instruction_offset * max_instruction_size;
+  auto start_addr = base_addr - bytes_offset;
----------------
clayborg wrote:
> Just checked out your changes, and you are still just subtracting a value from the start address and attempting to disassemble from memory, which is the problem. We need to take that subtracted address and look it up as suggested in the previous code examples I posted. If you find a function or symbol, ask those objects for their instructions and then try to use those.
> 
> But basically for _any_ disassembly this is what I would recommend doing:
> - first resolve the "start_address" (no matter how you come up with the address) that you want to disassemble into a SBAddress
> - check its section. If the section is valid and contains instructions, call a function that will disassemble the address range for the section that starts at "start_address" and ends at the end of the section. We can call this "disassemble_code" as a function. More details on this below
> - If the section does not contain instructions, just read the bytes and emit lines like:
> ```
> 0x1000 .byte 0x12
> 0x1001 .byte 0x34
> ...
> ```
> 
> Now for the disassemble_code function. We know the address range for this is in code. We then need to resolve the address passed to "disassemble_code" into a SBAddress and ask that address for a SBFunction or SBSymbol as I mentioned. Then we ask the SBFunction or SBSymbol for all instructions that they contain, and then use any instructions that fall into the range we have. If there is no SBFunction or SBSymbol, then disassemble an instruction at a time and then see if the new address will resolve to a function or symbol.
I tried my changes on a Linux x86 machine, and the loop `for (unsigned i = 0; i < max_instruction_size; i++) {` (L2190) takes care of `start_address` possibly being in the middle of an instruction, so that's not a problem. The problem I ran into is that it tries to read from too far before `base_addr`, and the `ReadMemory()` operation returns only a few instructions, which stop short of `base_addr`. That was not happening on my macOS M1 (arm) machine.

To solve this, I changed the loop at L2190 to
```
for (unsigned i = 0; i < bytes_offset; i++) {
    auto sb_instructions =
        _get_instructions_from_memory(start_addr + i, disassemble_bytes);
```
and if `start_addr` is in `sb_instructions` we're done and can exit the loop. That worked.
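
To make the retry idea concrete, here is a toy, self-contained sketch of it. It does not use the LLDB API at all: the encoding is made up (the low two bits of the first byte give the instruction length), and `toy_instruction_size`, `toy_disassemble` and `find_aligned_start` are hypothetical names that only exist for this illustration. The stop condition is also an assumption on my side: the sketch accepts a start offset once linear decoding from it lands exactly on the target address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy variable-length encoding (an assumption for illustration, not x86):
// the low two bits of the first byte hold the instruction length minus one,
// so every instruction is 1 to 4 bytes long.
std::size_t toy_instruction_size(std::uint8_t first_byte) {
  return static_cast<std::size_t>(first_byte & 0x3) + 1;
}

// Decode linearly from `start` and return the offsets at which the decoder
// believes an instruction begins.
std::vector<std::size_t> toy_disassemble(const std::vector<std::uint8_t> &mem,
                                         std::size_t start) {
  std::vector<std::size_t> addrs;
  for (std::size_t pc = start; pc < mem.size();
       pc += toy_instruction_size(mem[pc]))
    addrs.push_back(pc);
  return addrs;
}

// Try start_addr, start_addr + 1, ... and return the first start offset from
// which linear decoding lands exactly on base_addr. Starting at base_addr
// itself always succeeds, so the search is bounded.
std::size_t find_aligned_start(const std::vector<std::uint8_t> &mem,
                               std::size_t start_addr, std::size_t base_addr) {
  assert(start_addr <= base_addr && base_addr < mem.size());
  for (std::size_t start = start_addr; start < base_addr; ++start) {
    for (std::size_t pc : toy_disassemble(mem, start)) {
      if (pc == base_addr)
        return start; // decoding from `start` is aligned with base_addr
      if (pc > base_addr)
        break; // overshot base_addr, try the next start offset
    }
  }
  return base_addr;
}
```

With the bytes `{0x03, 0x03, 0x00, 0x00, 0x01, 0x00, 0x00}` the real instructions begin at offsets 0, 4 and 6. Starting at offset 1 decodes a bogus 4-byte instruction that jumps over offset 4, so the search moves on and accepts offset 2. Note that the "instructions" decoded at offsets 2 and 3 are still not real ones, which is the kind of result that looking up functions and symbols would avoid.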

Another, similar option is to start from `start_sbaddr`, as you were saying, and increment the address until a valid section is found, then call `_get_instructions_from_memory()` passing the section start.
What do you think? Delegating the disassembly to `ReadMemory()` + `GetInstructions()` looks simpler to me than manually iterating over sections and getting instructions from symbols and functions.
Is there any shortcoming I'm not seeing?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140358/new/

https://reviews.llvm.org/D140358