[Lldb-commits] [PATCH] D71575: [LLDB] Add ObjectFileWasm plugin for WebAssembly debugging

Thu Jan 9 03:59:27 PST 2020

labath added inline comments.

================
Comment at: lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp:342-343
+                        section_type,                  // Section type.
+                        sect_info.offset & 0xffffffff, // VM address.
+                        sect_info.size, // VM size in bytes of this section.
+                        sect_info.offset &
----------------
paolosev wrote:
> labath wrote:
> > paolosev wrote:
> > > labath wrote:
> > > > Are the debug info sections actually loaded into memory for wasm? Should these be zero (that's what they are for elf)?
> > > Yes, I was thinking that the debug sections should be loaded into memory; not sure how this works for ELF, how does the debugger find the debug info in that case?
> > Are you referring to the load-from-memory, or load-from-file scenario? Normally, the debug info is not loaded into memory, and if lldb is loading the module from a file, then the debug info is loaded from the file (and we use the "file offset" field to locate them). Loading files from memory does not work in general (because we can't find the whole file there -- e.g., section headers are missing). The only case it does work is in case of jitted files. I'm not 100% sure how that works, but given that there is no `if(memory)` block in the section creation code, those sections must too get `vm size = 0`.
> > 
> > In practice, I don't think it does not matter that much what you put here -- I expect things will mostly work regardless. I am just trying to make this consistent in some way. If these sections are not found in the memory at the address which you set here (possibly adjusted by the load bias) then I think it makes sense to set `vm_size=vm_addr=0`. If they are indeed present there, then setting it to those values is perfectly fine.
> Modified, but I am not sure I completely understood the difference of file addresses and vm addresses.
> 
> In the case of Wasm, we can have two cases:
> a) Wasm module contains executable code and also DWARF data embedded in DWARF sections
> b) Wasm module contains code and has a custom section that points to a separated Wasm module that only contains DWARF sections
> 
> The file of Wasm modules that contains code should never be loaded directly by LLDB; LLDB should use GDB-remote to connect to the Wasm engine that loaded the module and retrieve the module content from the engine.
> But when the DWARF data has been stripped into a separate Wasm file, that file should be directly loaded by LLDB in order to load the debug sections.
> So, if I am not wrong, only in the first case we should assume that the debug info is loaded into memory, and have vm_size != 0?
You're right, this is getting pretty confusing, as a lot of the concepts we're talking about here are overloaded (and not even consistently used within lldb). Let me try to elaborate here.

I'll start by describing various Section fields (I'll use ELF as a reference):
- file offset (`m_file_offset`) - where the section is physically located in the file. It does not matter if the file is loaded from inferior memory or not (in the former case, this is an offset from location of the first byte of the file). In elf this corresponds to the `sh_offset` field.
- file size (`m_file_size`) - size of the section in the file. Some sections don't take up any space in the file (`.bss`), so they have this zero. This is roughly the `sh_size` field in elf (but it is adjusted for SHT_NOBITS sections like .bss)
- file address (`m_file_addr`) - The address where the object file says this section should be loaded to. Note that this may not be the final address due to ASLR or such. It is the job of the SetLoadAddress function to compute the actual final address (which is then called the "load address"). This corresponds to the `sh_addr` field in elf, but it is also sometimes called the "vm address", because of the `p_vaddr` program header field
- vm size (`m_byte_size) size of the section when it gets loaded into memory. Sections which don't get loaded into memory have this as 0. This is also "rougly" `sh_size`, but only for SHF_ALLOC sections.

All of these fields are really geared for the case where lldb opens a file from disk, and then uses that to understand the contents of process memory. They become pretty redundant if you have loaded the file from memory, but it should still be possible to assign meaningful values to each of them.

The file offset+file size combo should reflect the locations of the sections within the file (regardless of whether it's a "real" file or just a chunk of memory). And the file address+vm size combo should reflect the locations of the sections in memory (modulo ASLR, and again independently of where lldb happened to read the file from).

Looking at the patch, I think you've got most of these right. The only question is, what is the actual memory layout of a wasm debug info in the inferior process. I originally highlighted this because I was assuming the typical elf model where the debug info is not loaded into memory and the debugger loads it from a file. (jitted files are an afterthought in elf, and not that well supported/tested). However, now I get the impression that the wasm debug info is actually always loaded into memory (assuming it is present, that is), and so setting the vm size to non zero might actually be correct here. You'll need to make the call there, as I don't know how wasm actually works. (Note that you needn't concern yourself much with separate debug files here -- the vm size only really matters once you call SetLoadAddress to actually mark the object file as loaded into the process, and we never call SetLoadAddress on a separate debug file).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71575/new/

https://reviews.llvm.org/D71575