[Lldb-commits] [PATCH] D78801: [LLDB] Add class ProcessWasm for WebAssembly debugging

Fri May 1 00:09:02 PDT 2020

paolosev added a comment.

> The thing I am resisting is to put all of this stuff in the the ProcessGDBRemote class. Instead of trying to generalize it so that it can handle everything (and generalize to the wrong thing), I very much like the idea of introducing a `WasmProcess` plugin class that handles all wasm stuff. If that class happens to use parts of gdb-remote, then so be it, but it means that it's not a problem for that class to use a dialect of the gdb-remote protocol, which includes as many wasm-specific packets as it needs. Then this class can create it's own implementation of the "Unwind" interface, which will use some WasmProcess-specific apis to undwind, but that will also be ok, since both classes will be wasm-specific.

I think that this would solve a lot of problems. `WasmProcess` could inherit from GDBRemoteProcess and could send itself Wasm-specific commands like qWasmMem just by calling `GetGDBRemote().SendPacketAndWaitForResponse()` Then there would be no changes to ProcessGDBRemote at all.
For walking the call stack I would then keep the current design with the Unwind-derived class that sends a command to get the call stack from the stub. It’s true that normally it is the client that calculates call stacks, but it does so even because it cannot really ask them to the stub, but here we have a runtime that can provide this information.

> If dwarf had a notion of address spaces then there'd probably be no need for DW_OP_WASM_location. If the dwarf committee hasn't been able to come up with a unified way to represent address spaces, then I'm not sure we will do better. And we'll still be left with the job of translating dwarf (and other) entries into this other concept.

Yes, it is possible to come up with some representation of a unified address space for Wasm locals, globals and stack items (and also code and memory). But it's also true that locals, globals and stack items don’t really have a memory location. The reason for the introduction of `DW_OP_WASM_location` is indeed because there was nothing in DWARF that could well represent these entities. While there are certainly other architectures with multiple memory spaces, the execution model of WebAssembly is quite peculiar in this aspect, I think.

> The question is whether something similar can be done for the other two cases. I believe it might be possible for the DWARFExpression. We just need to have a way to separate the creation of the dwarf expression data from the process of evaluating it. Right now, both of these things happens in SymbolFileDWARF -- it creates a DWARFExpression object which both holds the data and knows how to evaluate it. But I don't believe SymbolFileDWARF needs the second part. If we could make it so that something else is responsible for creating the evaluator for dwarf expressions, that something could create a WasmDWARFExpression which would know about DW_OP_WASM_location and WasmProcess, and could evaluate it.

Separating the DWARFExpression data and the logic to evaluate it would certainly make sense. I think that this must be a general problem: since DWARF provides the way to define custom expression location, different architectures might define specific DWARF codes that they need to handle, so it would be great if we had a `DWARFEvaluator`  that was also pluggable. I don’t know how complex would be to refactor DWARFExpression in this way but I can investigate.

> The `GetValueAsData` problem is trickier, particularly as I'm not even sure the current implementation is correct. Are you sure that you really need the _current_ module there? What happens if I use "target variable" to display a variable from a different module? What if I then dereference that variable? If the dereference should happen in the context of the other module, then I guess the "module" should be a property of the value, not of the current execution context. And it sounds like some address space-y thing would help. But that might require teaching a lot of places in lldb about address spaces, in particular that a dereferencing a value in one address space, should produce a value in the same address space (at least for "near" pointer or something).
>  If we should really use the current frame as the context, then I guess we'd need some sort of a interfaces to ask a stack frame to get the value of a "Value".

The fact that the code address space is separated from the memory address space is really what makes things complicated. However, we know for sure that every time that all memory reads made while evaluating DWARF expressions or variables always target the module memory space, never the code space.

I must confess that had not really tested `target variable` so far; I did it today and I found that it already //almost// works. What happens is that the location of the global variable is calculated with DWARFExpression::Evaluate (in my tests I only see `DW_OP_addr`, but maybe there could be other ways?) and there it calls Value::ConvertToLoadAddress() passing the correct module, and this produces an address in the form `module_id|offset`, which is then used in Value::GetValueAsData(), which currently sends qWasmMem requests.

We don’t really need the frame_index in Value:: GetValueAsData; the stub only uses frame_index to calculate the corresponding module_id, and we already pass the current Module* to this function.
So it seems possible to make this work for WebAssembly, but the tricky part is how to make Wasm-specific reads in a class like Value that should not have any knowledge about Wasm.

>> Actually, that brings up an idea: a process runtime plug-in for languages that have a runtime that can be communicated with. Java has a runtime, Wasm has a runtime, most Javascript engines have runtimes. Maybe this wouldn't be too hard to abstract? Then we can add the GetGlobal(index), GetLocal(index), GetStack(index) to this plug-in?
> 
> I think that is a very interesting idea to explore. We already have language runtime plugins, and they do have the ability to refine the type of a variable/value. Maybe they could also somehow help with computing the value of a variable? Then instead of a GetGlobal/Local(index), we could have `GetValue(Variable)`, and they would be the ones responsible for making sense of the dwarf expressions and address spaces?

Summarizing, if we assume that we can create:

- `WasmProcess`, derived from GDBRemoteProcess,
- `WasmUnwind`, derived from Unwind, and
- `WasmDWARFEvaluator`, derived from a new class DWARFEvaluator,

then we would not have to touch any existing GDBRemote- and DWARFExpression code.
At this point the way we query locals/globals/stack and the call stack from the Wasm engine would be just an implementation detail of these Wasm plugin classes (assuming that we can define new gdb-remote custom commands like `qWasmLocal`, `qWasmGlobal`, `qWasmStackValue`, `qWasmCallStack`. Then we would not really need to abstract this functionality with a generic interface, in my opinion.

What is left to decide would be “just” how to handle memory. In particular how to make Wasm-specific memory requests, targeting a particular Wasm module, from class `Value`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78801/new/

https://reviews.llvm.org/D78801