[Lldb-commits] [PATCH] D78801: [LLDB] Add class ProcessWasm for WebAssembly debugging

Paolo Severini via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Sun Apr 26 23:57:09 PDT 2020

paolosev updated this revision to Diff 260216.
paolosev added a comment.

I am adding all the pieces to this patch to make the whole picture clearer. I had planned to add one piece at a time to simplify reviews, but that probably ended up making things more obscure. I can always split this patch later, and I need to refactor everything anyway.

So, the idea is to use DWARF as debug info for Wasm, as it is already supported by LLVM and Emscripten. For this we introduced some time ago the plugin classes ObjectFileWasm, SymbolVendorWasm and DynamicLoaderWasmDYLD. However, WebAssembly differs from native targets in an important way. When source code is compiled to Wasm, Clang produces a module that contains Wasm bytecode (a bit like what happens with Java and C#), and the DWARF info refers to this bytecode.
The Wasm module then runs in a Wasm runtime. (It is also possible to AoT-compile Wasm to native code, but this is outside the scope of this patch.)

Therefore, LLDB cannot debug Wasm just by controlling the inferior process; it needs to talk to the Wasm engine to query the engine's state. For example, for a backtrace, only the runtime knows what the current call stack is. Hence the idea of using the gdb-remote protocol: if a Wasm engine has a GDB stub, LLDB can connect to it to start a debugging session and access its state.

Wasm execution is defined in terms of a stack machine. There are no registers (besides the implicit IP), and most Wasm instructions push values onto, or pop them from, a virtual stack. Besides the stack, the other possible stores are the set of parameters and locals defined in the function, the set of global variables defined in the module, and the module memory, which is separate from the code address space.

The DWARF debug info to evaluate the value of variables is defined in terms of these constructs. For example, we can have something like this in DWARF:

  0x00005a88:      DW_TAG_variable
                            DW_AT_location	(0x000006f3: 
                               [0x00000840, 0x00000850): DW_OP_WASM_location 0x0 +8, DW_OP_stack_value)
                            DW_AT_name	("xx")
                            DW_AT_type	(0x00002b17 "float")

This says that, in that address range, the value of ‘xx’ can be evaluated as the content of the local with index 8. Here DW_OP_WASM_location is a Wasm-specific opcode with two arguments: the first defines the store (0: Local, 1: Global, 2: the operand stack) and the second is the index in that store. In most cases the value of a variable can instead be retrieved from Wasm memory.
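
To make the two arguments concrete, here is a rough sketch of how they could be decoded. It is not code from the patch. It assumes the store kind is a single byte and the index is ULEB128-encoded, which this message does not actually specify, and the names WasmStore, WasmLocation and DecodeWasmLocation are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Store kinds as listed in the message: 0 local, 1 global, 2 operand stack.
enum class WasmStore : uint8_t { Local = 0, Global = 1, OperandStack = 2 };

struct WasmLocation {
  WasmStore store;
  uint64_t index;
};

// Decodes the operands that follow the DW_OP_WASM_location opcode byte.
// ASSUMPTION: one byte for the store kind, then a ULEB128 index.
// On success, fills `out`, advances `offset` past the operands and returns
// true. On failure, returns false and leaves `offset` untouched.
bool DecodeWasmLocation(const std::vector<uint8_t> &bytes, size_t &offset,
                        WasmLocation &out) {
  assert(offset <= bytes.size() && "offset must lie within the expression");
  size_t pos = offset;
  if (pos >= bytes.size())
    return false;
  const uint8_t kind = bytes[pos++];
  if (kind > 2)
    return false; // store kind not listed in the message

  uint64_t index = 0;
  unsigned shift = 0;
  for (;;) {
    if (pos >= bytes.size() || shift >= 64)
      return false; // truncated or overlong ULEB128
    const uint8_t byte = bytes[pos++];
    index |= static_cast<uint64_t>(byte & 0x7f) << shift;
    shift += 7;
    if ((byte & 0x80) == 0)
      break; // high bit clear marks the last byte
  }

  out.store = static_cast<WasmStore>(kind);
  out.index = index;
  offset = pos;
  return true;
}
```

Under these assumptions, the operand bytes 0x00 0x08 decode to the local store with index 8, which corresponds to the "0x0 +8" in the dump above.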

So, when LLDB wants to evaluate this variable in `DWARFExpression::Evaluate()`, it needs to know the current value of the Wasm locals, or to access the memory, and for this it needs to query the Wasm engine.

This is why there are changes to DWARFExpression::Evaluate(), to support the DW_OP_WASM_location case, and this is also why I created a class that derives from ProcessGDBRemote and overrides ReadMemory() in order to query the Wasm engine. Value::GetValueAsData() also needs to be modified for the case where the value is retrieved from Wasm memory.

`GDBRemoteCommunicationClient` needs to be extended with a few Wasm-specific query packets:

- qWasmGlobal: query the value of a Wasm global variable
- qWasmLocal: query the value of a Wasm function argument or local
- qWasmStackValue: query the value in the Wasm operand stack
- qWasmMem: read from a Wasm memory
- qWasmCallStack: retrieve the Wasm call stack.
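
As a rough illustration of what such a query looks like on the wire, the sketch below wraps a payload in standard gdb-remote framing: '$', the payload, '#', and a two-digit hex checksum that is the sum of the payload bytes modulo 256. The framing is the standard protocol, but the argument layout used in MakeWasmLocalQuery ("frame;index") is purely hypothetical, and both function names are invented for the example. Escaping of special characters is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Wraps a payload as "$<payload>#<checksum>", where the checksum is the sum
// of all payload bytes modulo 256, written as two lowercase hex digits.
// This sketch does not escape '$', '#' or '}', so it rejects such payloads.
std::string FramePacket(const std::string &payload) {
  assert(payload.find_first_of("$#}") == std::string::npos &&
         "escaping is not implemented in this sketch");
  uint8_t sum = 0;
  for (unsigned char c : payload)
    sum = static_cast<uint8_t>(sum + c); // wraps around modulo 256
  char hex[3];
  std::snprintf(hex, sizeof(hex), "%02x", static_cast<unsigned>(sum));
  return "$" + payload + "#" + hex;
}

// HYPOTHETICAL argument layout: the packet name comes from the list above,
// but "frame;index" is only a guess made for this example.
std::string MakeWasmLocalQuery(uint32_t frame_index, uint32_t local_index) {
  return "qWasmLocal:" + std::to_string(frame_index) + ";" +
         std::to_string(local_index);
}
```

For instance, FramePacket("OK") yields "$OK#9a", and FramePacket(MakeWasmLocalQuery(0, 8)) would frame a request for local 8 of the innermost frame under the assumed layout.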

These are all the changes we need to fully support Wasm debugging.

Why the `IWasmProcess` interface? I was not sure whether gdb-remote should be the only way to access the engine state. In the future LLDB could also use some other (and less chatty) mechanism to communicate with a Wasm engine. I did not want to put a dependency on GDBRemote in a class like DWARFExpression or Value, which should not care about these details. Therefore, I thought that the new class WasmProcessGDBRemote could implement the IWasmProcess interface, forwarding requests through the base class ProcessGDBRemote, which then sends the new gdb-remote query packets. But I agree that this makes the code more convoluted and quite ugly.
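
To illustrate the decoupling I had in mind, here is a minimal sketch. It is not the interface from the patch: the method names and signatures are assumptions, ResolveWasmLocation is a stand-in for the relevant part of DWARFExpression::Evaluate(), and FakeWasmProcess replaces a real gdb-remote backend with in-memory tables.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// HYPOTHETICAL shape of the interface; the real IWasmProcess may differ.
class IWasmProcess {
public:
  virtual ~IWasmProcess() = default;
  virtual bool GetWasmLocal(uint32_t frame, uint32_t index, uint64_t &value) = 0;
  virtual bool GetWasmGlobal(uint32_t frame, uint32_t index, uint64_t &value) = 0;
  virtual bool GetWasmStackValue(uint32_t frame, uint32_t index, uint64_t &value) = 0;
};

// Stand-in for the expression evaluator: it only sees the interface, so it
// has no dependency on how the engine is actually reached.
bool ResolveWasmLocation(IWasmProcess &process, uint32_t frame, uint8_t store,
                         uint32_t index, uint64_t &value) {
  switch (store) {
  case 0: return process.GetWasmLocal(frame, index, value);
  case 1: return process.GetWasmGlobal(frame, index, value);
  case 2: return process.GetWasmStackValue(frame, index, value);
  default: return false; // unknown store kind
  }
}

// In-memory backend used only for this example. A gdb-remote-backed class
// would send the query packets instead of looking values up in maps.
class FakeWasmProcess : public IWasmProcess {
public:
  std::map<uint32_t, uint64_t> locals, globals, stack;

  bool GetWasmLocal(uint32_t, uint32_t index, uint64_t &value) override {
    return Lookup(locals, index, value);
  }
  bool GetWasmGlobal(uint32_t, uint32_t index, uint64_t &value) override {
    return Lookup(globals, index, value);
  }
  bool GetWasmStackValue(uint32_t, uint32_t index, uint64_t &value) override {
    return Lookup(stack, index, value);
  }

private:
  static bool Lookup(const std::map<uint32_t, uint64_t> &store, uint32_t index,
                     uint64_t &value) {
    auto it = store.find(index);
    if (it == store.end())
      return false;
    value = it->second;
    return true;
  }
};

// Usage: resolve local 8 through the interface only.
void DemoResolveLocal() {
  FakeWasmProcess engine;
  engine.locals[8] = 42;
  uint64_t value = 0;
  const bool ok = ResolveWasmLocation(engine, /*frame=*/0, /*store=*/0, 8, value);
  assert(ok && value == 42);
  (void)ok; // keeps NDEBUG builds free of unused-variable warnings
}
```

The cost is the extra layer of indirection, which is exactly the convolution mentioned above.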

My initial idea was to keep all the Wasm-related code isolated in plugin classes as much as possible. Now, I guess the next step would instead be to refactor the code to eliminate the new classes WasmProcessGDBRemote and UnwindWasm and to modify the existing ProcessGDBRemote and ThreadGDBRemote. However, I am not sure this is possible without also touching the base classes Process and Thread. For example, consider the function DWARFExpression::Evaluate(). There, when the DWARF opcode is DW_OP_WASM_location, we need to access the Wasm state. We can get to the Process object with frame->CalculateProcess(). Can we then assume that the process must always be a ProcessGDBRemote if the target architecture is llvm::Triple::wasm32, cast Process* to ProcessGDBRemote*, and use Wasm-specific query functions added to that class? Would this pattern be acceptable, in your opinion?
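
To make the question concrete, the pattern would look roughly like the sketch below. The classes are minimal stand-ins, not the real LLDB ones. The Arch enum, the IsGDBRemote() hook, AsWasmGDBRemote() and GetWasmLocal() are all hypothetical, and the explicit kind check before the cast is just one possible way to guard the assumption without RTTI.

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-ins for illustration only; NOT the real LLDB classes.
enum class Arch { x86_64, wasm32 };

class Process {
public:
  explicit Process(Arch arch) : m_arch(arch) {}
  virtual ~Process() = default;
  Arch GetArch() const { return m_arch; }
  // Hypothetical hook that identifies the concrete plugin without RTTI.
  virtual bool IsGDBRemote() const { return false; }

private:
  Arch m_arch;
};

class ProcessGDBRemote : public Process {
public:
  using Process::Process;
  bool IsGDBRemote() const override { return true; }

  // Hypothetical Wasm-specific query added to the existing class. A real
  // implementation would send a packet; this one returns a fixed value.
  bool GetWasmLocal(uint32_t /*frame*/, uint32_t /*index*/, uint64_t &value) {
    value = 42;
    return true;
  }
};

// Returns the process as a ProcessGDBRemote only when the target is wasm32
// and the process really is a gdb-remote one; otherwise returns nullptr.
ProcessGDBRemote *AsWasmGDBRemote(Process *process) {
  if (!process || process->GetArch() != Arch::wasm32 || !process->IsGDBRemote())
    return nullptr;
  return static_cast<ProcessGDBRemote *>(process);
}

// Usage, as it might appear when handling DW_OP_WASM_location.
void DemoCastPattern() {
  ProcessGDBRemote remote(Arch::wasm32);
  Process *process = &remote; // what frame->CalculateProcess() would hand back
  uint64_t value = 0;
  ProcessGDBRemote *gdb = AsWasmGDBRemote(process);
  const bool ok = gdb && gdb->GetWasmLocal(0, 8, value);
  assert(ok && value == 42);
  (void)ok; // keeps NDEBUG builds free of unused-variable warnings
}
```

This keeps the number of new classes down, but it means DWARFExpression would have to know about ProcessGDBRemote.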

PS, I am sorry for the late reply… this lockdown is making me a little unproductive… :-(

  rG LLVM Github Monorepo




-------------- next part --------------
A non-text attachment was scrubbed...
Name: D78801.260216.patch
Type: text/x-patch
Size: 26104 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/lldb-commits/attachments/20200427/05056818/attachment-0001.bin>
