[Lldb-commits] [PATCH] D78801: [LLDB] Add class ProcessWasm for WebAssembly debugging

Tue Apr 28 16:45:50 PDT 2020

paolosev added a comment.

In D78801#2007083 <https://reviews.llvm.org/D78801#2007083>, @clayborg wrote:

> So is there memory to be read from the WASM runtime? Couldn't DW_OP_WASM_location 0x0 +8 be turned into an address that can be used to read the variable? It is also unclear what DW_OP_stack_value is used for here. The DWARF expression has no idea how many bytes to read for this value unless each virtual stack location knows how big it is? What happens if you have an array of a million items? That will not fit on the DWARF expression stack and each member would need to be read from memory?
>
> It seems like the DW_OP_WASM_location + args should result in the address of the variable being pushed into the stack and the DW_OP_stack_value should be removed. This would mean at the end of the expression the address of the variable is on the stack and LLDB will just read it using the normal memory read? Am I missing something? Are there multiple memory regions? Are variables not considered to be in memory?

`DW_OP_WASM_location 0x0 +8` is not really in memory, or more precisely, its runtime representation is an internal detail of the Wasm runtime.
WebAssembly code has a peculiar structure; see, for example, https://developer.mozilla.org/en-US/docs/WebAssembly/Understanding_the_text_format for more details.
Ignoring memory for a moment, there are no registers in Wasm and instead Wasm instructions read/write from/to function locals, module globals and stack operands, which can only have one of these types:

- i32: 32-bit integer
- i64: 64-bit integer
- f32: 32-bit floating point
- f64: 64-bit floating point

There is still ongoing work in LLVM (https://reviews.llvm.org/D77353/new/#change-OJue38RNV2Gz) to define the perfect representation of these Wasm constructs in DWARF, but currently what LLVM generates has this format:

  DW_OP_WASM_location wasm-op index

Where:

  DW_OP_WASM_location := 0xED
  wasm-op := wasm-local | wasm-global | wasm-operand-stack

  wasm-local := 0x00 i:uleb128            (The value is located in the currently executing function’s index-th local)
  wasm-global := 0x01 i:uleb128           (The value is located in the index-th global)
  wasm-operand-stack := 0x02 i:uleb128    (The value is located in the index-th entry on the operand stack)
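
The encoding above can be sketched as a small decoder. This is only an illustration, not the code in the patch: the names `WasmStore`, `WasmLocation`, `DecodeULEB128` and `ParseWasmLocation` are hypothetical, and the sketch assumes the index is a plain unsigned LEB128 exactly as listed.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical names; the store kinds are numbered as in the wasm-op encoding.
enum class WasmStore : uint8_t { Local = 0x00, Global = 0x01, OperandStack = 0x02 };

struct WasmLocation {
  WasmStore store;
  uint64_t index;
};

constexpr uint8_t kDwOpWasmLocation = 0xED;

// Decode an unsigned LEB128 value starting at `pos`, advancing `pos`.
std::optional<uint64_t> DecodeULEB128(const std::vector<uint8_t> &bytes, size_t &pos) {
  uint64_t result = 0;
  for (unsigned shift = 0; shift < 64; shift += 7) {
    if (pos >= bytes.size())
      return std::nullopt; // truncated input
    uint8_t byte = bytes[pos++];
    result |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0)
      return result; // high bit clear: last byte of the value
  }
  return std::nullopt; // too many continuation bytes for a 64-bit value
}

// Parse "DW_OP_WASM_location wasm-op index" at `pos`.
// On success `pos` is moved past the operands; on failure it is left untouched.
std::optional<WasmLocation> ParseWasmLocation(const std::vector<uint8_t> &expr, size_t &pos) {
  if (pos + 1 >= expr.size() || expr[pos] != kDwOpWasmLocation)
    return std::nullopt;
  uint8_t kind = expr[pos + 1];
  if (kind > 0x02)
    return std::nullopt; // unknown wasm-op
  size_t cursor = pos + 2;
  std::optional<uint64_t> index = DecodeULEB128(expr, cursor);
  if (!index)
    return std::nullopt;
  pos = cursor;
  return WasmLocation{static_cast<WasmStore>(kind), *index};
}
```

With this sketch, the bytes `0xED 0x00 0x04` would parse as "local number 4 of the current function".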

https://yurydelendik.github.io/webassembly-dwarf/ describes the rationale behind the addition of DW_OP_WASM_location to DWARF.

For example a function like:

  int add(int a, int b) { return a + b; }

Could be compiled to:

  (func $add (param $lhs i32) (param $rhs i32) (result i32)
    local.get $lhs
    local.get $rhs
    i32.add)

and the corresponding DWARF would describe that:

- the value of `a` can be retrieved as DW_OP_WASM_location 0 0 (first local in the function)
- the value of `b` can be retrieved as DW_OP_WASM_location 0 1 (second local in the function)

Of course DW_OP_WASM_location cannot represent the values of complex types. For a complex type like a C++ array with 1M items:

  uint8_t* p = new uint8_t[1000000];

DWARF would describe the location of the pointer `p` (for example, it could be in a local) and its type. The debugger would then send a request like `qWasmLocal` to get the value of `p` from the Wasm runtime, let’s say 0x8000c000.
From there, LLDB can read chunks of memory starting at 0x8000c000 if the user asks to explore the contents of the array.

Note that not all Wasm code requires the new location description `DW_OP_WASM_location`. In many cases locations are encoded using preexisting codes. For example when compiling without optimizations, -O0, almost all variables are encoded as a delta from the frame pointer register. But the frame pointer register itself is often defined as a DW_OP_WASM_location:

  0x00000112:   DW_TAG_subprogram
                  DW_AT_low_pc	(0x0000000000000761)
                  DW_AT_high_pc	(0x00000000000007db)
                  DW_AT_frame_base	(DW_OP_WASM_location 0x0 +4, DW_OP_stack_value)
                  DW_AT_linkage_name	("_Z10quick_sortI4NodeIyE4lessIS1_EEvPT_xT0_")
                  DW_AT_name	("quick_sort<Node<unsigned long long>, less<Node<unsigned long long> > >")
                  DW_AT_decl_file	("C:\dev\test\emscripten_tests\sort\.\sort.h")
                  DW_AT_decl_line	(45)
                  DW_AT_external	(true)

  0x0000012a:     DW_TAG_formal_parameter
                    DW_AT_location	(DW_OP_fbreg +20)
                    DW_AT_name	("array")
                    DW_AT_type	(0x000003bb "Node<unsigned long long>*")
                    …

This would also work because LLDB would send a qWasmLocal request to obtain the value of the frame pointer register.
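
As a rough sketch of that evaluation: the frame base is the value of a Wasm local (because of `DW_OP_stack_value`), and `DW_OP_fbreg +20` adds 20 to it to get an address in Wasm memory. `ReadLocalFn` is a hypothetical stand-in for the qWasmLocal round trip, and `ResolveFbreg` is likewise an illustrative name, not something defined in the patch.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical stand-in for a qWasmLocal request: returns the value of
// local `local_index` in frame `frame_index`.
using ReadLocalFn = std::function<uint64_t(uint32_t frame_index, uint32_t local_index)>;

// Resolve "DW_OP_fbreg <offset>" when DW_AT_frame_base is
// "DW_OP_WASM_location 0x0 <frame_base_local>, DW_OP_stack_value".
// The result is an address in the Wasm Memory address space.
uint64_t ResolveFbreg(const ReadLocalFn &read_local, uint32_t frame_index,
                      uint32_t frame_base_local, int64_t fbreg_offset) {
  uint64_t frame_base = read_local(frame_index, frame_base_local);
  return frame_base + static_cast<uint64_t>(fbreg_offset);
}
```

For the `array` parameter above, this would be called with `frame_base_local = 4` and `fbreg_offset = 20`.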

> Why do we need to override read memory? Is there more than one address space? Can't the DWARF expression DW_OP_WASM_location + args turn into an address that normal read memory can access? Or are the virtual stacks separate and not actually in the address space? If the virtual stack slot for locals/globals and stack values always know their sizes and can provide the contents, the DW_OP_WASM_location opcode should end up creating a buffer just like DW_OP_piece does and the value will be contained in there in the DWARF expression and there is no need for the DW_OP_stack_value?

> How does normal memory reading differ from Wasm memory?

In WebAssembly the memory address space is separated from the code address space. Each Wasm module has a ‘Code’ section with the wasm bytecode.
A Wasm module also has one (for the moment only one) Memory, which is a linear, byte-addressable range of bytes, of a configured size.
So there are two separate address spaces for code and memory, and DWARF info refers to both: address ranges are defined as offsets from the start of the Code section in the module, while location expressions imply reading from Wasm Memory instances.

This is why we need `qWasmMem`. When GDBProcess::ReadMemory is called, it sends `"m"` packets to the Wasm engine, which may be interpreted as reads from the module Code address space. But we also need a different way to express reads from the module Memory space.

For the code address space, the idea is to use a 64-bit virtual address space, where the code of each module is located at `module_id << 32`.

  0x00000000`00000000 +------------------------------------+
                      |                                    |
                      |                                    |
                      |                                    |
  0x00000001`00000000 +------------------------------------+
                      |  code module_id 1                  |
                      |                                    |
                      .                                    .
  0x00000002`00000000 +------------------------------------+
                      |  code module_id 2                  |
                      .                                    .
  0x00000003`00000000 +------------------------------------+
                      ~                                    ~

The ObjectFileWasm and DynamicLoaderWasmDYLD classes already support this, so LLDB emits requests to read memory at 64-bit addresses formed this way.
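
The address scheme can be sketched as follows. The helper names are hypothetical, and the sketch assumes the low 32 bits carry the offset inside the module's 4 GiB window.

```cpp
#include <cstdint>

// Hypothetical helpers: the high 32 bits hold the module id, the low 32 bits
// hold the offset inside that module's window.
constexpr uint64_t MakeCodeAddress(uint32_t module_id, uint32_t offset) {
  return (static_cast<uint64_t>(module_id) << 32) | offset;
}

constexpr uint32_t ModuleIdOf(uint64_t code_address) {
  return static_cast<uint32_t>(code_address >> 32);
}

constexpr uint32_t OffsetOf(uint64_t code_address) {
  return static_cast<uint32_t>(code_address & 0xFFFFFFFFu);
}

// Module 2, offset 0x761 lands inside the "code module_id 2" window.
static_assert(MakeCodeAddress(2, 0x761) == 0x0000000200000761ULL, "address layout");
```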

But to read from the memory instances, as said, we need a separate command, qWasmMem. This is why `Value::GetValueAsData` is modified in this patch: it checks whether we are debugging Wasm and, if so, uses qWasmMem, because evaluating a value means reading from the Wasm Memory address space, not from the Code address space.
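
The choice between the two kinds of read could look roughly like the sketch below. It is not the code in the patch: `WasmAddressSpace` and `BuildReadPacket` are hypothetical names, only the packet payload is built (no `$...#checksum` framing), and the hex formatting of `addr` and `len` in qWasmMem is an assumption borrowed from the standard `m addr,length` packet.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical tag for which Wasm address space a read targets.
enum class WasmAddressSpace { Code, Memory };

// Build the payload of a read request.
// Assumption: qWasmMem takes addr and len as unprefixed hex, like "m".
std::string BuildReadPacket(WasmAddressSpace space, uint32_t frame_index,
                            uint64_t addr, uint64_t len) {
  char buf[96];
  if (space == WasmAddressSpace::Code) {
    // Standard GDB-remote memory read: "m addr,length" in hex.
    std::snprintf(buf, sizeof(buf), "m%llx,%llx",
                  static_cast<unsigned long long>(addr),
                  static_cast<unsigned long long>(len));
  } else {
    // Wasm-specific read from the module's linear Memory.
    std::snprintf(buf, sizeof(buf), "qWasmMem:%u;%llx;%llx",
                  static_cast<unsigned>(frame_index),
                  static_cast<unsigned long long>(addr),
                  static_cast<unsigned long long>(len));
  }
  return std::string(buf);
}
```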

The GDB-remote query extensions are currently defined in the following way:

  // Get a Wasm global value in the Wasm module specified.
  // IN : $qWasmGlobal:frame_index;index
  // OUT: $xx..xx

  // Get a Wasm local value in the stack frame specified.
  // IN : $qWasmLocal:frame_index;index
  // OUT: $xx..xx

  // Get a Wasm value from the operand stack at the index specified.
  // IN : $qWasmStackValue:frame_index;index
  // OUT: $xx..xx

  // Read Wasm memory.
  // IN : $qWasmMem:frame_index;addr;len
  // OUT: $xx..xx

  // Get the current call stack.
  // IN : $qWasmCallStack
  // OUT: $xx..xxyy..yyzz..zz (A sequence of uint64_t values represented as consecutive 8-byte blocks).

All packets except qWasmCallStack contain a `frame_index`, which the runtime can use to identify the Wasm module the query refers to.
The number of returned hex chars represents the size of the returned value. For qWasmGlobal, qWasmLocal and qWasmStackValue, the size can currently only be 4 or 8 bytes, but for qWasmMem it should match the number of bytes requested in the query.
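
A client could decode these replies along the following lines. This is a sketch with hypothetical function names; it handles only the hex payload (no `$` or checksum), and the little-endian byte order used for the call stack entries is an assumption, since the packet description does not specify it.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Turn a hex payload ("xx..xx") into raw bytes; two hex chars per byte.
std::optional<std::vector<uint8_t>> DecodeHexReply(const std::string &hex) {
  if (hex.size() % 2 != 0)
    return std::nullopt;
  auto nibble = [](char c) -> int {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
  };
  std::vector<uint8_t> bytes;
  for (size_t i = 0; i < hex.size(); i += 2) {
    int hi = nibble(hex[i]), lo = nibble(hex[i + 1]);
    if (hi < 0 || lo < 0)
      return std::nullopt;
    bytes.push_back(static_cast<uint8_t>((hi << 4) | lo));
  }
  return bytes;
}

// Locals, globals and operand stack values are 4 or 8 bytes (i32/f32, i64/f64).
bool IsValidWasmValueSize(size_t size_in_bytes) {
  return size_in_bytes == 4 || size_in_bytes == 8;
}

// Split a qWasmCallStack reply into uint64_t entries.
// Assumption: each 8-byte block is little-endian.
std::optional<std::vector<uint64_t>> DecodeCallStack(const std::string &hex) {
  std::optional<std::vector<uint8_t>> bytes = DecodeHexReply(hex);
  if (!bytes || bytes->size() % 8 != 0)
    return std::nullopt;
  std::vector<uint64_t> entries;
  for (size_t i = 0; i < bytes->size(); i += 8) {
    uint64_t value = 0;
    for (size_t b = 0; b < 8; ++b)
      value |= static_cast<uint64_t>((*bytes)[i + b]) << (8 * b);
    entries.push_back(value);
  }
  return entries;
}
```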

> These three could be boiled down to a "qEvaluateCustomDWARFExpressionOpcode" packet (shorter name please!) and the args like 0x0 and +8 can be sent. The result could provide the bytes for the value?

It is absolutely true that the first three packets (qWasmGlobal, qWasmLocal, qWasmStackValue) could be condensed into a single packet with an additional argument that describes the type of store.
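
Purely as an illustration of that idea, a merged query might be built like this. Everything here is hypothetical: `qWasmVar` is a made-up packet name that exists neither in this patch nor in the GDB remote protocol, and the decimal field formatting is an assumption.

```cpp
#include <cstdint>
#include <string>

// Store kinds, numbered as in the DW_OP_WASM_location wasm-op encoding.
enum class WasmStore : uint8_t { Local = 0, Global = 1, OperandStack = 2 };

// Build the payload of a single merged query.
// "qWasmVar" is a hypothetical packet name used only for this sketch.
std::string BuildWasmValueQuery(WasmStore store, uint32_t frame_index, uint32_t index) {
  return "qWasmVar:" + std::to_string(static_cast<unsigned>(store)) + ";" +
         std::to_string(frame_index) + ";" + std::to_string(index);
}
```

The store kind reuses the wasm-op values, so a DWARF evaluator could forward the operands of DW_OP_WASM_location without translating them.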

>> qWasmCallStack: retrieve the Wasm call stack.
> 
> Seems like this packet doesn't need to be Wasm specific. Are there any other GDB remote packets that fetch stack traces already that we would re-use?

For qWasmCallStack, I could not find in the GDBRemote protocol (https://sourceware.org/gdb/current/onlinedocs/gdb/General-Query-Packets.html#General-Query-Packets) an existing command to query a thread's call stack.

> A new virtual function in lldb_private::Process like:
> 
>   class Process {
>     virtual Error EvaluateCustomDWARFExpressionOpcode(uint16_t opcode, uint64_t arg1, uint64_t arg2) {
>       return createStringError(std::errc::invalid_argument, "unhandled DWARF expression opcode");
>     }
> 
> 
> could be added, and then the ProcessGDBRemote can pass this along to the GDB server. Anything in DWARFExpression needs to _only_ call virtual functions on lldb_private::Process/Thread/StackFrame and no deps should be added on custom plug-ins.

Having EvaluateCustomDWARFExpressionOpcode could work for DWARFExpression::Evaluate, with the drawbacks mentioned by @labath; but it would not help with Value::GetValueAsData(), I am afraid.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78801/new/

https://reviews.llvm.org/D78801