[Lldb-commits] [PATCH] D78801: [LLDB] Add class ProcessWasm for WebAssembly debugging

Wed Apr 29 16:46:07 PDT 2020

clayborg added a comment.

In D78801#2009917 <https://reviews.llvm.org/D78801#2009917>, @labath wrote:

> In D78801#2009048 <https://reviews.llvm.org/D78801#2009048>, @clayborg wrote:
>
> > In D78801#2007795 <https://reviews.llvm.org/D78801#2007795>, @labath wrote:
> >
> > > While that idea has occurred to me too, I am not convinced it is a good one:
> > >
> > > - it replaces one odd dependency with another one. Why should a Process need to know how to evaluate a DWARF expression? Or even that DWARF exists for that matter? This seems totally unrelated to what other Process functions are doing currently...
> >
> >
> > But it is what people do in reality. DW_OP_low_user and DW_OP_high_user are ranges that are made available to people to customize their DWARF opcodes. If you don't handle it, you are hosed and can't show a variable location. And to make things worse, two different compilers could both use the same value in that range. So they made DWARF expressions customizable with no real attempt to make them function for different architectures. that is unless you standardize it and make a real opcode that gets accepted into DWARF. The kind of DWARF location opcode that is being used here could easily be generalized into a DW_OP_get_stack_variable with a bunch of args, but at some point you have to talk to someone that is in communication with the runtime of the thing you are debugging to get the answer. So I do believe asking the process for this is not out of scope.
>
>
> I think the "at some point" part is very important here. I get how dwarf expressions are meant to be extended, and that doing that is tricky, but I don't think that automatically means that we should delegate that to a different process. There are various ways that could be implemented and the delegation could be performed at different levels. For example the DWARFExpression class could be made into a class hierarchy and we could have a subclass of it for each architecture with funny operations. Then the subclass would have enough knowledge about wasm to properly parse the expression and evaluate it (possibly by making additional queries to someone else) -- this is sort of what the current patch does, without the "subclass" part.
>
> The problem I have with a function like `EvaluateCustomDWARFExpressionOpcode` is that it is completely unlike anything else that our process class needs to deal with. The Process deals with threads (and how to control them), memory (read/write) and, to a limited degree, with modules. It knows nothing about "stack frames" or "variables" or "dwarf expressions" -- these are concepts built on top of that. This becomes even more true if we start to talk about the gdb-remote protocol instead of the lldb Process abstraction.
>
> > 
> > 
> >> - I am not sure it even completely removes wasm knowledge from e.g. DWARFExpression -- the class would presumably still need to know how to parse this opcode.
> > 
> > It is true and this is another hole in the "people can extend DWARF easily" scenario. We need to know what opcode arguments are and that would need to be hard coded for now. But it wouldn't have to rely on anything except virtual function on the generic lldb_private::Process/Thread APIs. In this case as soon as we get an unknown opcode we would need to pass the DataExtractor and the offset into it so the process could extract the arguments.
>
> Not only that, we might need to pass in the entire DWARF stack, in case the opcode depends on some of the stack arguments.

Yes

> 
> 
>> Not clean, but better than making DWARFExpression depend on process plug-ins IMHO.
> 
> The dependence on a process plugin could be dealt with by making `GetWasmGlobal/Local/etc` a virtual function on the Process class. Also not clean, but it's not clear to me whether it's cleaner than having `EvaluateCustomDWARFExpressionOpcode` virtual function.

I would really like to see a solution that does not include "Wasm" in any process or thread virtual functions. I am fine with something like:

  enum IndexedVariableType {
   Global,
   Local,
   Stack
  };

  Process::GetIndexedVariable(IndexedVariableType type, size_t index, ...)

I would rather see a DWARF function added to process over a WASM specific one.

> Anyway, just the fact that we can't come up with a "clean" solution doesn't mean that we should accept an "unclean" one. This wouldn't be the first feature that ends up sitting in a fork somewhere because it does not integrate cleanly with llvm (probably the largest example of that is swift-lldb).

I think there is a clean way to do memory reading and writing with segment IDs and also to access variables from the process runtime.

Actually, that brings up an idea: a process runtime plug-in for languages that have a runtime that can be communicated with. Java has a runtime, Wasm has a runtime, most Javascript engines have runtimes. Maybe this wouldn't be too hard to abstract? Then we can add the GetGlobal(index), GetLocal(index), GetStack(index) to this plug-in?

> And I believe the current problems are just the tip of the iceberg. I can't imagine what hoops we'll need to jump through once we start evaluating expressions...
> 
>> 
>> 
>>> - the interface could get very complicated if we wanted to implement typed stacks present in DWARF5 -- presumably the API would need to return the type of the result, in addition to its value.
>> 
>> DWARF5 just further clarifies what each value on the opcode stack is (file address, load address, the value itself, etc). Right now DWARF expression just infer what a value is based on the opcodes. So I don't see a huge problem here as anything we do will need to work with DWARF5.
> 
> That doesn't sounds right. DWARF5 introduces opcodes like `DW_OP_deref_type` which takes a _die offset_ as an argument so that you can specify what is the type of the dereference result value you are accessing. Fortunately that offset must refer to a DW_TAG_base_type, which means the most interesting aspects are byte size and signedness (and byte size is sort of implied by the result), but that still leaves the door open to more  languages with more complicated "base" types.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78801/new/

https://reviews.llvm.org/D78801