[Lldb-commits] [PATCH] D78801: [LLDB] Add class WasmProcess for WebAssembly debugging

Paolo Severini via Phabricator via lldb-commits lldb-commits at lists.llvm.org
Thu Jun 18 03:14:27 PDT 2020


paolosev added a comment.



> What about replacing ProcessReadMemory(addr_t addr, ...) with ProcessReadMemory(Address addr, ...), or even banning the use of lldb::addr_t on everywhere except in the bowels of Process subclasses and as an interchange for getting addresses as text from users.  You might want to store lldb::addr_t's for Symbol values for space concerns, but presumably the Symbol File would know the relevant address space, so it would fix that up and always hand out Addresses.
> 
> This would be a pretty big but mostly formal change.  It would seem more natural, Address is our abstraction for addresses in a process.  Instead we have this odd mix of API's that take lldb::addr_t and some that take Address.

Before all, I really apologize for the huge delay of this reply to your last comments. I had to focus on different tasks in the last couple of weeks and I needed to find some time to understand how `Process::ReadMemory` (and `WriteMemory`) could be modified to work with `Address` objects and not with `addr_t`.
It is certainly possible, but a problem, IMHO, is that class `Address` can be several things:

- An offset in a section in a file
- An offset in a loaded section
- An absolute address in memory

If we modify ReadMemory to be:

  Process::ReadMemory(const Address& address, void *buf, size_t size, Status &error)

then there we need to convert this Address into a concrete `addr_t`, to actually read from the MemoryCache (MemoryCache::Read) or from the debuggee (ReadMemoryFromInferior).
But how do we do this conversion? If we are dealing with a file we should use `lldb::addr_t Address::GetFileAddress()`. Otherwise we should call `lldb::addr_t Address::GetLoadAddress(Target *target)` I don't know if we have enough information in Process::ReadMemory() to always be able to correctly differentiate between the two cases. And in the latter case we should rely on Process::GetTarget() returning the valid target to pass to Address::GetLoadAddress.

The question then is: should `Address` know if it needs to be interpreted as a file address or load address?  Should we have an `AddressType` field in Address?
We already have this enum that maybe we could reuse:

  enum AddressType {
    eAddressTypeInvalid = 0,
    eAddressTypeFile, /// Address is an address as found in an object or symbol file
    eAddressTypeLoad, /// Address is an address as in the current target inferior process
    eAddressTypeHost  /// Address is an address in the process that is running code
  };

Process::ReadMemory is called by many places in the LLDB code. In some of them, like `Value::GetValueAsData`, `DWARFExpression::Evaluate`, `ValueObject::GetPointeeData`, it should be fairly simple to construct the Address to read from. (And that should solve the problems for Wasm).
But in other places it is not so clear what we should do. Some code works with absolute addresses and does not need Sections (see table below). In these cases, we could rely on the Address constructor that accepts just an absolute addr_t as argument, and sets `m_section_wp` as null:

  Address(lldb::addr_t abs_addr);

The existence of this constructor already makes most of the LLDB compile even after changing function ReadMemory to use an Address as argument, in class Process and in Process-derived classes. But then class `Address` would be in many cases just a wrapper over an `addr_t`.  It seems to me that would go against the idea of using just Address as our abstraction for all addresses in a process.

This table describes where Process::ReadMemory is called and how the addresses we pass to ReadMemory originate:

| ABI\AArch64\ABIMacOSX_arm64.cpp                                          | LoadValueFromConsecutiveGPRRegisters                                    | addr_t from reg_ctx                                                             |
| ABI\AArch64\ABISysV_arm64.cpp                                            | LoadValueFromConsecutiveGPRRegisters                                    | addr_t from RegisterContext                                                     |
| ABI\ARM\ABISysV_arm.cpp                                                  | GetReturnValuePassedInMemory                                            | addr_t from RegisterContext                                                     |
| ABI\PowerPC\ABISysV_ppc64.cpp                                            | GetStructValueObject                                                    | addr_t from Register                                                            |
| Commands\CommandObjectMemory.cpp                                         | CommandObjectMemoryFind::ProcessMemoryIterator:: operator[]             | addr_t from m_base_data_address                                                 |
| DynamicLoader\MacOSX-DYLD\DynamicLoaderMacOSXDYLD.cpp                    | DoInitialImageFetch                                                     | addr_t from Process::GetImageInfoAddress                                        |
| Expression\IRExecutionUnit.cpp                                           | IRExecutionUnit::DisassembleFunction                                    | addr_t from JittedFunction::m_remote_addr                                       |
| Expression\IRMemoryMap.cpp                                               | IRMemoryMap::ReadMemory(addr_t process_address)                         | addr_t passed as argument                                                       |
| JITLoader\GDB\JITLoaderGDB.cpp                                           | ReadJITEntry                                                            | addr_t passed as argument                                                       |
| Language\CPlusPlus\LibCxx.cpp                                            | formatters::LibCxxMapIteratorSyntheticFrontEnd::Update                  | addr_t from ValueObject::GetValueAsUnsigned()                                   |
| Language\CPlusPlus\LibCxxVector.cpp                                      | formatters::LibcxxVectorBoolSyntheticFrontEnd::GetChildAtIndex          | addr_t from m_base_data_address                                                 |
| Language\ObjC\CF.cpp                                                     | formatters::CFBitVectorSummaryProvider                                  | ValueObject::GetValueAsUnsigned() => Process::ReadPointerFromMemory() => addr_t |
| Language\ObjC\NSArray.cpp (and similar classes)                          | formatters::XXX::Update                                                 | addr_t from ValueObject::GetValueAsUnsigned()                                   |
| LanguageRuntime\ObjC\AppleObjCRuntime\AppleObjCClassDescriptorV2.cpp     | ClassDescriptorV2::objc_class_t::Read (and similar functions)           | addr_t passed as argument                                                       |
| LanguageRuntime\ObjC\AppleObjCRuntime\AppleObjCRuntimeV1.cpp             | AppleObjCRuntimeV1::UpdateISAToDescriptorMapIfNeeded                    | addr_t from GetISAHashTablePointer() or from DataExtractor                      |
| LanguageRuntime\ObjC\AppleObjCRuntime\AppleObjCRuntimeV2.cpp             | AppleObjCRuntimeV2::UpdateISAToDescriptorMapDynamic                     | addr_t from Process::AllocateMemory                                             |
| LanguageRuntime\ObjC\AppleObjCTrampolineHandler\AppleObjCRuntimeV1.cpp   | AppleObjCTrampolineHandler::AppleObjCVTables::VTableRegion::SetUpRegion | addr_t from ValueObject::GetValueAsUnsigned                                     |
| LanguageRuntime\RenderScript\RenderScriptRuntime\RenderScriptRuntime.cpp | GetArgsX86 (and similar)                                                | addr_t from reg_ctx                                                             |
| ObjectFile\PECOFF\ObjectFilePECOFF.cpp                                   | ObjectFilePECOFF::ReadImageData(offset)                                 | addr_t from m_image_base, addr_t calculated from COFF header                    |
| Symbol\CompactUnwindInfo.cpp                                             | CompactUnwindInfo::ScanIndex                                            | addr_t from Section:: GetLoadBaseAddress()                                      |
| Symbol\ObjectFile.cpp                                                    | ObjectFile::ReadMemory                                                  | addr_t passed as argument                                                       |
| Symbol\ObjectFile.cpp                                                    | ObjectFile::ReadSectionData((Section, offset_t)                         | addr_t from Section::GetLoadBaseAddress                                         |
| Symbol\Type.cpp                                                          | Type::ReadFromMemory(addr_t)                                            | addr_t passed as argument                                                       |
| SystemRuntime\MacOSX\SystemRuntimeMacOSX.cpp                             | SystemRuntimeMacOSX::GetQueueNameFromThreadQAddress                     | addr_t from Process::ReadPointerFromMemory                                      |
| SystemRuntime\MacOSX\SystemRuntimeMacOSX.cpp                             | SystemRuntimeMacOSX::GetExtendedBacktraceThread (and similar fns)       | addr_t from GetThreadItemInfo                                                   |
| Target\RegisterContext.cpp                                               | RegisterContext::ReadRegisterValueFromMemory(addr_t)                    | addr_t passed as argument                                                       |
| Target\Target.cpp                                                        | Target::CreateAddressInModuleBreakpoint(addr_t)                         | addr_t passed as argument                                                       |
| Target\ThreadPlanTracer.cpp                                              | ThreadPlanAssemblyTracer::Log                                           | addr_t from reg_ctx->GetPC()                                                    |
| TypeSystem\Clang\TypeSystemClang.cpp                                     | TypeSystemClang::DumpSummary                                            | addr_t from DataExtractor                                                       |
| Utility\RegisterContextMemory.cpp                                        | RegisterContextMemory::ReadAllRegisterValues                            | addr_t from reg_data_addr passed to constructor                                 |
|

A similar analysis could be done for WriteMemory, of course. I might have forgotten something, but the point is that in most places where we call Process::ReadMemory we calculate the address as a uint64_t taken from a register, read from a DataExtractor, or passed by other functions; there is not really a Section to use to construct an Address. Not sure if this is a problem or not.
In other words... I will need some guidance on how best to make this refactoring :-)

.

>> In D78801#2026636 <https://reviews.llvm.org/D78801#2026636>, @paolosev wrote:
>> 
>>> Note that WasmDWARFEvaluator::Evaluate() not only needs to handle Wasm-specific opcodes like DW_OP_WASM_location, but might also handle in a Wasm-specific way some "standard" opcodes. For example, static variables are generally encoded with `DW_OP_addr`, but for WebAssembly the corresponding offset is the offset in the module Data section, which is mapped into an area of the Memory section when the module is loaded. So, memory reads related to DW_OP_addr refer to a different space and should be dealt differently by the runtime.
>>>  This makes `target variable *global_ptr` work when we have multiple modules., because we can retrieve the module_id from the Module associated to the DWARFExpression.
>> 
>> 
>> I am confused by this line of reasoning. For a global variable, the DW_OP_addr describes the location of the variable itself (the value of `&global_ptr`). That is what makes it possible to display the value of the pointer (`global_ptr`). Displaying the value of the pointed-to object (`*global_ptr`) is a different thing, and AFAIK it completely bypasses any dwarf expressions. Are you saying that the value of the pointer somehow inherits the "address space" of the memory where the pointer itself is located?

I wrote that badly. What I meant was that if I have different Wasm modules, and one of these modules has some static pointer, like:

  static uint8_t kBuff[3];

the corresponding DWARF data will be something like:

  0x00000026:   DW_TAG_variable
                  DW_AT_name	("kBuff")
                  DW_AT_type	(0x0000003b "uint8_t[3]")
                  ...
                  DW_AT_location	(DW_OP_addr 0x0)
                  DW_AT_linkage_name	("_ZL5kBuff")

For Wasm, this DW_OP_addr location represents an offset in the Data section, and the Data section is mapped into a region of the Memory for that module. So we can use this DWARF data to display the value of this pointer, but we need to handle this DW_OP_addr opcode in a specific way for Wasm. Then, once we have the memory location, the actual value can be found and displayed in the usual way, by reading the memory.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78801/new/

https://reviews.llvm.org/D78801





More information about the lldb-commits mailing list