[Lldb-commits] [PATCH] D72751: [LLDB] Add DynamicLoaderWasmDYLD plugin for WebAssembly debugging

Mon Jan 27 16:42:40 PST 2020

paolosev added a comment.

Comments inline...

>> In D72751#1835502 <https://reviews.llvm.org/D72751#1835502>, @paolosev wrote:
>> 
>>> The first is that we need to call `Target::SetSectionLoadAddress()` twice, from two different places. First we need to call `Target::SetSectionLoadAddress()` in `ObjectFileWasm::SetLoadAddress()`, and then again in `DynamicLoaderWasmDYLD::DidAttach()`. The reason for this seems to originate in the sequence of function calls:
>>>
>>> In `DynamicLoaderWasmDYLD::DidAttach()` we call `ProcessGDBRemote::LoadModules()` to get list of loaded modules from the remote (Wasm engine).
>>>  `ProcessGDBRemote::LoadModules()` calls, first:
>>>
>>> - `DynamicLoaderWasmDYLD::LoadModuleAtAddress()` and from there: ...
>>>
>>>   then:
>>> - `Target::SetExecutableModule() -> Target::ClearModules() -> SectionLoadList::Clear()`
>>>
>>>   So, at the end of `LoadModules()` in `DynamicLoaderWasmDYLD::DidAttach()` the SectionLoadList is empty, and we need to set it again by calling `Target::.SetSectionLoadAddress()` again. This works but the duplication is ugly; is there a way to improve this?
>> 
>> 
>> I hope so. :) This seems like a bug in `ProcessGDBRemote::LoadModules`. It seems wrong/wasteful/etc to do all this work to compute the section load addresses only to have them be thrown away by `SetExecutableModule`. Maybe all it would take is to reverse the order of these two actions, so that the load addresses persist? Can you try something like that?
> 
> I would be interested to see if this helps as well.

The problem went away with the realization that `ObjectFileWasm` type should be `eTypeSharedLibrary`, not `eTypeExecutable`.
But more generically for `ProcessGDBRemote::LoadModules` I don't know if it would be easily possible to reverse the order of the calls to `DynamicLoader::LoadModuleAtAddress` and `Target::SetExecutableModule` given that the first creates the Module which is passed to the second. Maybe refactoring `DynamicLoader::LoadModuleAtAddress` so that it calls `Target::SetExecutableModule` just after `target.GetOrCreateModule` but before `UpdateLoadedSections`, but other DynamicLoader plugins could override this method...

>> I am also worried about the fact that `SymbolFileDWARF::CalculateAbilities` requires the module to be "loaded". That shouldn't be normally required. That function does a very basic check on some sections, and this should work fine without those sections having a "load address", even if they are actually being loaded from target memory. I think this means there are some additional places where ObjectFileWasm should use `m_memory_addr` instead of something else...
> 
> So a lot of these problems might go away if the modify the ObjectFileWASM to "do the right thing" when the ObjectFile is from a live process. Object file instances know if there are from a live process because the ObjectFile members:
> 
>   lldb::ProcessWP m_process_wp;
>   const lldb::addr_t m_memory_addr;
>    
> 
> Will be filled in. So the object file doesn't need to have its sections loaded in a target in order to read section contents right? The object file can be asked to read its section data via ObjectFile::ReadSectionData(...) (2 variants).
> 
> So Pavel is correct, the sections don't need to be loaded for anything in SymbolFileDWARF::CalculateAbilities(...). The section list does need to be created and available during SymbolFileDWARF::CalculateAbilities(...). We just need to be able to get the section contents and poke around at the DWARF bits.

Yes, this seems to be the case with the current implementation of ObjectFileWASM. It creates the section list in `ObjectFileWasm::SetLoadAddress` which calls `Target::SetSectionLoadAddress` but the sections don't need to be fully loaded, and during `SymbolFileDWARF::CalculateAbilities(...)` `ObjectFile::ReadSectionData` is called to load the necessary data.

>> 
>> 
>>> The second problem is that the Code Section needs to be initialized (in `ObjectFileWasm::CreateSections()`) with `m_file_addr = m_file_offset = 0`, and not with the actual file offset of the Code section in the Wasm file. If we set `Section::m_file_addr` and `Section::m_file_offset` to the actual code offset, the DWARF info does not work correctly.
>>> 
>>> I have some doubts regarding the DWARF data generated by Clang for a Wasm target. Looking at an example, for a Wasm module that has the Code section at offset 0x57, I see this DWARF data:
>>> 
>>>   0x0000000b: DW_TAG_compile_unit
>>>                 […]
>>>                 DW_AT_low_pc (0x0000000000000000)
>>>                 DW_AT_ranges (0x00000000
>>>                    [0x00000002, 0x0000000e)
>>>                    [0x0000000f, 0x0000001a)
>>>                    [0x0000001b, 0x00000099)
>>>                    [0x0000009b, 0x0000011c))
>>> 
>>> 
>>> The documentation <https://yurydelendik.github.io/webassembly-dwarf/#pc> says that //“Wherever a code address is used in DWARF for WebAssembly, it must be the offset of an instruction relative within the Code section of the WebAssembly file.”// 
>>>  But is this correct? Shouldn't maybe code addresses be offset-ed by the file address of the Code section?
>> 
>> That's interesting. I don't think that clang is really wrong/non-conforming here, but this choice plays a rather poorly with the way lldb handles object files (and how your remote presents them). In this case special casing the code section may be fine, but you should be aware that there are places in lldb which expect that the "load bias" (the delta between file and load addresses) is the same for all sections in a module (because that's how it works elsewhere), and they may not like this. However, if you have only one code section, I think you should be mostly fine.
>> 
>> I do think that this can be handled in a better way (I'm pretty sure you don't need to zero out the file offset for instance), but we can wait with that until we resolve the first point...
> 
> So the DWARF plug-ins definitely expect the addresses in the DWARF to be file addresses where if you ask the module for its section list, it can look up this "file address" in the list, it will find a single section. If this is not the case, you will need to make a special SymbolFileDWARFWasm subclass and this will be very tricky as any address you get from the DWARF will need to be converted so that the lookup the section list will work.

File addresses can uniquely identify a single section, there is no problem with this, and there is always a single code section per module. The only "weirdness" is that since DWARF code addresses for Wasm are calculated from the beginning of the Code section, not the beginning of the file, for the Code section, `Section::m_file_offset` can normally be the file offset, but `Section::m_file_addr` needs to be zero. This seems to make all DWARF-related code work, but, as Pavel said, maybe there could be places where LLDB expects the "load bias" to be the same for each section, which could cause problems?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72751/new/

https://reviews.llvm.org/D72751