[Lldb-commits] [PATCH] D72751: [LLDB] Add DynamicLoaderWasmDYLD plugin for WebAssembly debugging

Wed Jan 29 11:03:34 PST 2020

clayborg added a comment.

In D72751#1846385 <https://reviews.llvm.org/D72751#1846385>, @labath wrote:

> Thanks. My hopefully final question is not really for you but more like for other lldb developers (@jingham, @clayborg, etc.).
>
> Given that this plugin now consists of boilerplate only, I am wondering if we should instead make it possible for this use case to work without any special plugins. A couple of options that come to mind are:
>
> - make the base DynamicLoader class instantiatable, and use it whenever we fail to find a specialized plugin
> - same as above, but only do that for ProcessGDBRemote instances
> - make ProcessGDBRemote call `LoadModules()` itself if no dynamic loader instance is available
>
> WDYT?

I am fine with 1 as long as we document the DynamicLoader class to say that it will call Process::LoadModules() and will be used if no specialized loader is needed for your platform. I would like to see a solution that will work for any process plug-in and not just ProcessGDBRemote. If we change solution 3 above to say "Make lldb_private::Process call `LoadModules()` itself if no dynamic loader instance is available" then solution 3 is also fine.
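
To make option 1 concrete, here is a rough sketch of the fallback behavior I have in mind. It uses toy stand-in types (ToyProcess, ToyDynamicLoader, ToySpecializedLoader, CreateLoader, AttachWithoutPlugin are all made-up names for illustration), not the real lldb_private classes or the real plugin lookup, so treat it as a model of the idea rather than a patch:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a process plug-in; NOT the real lldb_private::Process.
struct ToyProcess {
  std::vector<std::string> reported_modules; // what the debug stub reports
  std::vector<std::string> loaded_modules;   // what the debugger has loaded

  // Hypothetical analogue of Process::LoadModules(): loads every reported
  // module and returns how many were loaded.
  std::size_t LoadModules() {
    loaded_modules = reported_modules;
    return loaded_modules.size();
  }
};

// Instantiable base loader: its default behavior is to ask the process.
struct ToyDynamicLoader {
  explicit ToyDynamicLoader(ToyProcess &process) : m_process(process) {}
  virtual ~ToyDynamicLoader() = default;
  virtual void DidAttach() { m_process.LoadModules(); }
  virtual const char *GetName() const { return "generic"; }

protected:
  ToyProcess &m_process;
};

// A platform-specific loader would override DidAttach() with its own logic.
struct ToySpecializedLoader : ToyDynamicLoader {
  using ToyDynamicLoader::ToyDynamicLoader;
  void DidAttach() override { /* platform-specific discovery goes here */ }
  const char *GetName() const override { return "specialized"; }
};

// Assumed selection rule: use a specialized plug-in when the platform has one,
// otherwise fall back to the base class, for any kind of process.
std::unique_ptr<ToyDynamicLoader> CreateLoader(ToyProcess &process,
                                               bool platform_has_plugin) {
  if (platform_has_plugin)
    return std::unique_ptr<ToyDynamicLoader>(new ToySpecializedLoader(process));
  return std::unique_ptr<ToyDynamicLoader>(new ToyDynamicLoader(process));
}

// Usage: attach on a platform with no specialized plug-in.
std::size_t AttachWithoutPlugin() {
  ToyProcess process;
  process.reported_modules = {"a.wasm", "b.wasm"};
  std::unique_ptr<ToyDynamicLoader> loader =
      CreateLoader(process, /*platform_has_plugin=*/false);
  assert(loader != nullptr);
  loader->DidAttach(); // the base class calls LoadModules() on the process
  return process.loaded_modules.size();
}
```

The point of the sketch is that the fallback lives in the base loader and only talks to the generic process interface, so nothing in it is specific to ProcessGDBRemote.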

> 
> 
> In D72751#1843458 <https://reviews.llvm.org/D72751#1843458>, @paolosev wrote:
> 
>> Yes, this seems to be the case with the current implementation of ObjectFileWasm. It creates the section list in `ObjectFileWasm::SetLoadAddress`, which calls `Target::SetSectionLoadAddress`, but the sections don't need to be fully loaded, and during `SymbolFileDWARF::CalculateAbilities(...)` `ObjectFile::ReadSectionData` is called to load the necessary data.
> 
> 
> This is correct, but I want to point out that the "load" in SetLoadAddress and in ReadSectionData have two very different meanings. The first one records the address of a section in the process memory, while the second one "loads" the contents of a section into lldb memory (from wherever). The second one should work regardless of whether the first one was called. This is why you are able to inspect the debug info of an executable before actually running it.
> 
>> File addresses can uniquely identify a single section, there is no problem with this, and there is always a single code section per module. The only "weirdness" is that since DWARF code addresses for Wasm are calculated from the beginning of the Code section, not the beginning of the file, for the Code section, `Section::m_file_offset` can normally be the file offset, but `Section::m_file_addr` needs to be zero. This seems to make all DWARF-related code work, but, as Pavel said, maybe there could be places where LLDB expects the "load bias" to be the same for each section, which could cause problems?
> 
> The basic section loading machinery can handle sections which are "shuffled" around, but this is not true of everything (because this is not how typical object file formats work). Given that you only have one code section (no debug info or symbols should point into the debug sections) I think you should be mostly fine.
> 
> In fact it would be possible to organize things such that the "load bias" is a constant, if we create an additional pseudo-section for the file header (like we do for COFF) with a negative file address. The layout would then look something like this:
> 
>   /------------\
>   |   header   |  file_addr = -sizeof(header)
>   |------------|
>   |   code     |  file_addr = 0
>   |------------|
>   | debug_info |  file_addr = offsetof(debug_info) - sizeof(header)
>   \------------/
> 
> 
> This would keep the code section at address zero, and after applying a load bias of `module_id | sizeof(header)`, everything would land in the right place. The reason I haven't proposed that is because that gets a bit messy, and so it seems acceptable to just do what you do now, provided it ends up working.

Yes, the current approach allows anyone to load any section at any address. On Darwin systems, the DYLD shared cache will move __TEXT, __DATA, and other sections around such that the __TEXT sections from all shared libraries in the shared cache end up in one contiguous range. The slide is different for each section, so we have some nice flexibility with being able to set the section load address individually. They will even invert the memory order sometimes: in the file we have __TEXT followed by __DATA, but in the shared cache __DATA appears at a lower address than __TEXT. We currently don't have the ability to load the same section at multiple addresses. This can happen when a shared library is loaded multiple times in memory, which we have seen on Android, where a vendor will have a file that is the same as the base system's, and the exact same file is loaded, albeit from different paths.
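
As a rough illustration of the per-section bookkeeping described above, here is a toy model. ToySection and ToySectionLoadList are made-up names and the addresses are invented; this is not the real section load list in LLDB, just a sketch of one load address per section, with independent slides and no support for loading the same section twice:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy section description; NOT the real lldb_private::Section.
struct ToySection {
  std::string name;
  std::uint64_t file_addr; // address of the section within the file
  std::uint64_t size;
};

// Toy per-section load list: each section has at most ONE load address.
class ToySectionLoadList {
public:
  // Record where a section lives in process memory. Calling this again for
  // the same section overwrites the earlier address, which models the
  // "same section at multiple addresses" limitation.
  void SetSectionLoadAddress(const ToySection &section,
                             std::uint64_t load_addr) {
    m_load_addrs[section.name] = load_addr;
  }

  // Translate a file address inside `section` into a load address.
  // Returns false if the section is not loaded or the address is out of range.
  bool ResolveFileAddress(const ToySection &section, std::uint64_t file_addr,
                          std::uint64_t &load_addr) const {
    std::map<std::string, std::uint64_t>::const_iterator it =
        m_load_addrs.find(section.name);
    if (it == m_load_addrs.end())
      return false;
    if (file_addr < section.file_addr ||
        file_addr - section.file_addr >= section.size)
      return false;
    load_addr = it->second + (file_addr - section.file_addr);
    return true;
  }

private:
  std::map<std::string, std::uint64_t> m_load_addrs;
};

// Usage: __TEXT precedes __DATA in the file, but __DATA is loaded lower.
// Returns true if both sections resolve with their own, different slides.
bool InvertedOrderExample() {
  ToySection text = {"__TEXT", 0x0000, 0x1000};
  ToySection data = {"__DATA", 0x1000, 0x1000};
  ToySectionLoadList list;
  list.SetSectionLoadAddress(text, 0x7000); // slide of 0x7000
  list.SetSectionLoadAddress(data, 0x3000); // slide of 0x2000
  std::uint64_t text_addr = 0, data_addr = 0;
  bool ok = list.ResolveFileAddress(text, 0x0010, text_addr) &&
            list.ResolveFileAddress(data, 0x1020, data_addr);
  return ok && text_addr == 0x7010 && data_addr == 0x3020 &&
         data_addr < text_addr;
}
```

Because the map holds a single address per section, a second SetSectionLoadAddress call for the same section replaces the first one, which is the limitation with the Android case.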

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72751/new/

https://reviews.llvm.org/D72751