[lldb-dev] [RFC] Segmented Address Space Support in LLDB

Mon Oct 19 14:56:16 PDT 2020

We want to support segmented address spaces in LLDB. Currently, all of
LLDB’s external API, command line interface, and internals assume that a
location in memory can be identified unambiguously by an addr_t (aka
uint64_t). To support a segmented address space, we’d need to extend addr_t
with a discriminator (an aspace_t) to uniquely identify a location in
memory. This RFC outlines what would need to change and how we propose to
do that.

## Addresses in LLDB

Currently, LLDB has two ways of representing an address:

 - Address object. Mostly represents addresses as Section+offset for a
binary image loaded in the Target. An Address in this form can persist
across executions, e.g. an address breakpoint in a binary image that loads
at a different address every execution. An Address object can also
represent memory not mapped to a binary image: heap, stack, and jitted
items are all represented as the uint64_t load address of the object, and
cannot persist across multiple executions. You must have the Target object
available to get the current load address of an Address object in the
current process run. Some parts of LLDB do not have a Target available to
them, so they require that the Address be devolved to an addr_t (aka
uint64_t) and passed in.
 - The addr_t (aka uint64_t) type. Primarily used when receiving input
(e.g. from a user on the command line) or when interacting with the
inferior (reading/writing memory) for addresses that need not persist
across runs. Also used when reading DWARF and in our symbol tables to
represent file offset addresses, where the size of an Address object would
be objectionable.
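To make the distinction concrete, here is a minimal C++ sketch of the two representations. It assumes heavily simplified stand-ins: `Section`, `Target`, `Address` and `kInvalidAddress` below are illustrative types, not the real `lldb_private` classes. It shows why a Section+offset Address survives across runs while its load address does not, and why resolving it needs a Target.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Simplified stand-ins for LLDB types; not the real lldb_private classes.
using addr_t = uint64_t;
constexpr addr_t kInvalidAddress = UINT64_MAX; // plays the role of LLDB_INVALID_ADDRESS

struct Section {
  std::string name;
  addr_t file_addr; // address recorded in the object file
};

// One "run" of the process: where each section was loaded this time.
struct Target {
  std::map<const Section *, addr_t> section_load_addrs;
};

// Either Section+offset (persists across runs) or a raw load address
// (heap/stack/JIT; meaningful only for the current run).
class Address {
public:
  Address(const Section *section, addr_t offset)
      : section_(section), offset_(offset) {}
  explicit Address(addr_t load_addr) : section_(nullptr), offset_(load_addr) {}

  // Needs the Target to know where the section lives in this run.
  addr_t GetLoadAddress(const Target &target) const {
    if (!section_)
      return offset_; // already a load address
    auto it = target.section_load_addrs.find(section_);
    if (it == target.section_load_addrs.end())
      return kInvalidAddress; // section not loaded in this run
    return it->second + offset_;
  }

private:
  const Section *section_;
  addr_t offset_;
};
```

The same `Address(&text, 0x20)` resolves to a different load address under each `Target`, while a raw heap address is returned unchanged and carries no way to be re-resolved in a later run.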

## Proposal

### Address + ProcessAddress

 - The Address object gains a segment discriminator member variable.
Everything that creates an Address will need to provide this segment
discriminator.
 - A new ProcessAddress object, consisting of a uint64_t and a segment
discriminator, replaces addr_t. ProcessAddress objects would not persist
across multiple executions. Just as you can create an addr_t from an
Address+Target today, you could create a ProcessAddress from an
Address+Target. Places that pass around addr_t values today would use
ProcessAddress instead, with the exception of symbol tables, where the
added space would be significant and where we do not believe we need
segment discriminators today.
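A minimal sketch of this proposal follows. It assumes `aspace_t` is a 32-bit integer and that `GetProcessAddress` is a hypothetical name for the Address+Target conversion; neither the width nor the name comes from the RFC. Two Addresses with identical offsets but different discriminators resolve to the same number yet compare unequal as ProcessAddresses, which is exactly the distinction a bare addr_t cannot express.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

using addr_t = uint64_t;
using aspace_t = uint32_t; // hypothetical width for the segment discriminator

// Replacement for a bare addr_t: a value plus the address space it lives in.
struct ProcessAddress {
  addr_t addr;
  aspace_t aspace;
  bool operator==(const ProcessAddress &o) const {
    return addr == o.addr && aspace == o.aspace;
  }
  bool operator!=(const ProcessAddress &o) const { return !(*this == o); }
};

struct Section {
  addr_t file_addr;
};

struct Target {
  std::map<const Section *, addr_t> section_load_addrs;
};

// Address gains a segment discriminator alongside Section+offset.
struct Address {
  const Section *section;
  addr_t offset;
  aspace_t aspace; // every site that creates an Address must now supply this

  // Mirrors today's Address+Target -> addr_t, but keeps the discriminator.
  std::optional<ProcessAddress> GetProcessAddress(const Target &target) const {
    auto it = target.section_load_addrs.find(section);
    if (it == target.section_load_addrs.end())
      return std::nullopt; // section not loaded in this run
    return ProcessAddress{it->second + offset, aspace};
  }
};
```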

### Address Only

Extend the lldb_private::Address class to be the one representation of
locations: file-based ones valid before running, file addresses resolved
in a process, and process-specific addresses (heap/stack/JIT code) that
are only valid during a run. That is attractive because it would provide a
uniform interface to any “where is something” question you might ask,
whether about symbols in files, variables in stack frames, or anything
else.

At present, when we resolve a Section+Offset Address to a “load address”,
we provide a Target to the resolution API. Providing the Target externally
makes sense because a Target knows whether the Section is present or not
and can unambiguously return a load address. We could continue that
approach, since the Target always holds only one process, or extend it to
allow passing in a Process when resolving non-file-backed addresses. But
this would make the conversion from addr_t uses to Address uses more
difficult, since we would have to push the Target or Process into all the
APIs that make use of just an addr_t. Using a single Address class seems
less attractive when you have to provide an external entity to make sense
of it at all the use sites.
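The single-Address alternative can be sketched with a `std::variant` over the kinds of location it would have to hold. `FileLocation`, `ProcessLocation`, `Resolve` and the 32-bit `aspace_t` are hypothetical names chosen for illustration. Note that `Resolve` still needs a `Target` argument: that is the external entity every use site would have to supply.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <variant>

using addr_t = uint64_t;
using aspace_t = uint32_t; // hypothetical

struct Section {
  addr_t file_addr;
};

struct Target {
  std::map<const Section *, addr_t> section_load_addrs;
};

// Valid before running: Section+offset from an object file.
struct FileLocation {
  const Section *section;
  addr_t offset;
  aspace_t aspace;
};

// Valid only during one run: heap, stack, JIT code.
struct ProcessLocation {
  addr_t load_addr;
  aspace_t aspace;
};

class Address {
public:
  explicit Address(FileLocation f) : loc_(f) {}
  explicit Address(ProcessLocation p) : loc_(p) {}

  bool IsFileBacked() const { return std::holds_alternative<FileLocation>(loc_); }

  // Every caller must supply the Target: the Address alone cannot say
  // where a section is loaded in the current run.
  std::optional<ProcessLocation> Resolve(const Target &target) const {
    if (auto *p = std::get_if<ProcessLocation>(&loc_))
      return *p; // already process-specific
    const auto &f = std::get<FileLocation>(loc_);
    auto it = target.section_load_addrs.find(f.section);
    if (it == target.section_load_addrs.end())
      return std::nullopt;
    return ProcessLocation{it->second + f.offset, f.aspace};
  }

private:
  std::variant<FileLocation, ProcessLocation> loc_;
};
```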

We could improve this situation by including a Process (as a weak pointer)
in the Address and filling that in at the boundaries where the current
code goes from an Address to a process-specific addr_t. That would make
the conversion easier, but add complexity. Since Addresses are ubiquitous,
you won’t know what any given Address you’ve been handed actually
contains. It could even have been resolved for a process other than the
current one. Making Address usage-dependent in this way reduces the
attractiveness of the solution.
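A sketch of the weak-pointer variant follows, using `std::weak_ptr<Process>` and a hypothetical `Classify` helper (not an LLDB API) to spell out the states a caller might be handed. The type alone no longer tells you which of these states you hold, so every consumer has to check.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

using addr_t = uint64_t;

struct Process {
  int pid;
};

// Hypothetical Address that remembers which Process resolved it.
class Address {
public:
  enum class Holds { FileAddress, ResolvedForCurrent, ResolvedForOther, ResolvedButExpired };

  explicit Address(addr_t file_addr) : addr_(file_addr) {}

  // Filled in at the boundary where code used to produce a process addr_t.
  void SetResolved(const std::shared_ptr<Process> &process, addr_t load_addr) {
    process_ = process;
    addr_ = load_addr;
    resolved_ = true;
  }

  addr_t GetRawAddress() const { return addr_; }

  // What a consumer must work out before it can trust GetRawAddress().
  Holds Classify(const std::shared_ptr<Process> &current) const {
    if (!resolved_)
      return Holds::FileAddress;
    auto owner = process_.lock();
    if (!owner)
      return Holds::ResolvedButExpired; // the resolving process is gone
    return owner == current ? Holds::ResolvedForCurrent : Holds::ResolvedForOther;
  }

private:
  addr_t addr_;
  std::weak_ptr<Process> process_;
  bool resolved_ = false;
};
```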

## Approach

Replacing all the instances of addr_t by hand would be a lot of work.
Therefore we propose writing a clang-based tool to automate this menial
task. The tool would update function signatures, replace uses of addr_t
inside those functions with the addr_t obtained from the ProcessAddress
or Address, and return the appropriate object from functions that
currently return an addr_t. The goal of this tool is to generate one big
NFC patch. The tool need not be perfect: at some point it will be more
work to improve the tool than to fix up the remaining code by hand. After
this patch LLDB would still not really understand address spaces, but it
would have everything in place to support them.
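One plausible shape of the tool’s output for a single function, shown as hypothetical `before` and `after` namespaces around a made-up `NextPC` helper. The rewritten body pulls the addr_t back out of the ProcessAddress and passes the discriminator through untouched, so the numeric result is unchanged, which is what keeps the patch NFC. Whether the real tool would propagate the incoming discriminator or substitute a default is an assumption here.

```cpp
#include <cassert>
#include <cstdint>

using addr_t = uint64_t;
using aspace_t = uint32_t; // hypothetical

struct ProcessAddress {
  addr_t addr;
  aspace_t aspace;
};

namespace before {
// Original code: takes and returns a bare addr_t.
addr_t NextPC(addr_t pc, uint32_t insn_size) { return pc + insn_size; }
} // namespace before

namespace after {
// Signature rewritten by the tool; the body extracts the addr_t and the
// return value carries the incoming discriminator through unchanged.
ProcessAddress NextPC(ProcessAddress pc, uint32_t insn_size) {
  addr_t pc_addr = pc.addr;
  return ProcessAddress{pc_addr + insn_size, pc.aspace};
}
} // namespace after

// NFC check: the rewritten function computes the same numeric address.
bool SameResult(addr_t pc, uint32_t insn_size) {
  return after::NextPC({pc, 0}, insn_size).addr == before::NextPC(pc, insn_size);
}
```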

Once all the APIs are updated, we can start working on the functional
changes. This means actually interpreting the aspace_t values and making
sure they don’t get dropped.
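As an illustration of what interpreting the aspace_t values could mean, here is a toy memory model, assuming each address space is an independent byte map. `FakeMemory` is hypothetical. The same numeric address in two spaces refers to two different bytes, so any code path that drops the discriminator would silently read the wrong one.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

using addr_t = uint64_t;
using aspace_t = uint32_t; // hypothetical

struct ProcessAddress {
  addr_t addr;
  aspace_t aspace;
};

// Toy inferior memory: each address space is a separate byte map.
class FakeMemory {
public:
  void Write(ProcessAddress pa, uint8_t value) { spaces_[pa.aspace][pa.addr] = value; }

  // Returns nullopt for an unknown space or an unmapped byte.
  std::optional<uint8_t> Read(ProcessAddress pa) const {
    auto space = spaces_.find(pa.aspace);
    if (space == spaces_.end())
      return std::nullopt;
    auto byte = space->second.find(pa.addr);
    if (byte == space->second.end())
      return std::nullopt;
    return byte->second;
  }

private:
  std::map<aspace_t, std::map<addr_t, uint8_t>> spaces_;
};
```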

Finally, when all this work is done and we’re happy with the approach, we
will extend the SB API with overloads for the functions that currently
take or return addr_t. I want to do this last so we have time to iterate
before committing to a stable interface.
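A sketch of the overload pattern, on a hypothetical `SBProcessSketch` class rather than the real `lldb::SBProcess` (whose actual `ReadMemory` also takes an `SBError&`). The existing addr_t signature stays and forwards to a new overload with an explicit address space, assuming a default space of 0. Adding overloads rather than changing signatures keeps existing clients of the stable API working.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <utility>
#include <vector>

using addr_t = uint64_t;
using aspace_t = uint32_t;                   // hypothetical
constexpr aspace_t kDefaultAddressSpace = 0; // hypothetical

// Hypothetical stand-in for an SB class; not the real lldb::SBProcess.
class SBProcessSketch {
public:
  explicit SBProcessSketch(std::map<aspace_t, std::vector<uint8_t>> spaces)
      : spaces_(std::move(spaces)) {}

  // Existing-style signature: unchanged, so current clients keep working.
  size_t ReadMemory(addr_t addr, void *buf, size_t size) {
    return ReadMemory(addr, kDefaultAddressSpace, buf, size);
  }

  // New overload that carries the segment discriminator.
  size_t ReadMemory(addr_t addr, aspace_t aspace, void *buf, size_t size) {
    auto it = spaces_.find(aspace);
    if (it == spaces_.end() || addr >= it->second.size())
      return 0; // unknown space or out of range
    size_t n = std::min<size_t>(size, it->second.size() - addr);
    std::memcpy(buf, it->second.data() + addr, n);
    return n;
  }

private:
  std::map<aspace_t, std::vector<uint8_t>> spaces_;
};
```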

## Testing

By splitting off the intrusive non-functional changes, we are able to rely
on the existing tests for coverage. Smaller functional changes can be
tested in isolation, either through a unit test or a small GDB remote test.
For end-to-end testing we can run the test suite with a modified
debugserver that spoofs address spaces.
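An example of a small functional change tested in isolation: a hypothetical `AddressRange::Contains` that must compare discriminators as well as numeric values. Neither the type nor the test comes from LLDB, and the test uses plain `assert` rather than GoogleTest (which LLDB’s unit tests use) to stay self-contained.

```cpp
#include <cassert>
#include <cstdint>

using addr_t = uint64_t;
using aspace_t = uint32_t; // hypothetical

struct ProcessAddress {
  addr_t addr;
  aspace_t aspace;
};

// Hypothetical range type: the functional change teaches it to compare
// address spaces, not just numeric values.
struct AddressRange {
  ProcessAddress base;
  addr_t size;

  bool Contains(ProcessAddress pa) const {
    return pa.aspace == base.aspace && pa.addr >= base.addr &&
           pa.addr - base.addr < size;
  }
};

// A small, isolated unit test for the change above.
void TestContainsRespectsAddressSpace() {
  AddressRange r{{0x1000, 1}, 0x100};
  assert(r.Contains({0x1000, 1}));
  assert(r.Contains({0x10ff, 1}));
  assert(!r.Contains({0x1100, 1}));
  assert(!r.Contains({0x1000, 2})); // same number, different space
}
```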

Thanks,
Jonas