[lldb-dev] lldb-dev Digest, Vol 125, Issue 31

Mon Oct 26 18:48:07 PDT 2020

Hi Jason,

This is brilliant! :-)
Once we have segmented address spaces in LLDB, maybe it will be possible
to add support for WebAssembly debugging. My patch (
https://reviews.llvm.org/D78801) was blocked by my inability to
manage addresses for different Wasm modules in a clean (or at least
acceptably clean) way.
Looking forward to this implementation, let me know if I can help in any
way!
Thanks!

-- Paolo Severini

On Mon, Oct 26, 2020 at 3:45 PM via lldb-dev <lldb-dev at lists.llvm.org>
wrote:

> Today's Topics:
>
>    1. Re: [RFC] Segmented Address Space Support in LLDB
>       (Greg Clayton via lldb-dev)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 26 Oct 2020 15:45:02 -0700
> From: Greg Clayton via lldb-dev <lldb-dev at lists.llvm.org>
> To: Jason Molenda <jmolenda at apple.com>
> Cc: LLDB <lldb-dev at lists.llvm.org>
> Subject: Re: [lldb-dev] [RFC] Segmented Address Space Support in LLDB
> Message-ID: <EC34A86E-F2F4-441C-8B60-0CA7B6ACC4D4 at gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
>
>
> > On Oct 22, 2020, at 1:25 AM, Jason Molenda <jmolenda at apple.com> wrote:
> >
> > Hi Greg, Pavel.
> >
> > I think it's worth saying that this is very early in this project.  We
> know we're going to need the ability to track segments on addresses, but
> honestly a lot of the finer details aren't clear yet.  It's such a
> fundamental change that we wanted to start a discussion, even though I know
> it's hard to have detailed discussions still.
> >
> > In the envisioned environment, there will be a default segment, and most
> addresses will be in the default segment.  DWARF, user input (lldb
> cmdline), SB API, and clang expressions are going to be the places where
> segments are specified --- Dump methods and ProcessGDBRemote will be the
> main place where the segments are displayed/used.  There will be
> modifications to the memory read/write gdb RSP packets to include these.
> >
> > This early in the project, it's hard to tell what will be upstreamed to
> the llvm.org monorepo, or when.  My personal opinion
> is that we don't actually want to add segment support to llvm.org
> lldb at this point.  We'd be initializing every address
> object with LLDB_INVALID_SEGMENT or LLDB_DEFAULT_SEGMENT, and then testing
> that each object is initialized this way?  I don't see this actually being
> useful.
> >
> > However, changing lldb's target addresses to be strictly handled in
> terms of objects will allow us to add a segment discriminator ivar to
> Address and ProcessAddress on our local branch while this is in
> development, and minimize the places where we're diverging from the
> llvm.org sources.  We'll need to have local
> modifications at the places where a segment is input (DWARF, cmdline, SB
> API, compiler type) or output (Dump, ProcessGDBRemote) and, hopefully, the
> vast majority of lldb can be unmodified.
> >
> > The proposal was written in terms of what we need to accomplish based on
> our current understanding for this project, but I think there will be a lot
> of details figured out as we get more concrete experience of how this all
> works.  And when it's appropriate to upstream to llvm.org,
> we'll be better prepared to discuss the tradeoffs of
> the approaches we took in extending Address/ProcessAddress to incorporate a
> segment.
> >
> > My hope is that this generic OO'ification of target addresses will not
> change lldb beyond moving off of addr_t for now.
> >
> > I included a couple of inlined comments, but I need to address more of
> yours & Pavel's notes later, I've been dealing with a few crazy things and
> am way behind on emails but didn't want to wait any longer to send
> something out.
>
> No worries! I would vote to upstream as much as possible as soon as
> possible to avoid differences and merging issues for you guys. I would
> really like to see LLDB have support for segmented address spaces. Many
> comments I made were just my thinking out loud and trying to ease the
> changes in with as little disruption as possible.
> >
> >
> >
> >> On Oct 19, 2020, at 4:11 PM, Greg Clayton via lldb-dev <
> lldb-dev at lists.llvm.org> wrote:
> >>
> >>
> >>
> >>> On Oct 19, 2020, at 2:56 PM, Jonas Devlieghere via lldb-dev <
> lldb-dev at lists.llvm.org> wrote:
> >>>
> >>> We want to support segmented address spaces in LLDB. Currently, all of
> LLDB’s external API, command line interface, and internals assume that an
> address in memory can be addressed unambiguously as an addr_t (aka
> uint64_t). To support a segmented address space we’d need to extend addr_t
> with a discriminator (an aspace_t) to uniquely identify a location in
> memory. This RFC outlines what would need to change and how we propose to
> do that.
> >>>
> >>> ### Addresses in LLDB
> >>>
> >>> Currently, LLDB has two ways of representing an address:
> >>>
> >>> - Address object. Mostly represents addresses as Section+offset for a
> binary image loaded in the Target. An Address in this form can persist
> across executions, e.g. an address breakpoint in a binary image that loads
> at a different address every execution. An Address object can represent
> memory not mapped to a binary image. Heap, stack, jitted items, will all be
> represented as the uint64_t load address of the object, and cannot persist
> across multiple executions. You must have the Target object available to
> get the current load address of an Address object in the current process
> run. Some parts of lldb do not have a Target available to them, so they
> require that the Address can be devolved to an addr_t (aka uint64_t) and
> passed in.
> >>> - The addr_t (aka uint64_t) type. Primarily used when receiving input
> (e.g. from a user on the command line) or when interacting with the
> inferior (reading/writing memory) for addresses that need not persist
> across runs. Also used when reading DWARF and in our symbol tables to
> represent file offset addresses, where the size of an Address object would
> be objectionable.
> >>>
> >>
> >> Correction: LLDB has 3 kinds of uint64_t addresses:
> >> - "file address" which are always mapped to a section + offset if put
> into an Address object. This value only makes sense to the
> lldb_private::Module that contains it. The only way to pass this around is
> as a lldb_private::Address. You can make queries on a file address using
> "image lookup --address" before you are debugging, but a single file
> address can result in multiple matches in multiple modules because each
> module might contain something at this virtual address. This object might
> be able to be converted to a "load address" if the section is loaded in
> your debug target. Since the target contains the section load list, the
> target is needed when converting between Address and addr_t objects.
> >> - "load address" which is guaranteed to be unique in a process with no
> segments. It can always be put into a lldb_private::Address object, but
> that object won't always have a section. If there is no section, it means
> the memory location maps to stack, heap, or other memory that doesn't
> reside in an object file section. This object might be able to be converted
> to a section + offset address if the address matches one of the loaded
> sections in a target. If this can be converted to an Address object that has
> a section, then it can persist across debug sessions, otherwise, not.
> >> - "host address" which is a pointer to memory in the LLDB process
> itself. Used for storing expression results and other things. You cannot
> convert this to/from a "file" or "load" address.
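
To make the file address / load address distinction above concrete, here is a minimal sketch. Every name in it (Section, SectionOffset, ToyTarget, ResolveFileAddress, kInvalidAddress) is a made-up stand-in, not the real lldb_private API. It only shows the idea that a file address becomes section + offset within one module, and that a target's section load list is needed to turn that into a load address.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

using addr_t = std::uint64_t;
constexpr addr_t kInvalidAddress = UINT64_MAX; // stand-in for LLDB_INVALID_ADDRESS

// A section as described by one object file (hypothetical, simplified).
struct Section {
  std::string name;
  addr_t file_addr; // start of the section in file-address terms
  addr_t size;
};

// The section + offset form that can persist across runs.
struct SectionOffset {
  const Section *section;
  addr_t offset;
};

// Turn a file address into section + offset; only meaningful for the
// module that owns 's'.
inline bool ResolveFileAddress(const Section &s, addr_t file_addr,
                               SectionOffset &out) {
  if (file_addr < s.file_addr || file_addr - s.file_addr >= s.size)
    return false;
  out = {&s, file_addr - s.file_addr};
  return true;
}

// Toy target holding a section load list for the current run.
class ToyTarget {
public:
  void SetSectionLoadAddress(const Section &s, addr_t load_base) {
    m_load_list[&s] = load_base;
  }

  // Returns kInvalidAddress if the section is not loaded in this run.
  addr_t GetLoadAddress(const SectionOffset &so) const {
    auto it = m_load_list.find(so.section);
    if (it == m_load_list.end())
      return kInvalidAddress;
    return it->second + so.offset;
  }

private:
  std::map<const Section *, addr_t> m_load_list;
};
```

With a section at file address 0x1000, file address 0x1010 resolves to offset 0x10. It only yields a load address once the toy target has recorded where that section was loaded.
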
> >
> > Yes, good point, host memory is a third type of address that we use.
> And our symbol tables, for instance, internally represent themselves as
> uint64_t offsets into the file or section, I forget which, and we're not
> talking about changing those uint64_t style addresses.  On our project, I
> do not believe the symbol table will give us segment information.
>
> You should be able to classify symbols from the symbol table to a segment
> though right?
>
> We could add a special define for host addresses if needed.
>
> >
> >
> >
> >>
> >>> ## Proposal
> >>>
> >>> ### Address + ProcessAddress
> >>>
> >>> - The Address object gains a segment discriminator member variable.
> Everything that creates an Address will need to provide this segment
> discriminator.
> >>
> >> So an interesting thing to think about is whether the lldb_private::Section
> object should contain a segment identifier? If this is the case, then an
> Address object can have a Section that has a segment _and_ the Address
> object itself might have one that was set from the section as well. It
> would be good to figure out what the rules are for this case and it might
> lead to the need for an intelligent accessor that always prefers the
> section's segment if a section is available. The Address object must have
> one in case we have a pointer to memory in data and there is no section for
> this (like any heap addresses).
> >
> > I don't believe a Section in this project will have a segment.  We're
> looking purely at individual variables, primarily from debug information.
>
> So if you have a global variable, it will have a symbol right? And it will
> have debug info. Are you saying that only the debug info would have segment
> info? It seems important to be able to view a global variable without debug
> info.
>
> >
> >
> >>> - A ProcessAddress object which is a uint64_t and a segment
> discriminator as a replacement for addr_t. ProcessAddress objects would not
> persist across multiple executions. Similar to how you can create an addr_t
> from an Address+Target today, you can create a ProcessAddress given an
> Address+Target. When we pass around addr_ts today, they would be replaced
> with ProcessAddress, with the exception of symbol tables where the added
> space would be significant, and we do not believe we need segment
> discriminators today.
> >>
> >> Would SegmentedAddress be a more descriptive name here?
> >>
> >> A few things I would like to see on ProcessAddress or SegmentedAddress:
> >> - Have a segment definition that says "no segment" like
> LLDB_INVALID_SEGMENT or LLDB_NO_SEGMENT and allow these objects to be
> constructed with just a lldb::addr_t and the segment gets auto set to
> LLDB_NO_SEGMENT
> >
> >> - Any code that uses these should test if there is no segment and
> continue to do what they used to do before
> >> - like read/write memory in ProcessGDBRemote
> >
> >
> > To be honest, testing this is going to be one of the tricky things I'm
> not sure how we'll do.  We will have a default segment that addresses will
> use unless overridden, but how we spot places that *incorrectly* failed to
> initialize the segment of an Address/ProcessAddress is something we're
> going to need to figure out.
>
> Tests I can think of:
> - read function disassembly from a code segment that would have the same
> address as something from a data segment
> - read a variable from a data segment that would have the same address as
> something from a code segment
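
The two tests above both rely on one numeric address meaning different things in different segments. A minimal sketch of a fake inferior memory that behaves this way could look like the following. FakeSegmentedMemory and the segment numbers are assumptions made up for illustration; a real test would more likely go through a spoofing debugserver.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

using addr_t = std::uint64_t;
using segment_t = std::uint32_t;

// Hypothetical segment numbers; real values would be defined by the ABI.
constexpr segment_t kCodeSegment = 1;
constexpr segment_t kDataSegment = 2;

// Toy memory keyed by (segment, address), so the same numeric address can
// hold different bytes in different segments.
class FakeSegmentedMemory {
public:
  void WriteByte(segment_t seg, addr_t addr, std::uint8_t value) {
    m_bytes[std::make_pair(seg, addr)] = value;
  }

  // Returns true and sets 'value' only if that byte was written earlier.
  bool ReadByte(segment_t seg, addr_t addr, std::uint8_t &value) const {
    auto it = m_bytes.find(std::make_pair(seg, addr));
    if (it == m_bytes.end())
      return false;
    value = it->second;
    return true;
  }

private:
  std::map<std::pair<segment_t, addr_t>, std::uint8_t> m_bytes;
};
```

A test would write distinct bytes at 0x1000 in the code and data segments and check that each read returns the byte for its own segment. A read that drops or mixes up the segment would then fail visibly.
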
>
> It would be good to figure out where segments are going to come from. I
> would hope that some sections would be able to be mapped to certain
> segments so that we can live with a binary that has no debug info and still
> read say a global variable. I know we can do things right inside of the
> debug info since DWARF can have segment information.
>
> >
> >
> >> - Anything that dumps one of these objects should dump just like they
> used to (just a uint64_t hex representation and no other notation)
> >> - Add code that can convert a "load address" into a ProcessAddress or
> SegmentedAddress that invent the segment notation and have no changes for
> targets that don't support segmented address spaces
> >> - 0x1000 should convert to ProcessAddress where the address is 0x1000
> and segment is LLDB_INVALID_SEGMENT or LLDB_NO_SEGMENT if the process
> doesn't support segmented addresses
> >
> >
> >> - 0x1000 would return an error on conversion for processes that do
> support segmented addresses as the segment must be specified? Or should
> there be a default segment if we run into this case?
> >> - Come up with some quick way to represent segmented addresses for an
> address of 0x1000 in segment 2: ideas:
> >>   - [2]0x1000
> >>   - {2}0x1000
> >>   - 0x1000[2]
> >>   - 0x1000{2}
> >>   - {0x1000, 2}
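
As a rough sketch of the wish list above: a ProcessAddress that can be built from a bare addr_t, defaults to "no segment", and dumps as plain hex unless a segment is set. The class layout, kNoSegment (standing in for LLDB_NO_SEGMENT) and the "[2]0x1000" notation are assumptions taken from the ideas in this thread. None of this is settled, and nothing here exists in LLDB.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

using addr_t = std::uint64_t;
using segment_t = std::uint32_t;
constexpr segment_t kNoSegment = UINT32_MAX; // stand-in for LLDB_NO_SEGMENT

class ProcessAddress {
public:
  // Implicit on purpose: a bare addr_t converts with "no segment", so code
  // that only knows about addr_t keeps working.
  ProcessAddress(addr_t addr, segment_t segment = kNoSegment)
      : m_addr(addr), m_segment(segment) {}

  addr_t GetAddress() const { return m_addr; }
  segment_t GetSegment() const { return m_segment; }
  bool HasSegment() const { return m_segment != kNoSegment; }

  // Plain hex when there is no segment (same output as today for flat
  // targets), "[segment]0xaddr" otherwise.
  std::string Format() const {
    char buf[48];
    if (HasSegment())
      std::snprintf(buf, sizeof(buf), "[%u]0x%llx",
                    static_cast<unsigned>(m_segment),
                    static_cast<unsigned long long>(m_addr));
    else
      std::snprintf(buf, sizeof(buf), "0x%llx",
                    static_cast<unsigned long long>(m_addr));
    return buf;
  }

private:
  addr_t m_addr;
  segment_t m_segment;
};
```

The implicit constructor is a trade-off. It keeps flat targets unchanged, but it also makes it easy to drop a segment by accident, which is exactly the testing concern raised in this thread.
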
> >
> > To be honest, we haven't thought about the UI side of this very much
> yet.  I think there will be ABI or ArchSpec style information that maps
> segment numbers to human-understandable names.  It's ABI style enumerated
> numbers - the DWARF will include a number that is passed down to the remote
> gdb stub.
>
> Should be fine to have named segments if needed, but we will need to come
> up with a way to specify a segment. We could always add new arguments to
> command line commands if needed.
> >
> >
> >>
> >>>
> >>> ### Address Only
> >>>
> >>> Extend the lldb_private::Address class to be the one representation of
> locations; including file based ones valid before running, file addresses
> resolved in a process, and process specific addresses (heap/stack/JIT code)
> that are only valid during a run. That is attractive because it would
> provide a uniform interface to any “where is something” question you would
> ask, either about symbols in files, variables in stack frames, etc.
> >>>
> >>> At present, when we resolve a Section+Offset Address to a “load
> address” we provide a Target to the resolution API.  Providing the Target
> externally makes sense because a Target knows whether the Section is
> present or not and can unambiguously return a load address.    We could
> continue that approach since the Target always holds only one process, or
> extend it to allow passing in a Process when resolving non-file backed
> addresses.  But this would make the conversion from addr_t uses to Address
> uses more difficult, since we will have to push the Target or Process into
> all the API’s that make use of just an addr_t.  Using a single Address
> class seems less attractive when you have to provide an external entity to
> make sense of it at all the use sites.
> >>>
> >>> We could improve this situation by including a Process (as a weak
> pointer) and fill that in on the boundaries where in the current code we go
> from an Address to a process specific addr_t.  That would make the
> conversion easier, but add complexity.  Since Addresses are ubiquitous, you
> won’t know what any given Address you’ve been handed actually contains.  It
> could even have been resolved for another process than the current one.
> Making Address usage-dependent in this way reduces the attractiveness of
> the solution.
> >>>
> >>> ## Approach
> >>>
> >>> Replacing all the instances of addr_t by hand would be a lot of work.
> Therefore we propose writing a clang-based tool to automate this menial
> task. The tool would update function signatures and replace uses of addr_t
> inside those functions to get the addr_t from the ProcessAddress or Address
> and return the appropriate object for functions that currently return an
> addr_t. The goal of this tool is to generate one big NFC patch. This tool
> need not be perfect; at some point it will be more work to improve the
> tool than fixing up the remaining code by hand. After this patch LLDB would
> still not really understand address spaces but it will have everything in
> place to support them.
> >>
> >> This won't be NFC really as each location that plays with what used to
> be addr_t now must check if the segment is invalid before doing what it did
> before _and_ return an error if the segment is something valid.
> >>
> >> It might be better to look at all of the APIs that could end up using a
> plain "addr_t" and adding new APIs that take a ProcessAddress and call the
> old API if the segment is LLDB_INVALID_SEGMENT or LLDB_NO_SEGMENT, and
> return an error if the segment is valid. For example in the Process class
> we have:
> >>
> >> virtual size_t Process::DoReadMemory(lldb::addr_t vm_addr, void *buf,
> size_t size, Status &error) = 0;
> >>
> >> We could add a new overload:
> >>
> >> virtual size_t Process::DoReadMemory(ProcessAddress proc_addr, void *buf,
> >>                                      size_t size, Status &error) {
> >>   // No segment: fall back to the existing addr_t-based implementation.
> >>   if (proc_addr.GetSegment() == LLDB_NO_SEGMENT)
> >>     return DoReadMemory(proc_addr.GetAddress(), buf, size, error);
> >>   error.SetErrorString(
> >>       "segmented addresses are not supported on this process");
> >>   return 0;
> >> }
> >>
> >> Then we can start modifying the locations that need to support
> segmented addresses as needed. For instance, if we were to add segmented
> address support to ProcessGDBRemote, then we would override this function
> in that class.
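
For anyone who wants to try the overload-with-fallback idea outside of LLDB, here is a self-contained sketch. Process, Status, ProcessAddress and FlatProcess are simplified stand-ins written for this example, not the real lldb_private classes, and kNoSegment stands in for the proposed LLDB_NO_SEGMENT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

using addr_t = std::uint64_t;
using segment_t = std::uint32_t;
constexpr segment_t kNoSegment = UINT32_MAX;

struct ProcessAddress {
  explicit ProcessAddress(addr_t a, segment_t s = kNoSegment)
      : addr(a), segment(s) {}
  addr_t addr;
  segment_t segment;
};

struct Status {
  std::string message;
  bool Fail() const { return !message.empty(); }
};

class Process {
public:
  virtual ~Process() = default;

  // Existing flat-address entry point.
  virtual std::size_t DoReadMemory(addr_t addr, void *buf, std::size_t size,
                                   Status &error) = 0;

  // New overload: forward to the old entry point when there is no segment,
  // report an error otherwise. Subclasses that understand segments
  // override this.
  virtual std::size_t DoReadMemory(ProcessAddress pa, void *buf,
                                   std::size_t size, Status &error) {
    if (pa.segment == kNoSegment)
      return DoReadMemory(pa.addr, buf, size, error);
    error.message = "segmented addresses are not supported on this process";
    return 0;
  }
};

// A process plugin that only implements the old entry point.
class FlatProcess : public Process {
public:
  using Process::DoReadMemory; // keep the ProcessAddress overload visible

  std::size_t DoReadMemory(addr_t addr, void *buf, std::size_t size,
                           Status &error) override {
    (void)addr;
    (void)error;
    std::memset(buf, 0xAB, size); // pretend every byte reads as 0xAB
    return size;
  }
};
```

One C++ detail to watch: a subclass that overrides only one DoReadMemory overload hides the other unless it adds the using-declaration shown in FlatProcess.
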
> >>
> >> I am not sure if slowly adding this functionality is better than
> replacing this all right away, but we can't just do a global replace
> without adding functionality or error checking IMHO.
> >>
> >>
> >>> Once all the APIs are updated, we can start working on the functional
> changes. This means actually interpreting the aspace_t values and making
> sure they don’t get dropped.
> >>>
> >>> Finally, when all this work is done and we’re happy with the approach,
> we extend the SB API with overloads for the functions that currently take
> or return addr_t . I want to do this last so we have time to iterate before
> committing to a stable interface.
> >>
> >> This might be one reason for doing the approach suggested above where
> we add new internal APIs that take a ProcessAddress and cut over to using
> them. As it would mean all of the current APIs in the lldb::SB layer would
> remain in place (they can't be removed) and would still make sense.
> >>
> >>>
> >>> ## Testing
> >>>
> >>> By splitting off the intrusive non-functional changes we are able to
> rely on the existing tests for coverage. Smaller functional changes can be
> tested in isolation, either through a unit test or a small GDB remote test.
> For end-to-end testing we can run the test suite with a modified
> debugserver that spoofs address spaces.
> >>
> >> That makes sense. ProcessGDBRemote will need to dynamically respond
> with whether it supports segmented addresses by overriding the DoReadMemory
> that takes a ProcessAddress and doing the right thing.
> >>
> >> Thanks for taking this on. I hope some of the comments above help
> move this forward.
> >>
> >> Greg
> >>
> >>>
> >>> Thanks,
> >>> Jonas
> >>>
> >>> _______________________________________________
> >>> lldb-dev mailing list
> >>> lldb-dev at lists.llvm.org
> >>> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
> >>
>