[lldb-dev] [RFC] Segmented Address Space Support in LLDB

Mon Nov 2 07:54:31 PST 2020

On 22/10/2020 10:25, Jason Molenda wrote:
> Hi Greg, Pavel.
> 
> I think it's worth saying that this is very early in this project.  We know we're going to need the ability to track segments on addresses, but honestly a lot of the finer details aren't clear yet.  It's such a fundamental change that we wanted to start a discussion, even though I know it's hard to have detailed discussions still.
> 
> In the envisioned environment, there will be a default segment, and most addresses will be in the default segment.  DWARF, user input (lldb cmdline), SB API, and clang expressions are going to be the places where segments are specified --- Dump methods and ProcessGDBRemote will be the main places where the segments are displayed/used.  There will be modifications to the memory read/write gdb RSP packets to include these.
> 
> This early in the project, it's hard to tell what will be upstreamed to the llvm.org monorepo, or when.  My personal opinion is that we don't actually want to add segment support to llvm.org lldb at this point.  We'd be initializing every address object with LLDB_INVALID_SEGMENT or LLDB_DEFAULT_SEGMENT, and then testing that each object is initialized this way?  I don't see this actually being useful.
> 
> However, changing lldb's target addresses to be strictly handled in terms of objects will allow us to add a segment discriminator ivar to Address and ProcessAddress on our local branch while this is in development, and minimize the places where we're diverging from the llvm.org sources.  We'll need to have local modifications at the places where a segment is input (DWARF, cmdline, SB API, compiler type) or output (Dump, ProcessGDBRemote) and, hopefully, the vast majority of lldb can be unmodified.
> 
> The proposal was written in terms of what we need to accomplish based on our current understanding for this project, but I think there will be a lot of details figured out as we get more concrete experience of how this all works.  And when it's appropriate to upstream to llvm.org, we'll be better prepared to discuss the tradeoffs of the approaches we took in extending Address/ProcessAddress to incorporate a segment.
> 
> My hope is that this generic OO'ification of target addresses will not change lldb beyond moving off of addr_t for now.

I think that wrapping addr_t inside a class would be a nice change, even 
without the subsequent segmentification -- I'm hoping that this would 
add some type safety to the way we work with addresses (as we have 
various kinds of addresses that are all just plain ints). I'd like to 
see a concrete proposal for this class's interface though. (And I still 
remain mildly sceptical about automating this transition.)
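
To make the type-safety point concrete, here is a minimal sketch of what
such a wrapper could look like. It is not a proposed interface: every
name in it (TypedAddress, LoadAddress, FileAddress, ToLoadAddress,
segment_t, kDefaultSegment) is hypothetical, and the 32-bit segment
width is an assumption. The only point is that distinct address kinds
become distinct types, so mixing them up is a compile error instead of
a silent integer assignment.

```cpp
#include <cstdint>

// Hypothetical names throughout; none of these exist in lldb today.
using addr_t = uint64_t;    // stands in for lldb's plain-integer addr_t
using segment_t = uint32_t; // assumed width of a segment discriminator
constexpr segment_t kDefaultSegment = 0; // stand-in for LLDB_DEFAULT_SEGMENT

// Empty tag types distinguish kinds of addresses at compile time.
struct LoadTag {};
struct FileTag {};

template <typename Tag> class TypedAddress {
public:
  // explicit: a bare integer never silently becomes an address.
  constexpr explicit TypedAddress(addr_t offset,
                                  segment_t segment = kDefaultSegment)
      : m_offset(offset), m_segment(segment) {}

  constexpr addr_t GetOffset() const { return m_offset; }
  constexpr segment_t GetSegment() const { return m_segment; }

  // Offsetting keeps both the segment and the address kind.
  constexpr TypedAddress operator+(uint64_t delta) const {
    return TypedAddress(m_offset + delta, m_segment);
  }

  friend constexpr bool operator==(TypedAddress a, TypedAddress b) {
    return a.m_offset == b.m_offset && a.m_segment == b.m_segment;
  }

private:
  addr_t m_offset;
  segment_t m_segment;
};

using LoadAddress = TypedAddress<LoadTag>;
using FileAddress = TypedAddress<FileTag>;

// Converting between kinds has to be spelled out; there is no implicit path.
constexpr LoadAddress ToLoadAddress(FileAddress file_addr, addr_t slide) {
  return LoadAddress(file_addr.GetOffset() + slide, file_addr.GetSegment());
}
```

With this shape, passing a FileAddress where a LoadAddress is expected
does not compile, and the segment travels with the offset through
arithmetic without the caller having to think about it.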

> To be honest, we haven't thought about the UI side of this very much yet.  I think there will be ABI or ArchSpec style information that maps segment numbers to human-understandable names.

The details of this are pretty interesting for the Wasm use case, as it 
does not have a fixed number of segments/address spaces -- every module 
gets its own address space. I suppose the Wasm ArchSpec could just say 
it has UINT32_MAX address spaces, and then the dynamic loader would just 
assign modules into address spaces based on some key.

The interesting aspect here would be that the DWARF does *not* contain 
address space information (as it's all in the same address space), 
so there may need to be a way for it to say "I don't actually know my 
address space -- I'll go wherever the dynamic loader puts me".
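
A rough sketch of how those two pieces might fit together, purely for
illustration: all the names here (AddressSpaceAllocator,
UnresolvedAddress, ResolvedAddress, Resolve) are hypothetical, the
module key is assumed to be a string, and none of this reflects
existing lldb code. The loader hands out one address space per module
key, and an address that arrives without an address space picks up the
one its module was assigned.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// All names below are hypothetical illustrations, not lldb API.
using address_space_t = uint32_t;

// An address whose space may not be known yet, e.g. one read from Wasm DWARF.
struct UnresolvedAddress {
  uint64_t offset;
  std::optional<address_space_t> space; // nullopt: "wherever the loader puts me"
};

struct ResolvedAddress {
  uint64_t offset;
  address_space_t space;
};

// Hands out one address space per module key, reusing it on repeat lookups.
class AddressSpaceAllocator {
public:
  address_space_t GetOrAssign(const std::string &module_key) {
    auto it = m_spaces.find(module_key);
    if (it != m_spaces.end())
      return it->second;
    assert(m_next != UINT32_MAX && "ran out of address spaces");
    address_space_t assigned = m_next++;
    m_spaces.emplace(module_key, assigned);
    return assigned;
  }

private:
  std::map<std::string, address_space_t> m_spaces;
  address_space_t m_next = 0;
};

// Fills in the module's address space only when the address did not carry one.
inline ResolvedAddress Resolve(const UnresolvedAddress &addr,
                               address_space_t module_space) {
  return ResolvedAddress{addr.offset, addr.space.value_or(module_space)};
}
```

Using an optional instead of a reserved sentinel value keeps the whole
numeric range available for real address spaces, at the cost of a
separate "unresolved" type that has to be resolved before use.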

It's still pretty early to determine that, but I'm mentioning this as it is 
the most recent use case of someone needing address space support in lldb (even 
though it's a slightly stranger form of address spaces).

pl