[lldb-dev] Listing memory regions in lldb

Mon May 16 15:06:19 PDT 2016

I am fine with adding new key/value pairs to each memory region, so feel free to extend as needed, as long as any missing keys ("memory_type == stack" for example) default to something sensible. Each process plug-in can then fill in as much info as it is able to. More comments below.

> On May 13, 2016, at 3:35 AM, Howard Hellyer <HHELLYER at uk.ibm.com> wrote:
> 
> I have experimented with the GetMemoryRegionInfo approach on the internal APIs already, it has some positives and negatives. 
> 
> - GetMemoryRegionInfo is unimplemented for Linux and Mac core dumps. That's not necessarily a bad thing as it could be implemented the "right" way. (As Jim said GetMemoryRegionInfo would have to return the right thing for an unmapped region.) Interestingly when I've worked on Windows core dumps before I've seen that MiniDump, with the right flags, will deliberately insert a region in the middle of the memory ranges to represent the unmapped space, on 64 bit it's quite a large section. 

This is very easy to implement, as there are just sections that say "these bytes are for [0x100000000-0x100001000)". I know Mach-O core files on Mac can get the read/write/execute permissions. You can figure out the gaps very easily in Mach-O as well. So this shouldn't be a problem for core files. If the core file is smart enough to say "here is the full map of what was in memory, but the sections only describe part of the actual memory", then we should still be able to work with this.
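
To illustrate the gap calculation, here is a minimal sketch. It assumes the core file's sections have already been reduced to sorted, non-overlapping half-open ranges that lie inside [lo, hi). AddrRange and FindGaps are hypothetical names for illustration, not LLDB types or functions.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical half-open address range [base, end); not an LLDB type.
struct AddrRange {
    uint64_t base;
    uint64_t end;
};

// Given the ranges a core file describes (assumed sorted by base,
// non-overlapping, and contained in [lo, hi)), return the unmapped
// gaps between them.
std::vector<AddrRange> FindGaps(const std::vector<AddrRange> &mapped,
                                uint64_t lo, uint64_t hi) {
    std::vector<AddrRange> gaps;
    uint64_t cursor = lo;  // first address not yet accounted for
    for (const AddrRange &r : mapped) {
        if (r.base > cursor)
            gaps.push_back({cursor, r.base});  // hole before this range
        if (r.end > cursor)
            cursor = r.end;
    }
    if (cursor < hi)
        gaps.push_back({cursor, hi});  // trailing hole up to the limit
    return gaps;
}
```

Each gap could then be reported as a region with no read/write/execute permissions, so that a query for an unmapped address still returns something sensible.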
> 
> - Using GetMemoryRegionInfo to iterate might be quite expensive if there are many small memory regions. 

Remember that the other use for GetMemoryRegionInfo() might be a user just asking about an address they have in a register or variable that is a pointer. So if we do add a more complex iteration scheme, feel free to do so, but please leave the call that asks about a single address intact so it can be used by clients.
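
For reference, iterating on top of a single-address query can be sketched roughly as below. This assumes the query also returns a region for unmapped addresses, so the walk can step over holes. RegionInfo, RegionQuery and WalkRegions are stand-ins invented for this sketch, not the real LLDB classes.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical region description; not the real LLDB class.
struct RegionInfo {
    uint64_t base;
    uint64_t end;      // one past the last byte of the region
    bool readable;
};

// Stand-in for a "what region contains this address?" call.
// Returns false on error.
using RegionQuery = std::function<bool(uint64_t addr, RegionInfo &info)>;

// Walk [start, limit) by asking about an address, then jumping to the
// end of the region that was returned. One query per region.
std::vector<RegionInfo> WalkRegions(const RegionQuery &query,
                                    uint64_t start, uint64_t limit) {
    std::vector<RegionInfo> regions;
    uint64_t addr = start;
    while (addr < limit) {
        RegionInfo info{};
        if (!query(addr, info) || info.end <= addr)
            break;  // error, or no forward progress: stop rather than loop
        regions.push_back(info);
        addr = info.end;
    }
    return regions;
}
```

The cost is one query per region, which is the expense mentioned above when there are many small regions, but the single-address call stays usable on its own.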

> - One reason I hadn't looked at just exposing an SBGetMemoryRegionInfo is that it wouldn't match a lot of the SB APIs at the moment (for example for Threads and Sections), which tend to work by having GetNumXs() and GetXAtIndex() calls. Iterating with SBGetMemoryRegionInfo would be non-obvious to the user of the API since it doesn't match the convention. It would need to be documented in the SBGetMemoryRegionInfo API. 

I don't think that the design of SBMemoryRegionInfo is affected by which approach we take. If we can ask about a single address with:

SBError 
SBTarget::GetMemoryRegionInfo(lldb::addr_t load_addr, SBMemoryRegionInfo &region_info);

Or if we have:

uint32_t
SBTarget::GetNumMemoryRegions();

bool
SBTarget::GetMemoryRegionAtIndex(uint32_t idx, SBMemoryRegionInfo &region_info);

Either way, SBMemoryRegionInfo's definition doesn't change. I don't like the GetNumMemoryRegions()/GetMemoryRegionAtIndex() pair, though, because if you do something bad like:

const uint32_t num_regions = target.GetNumMemoryRegions();
target.GetProcess().Continue();
...
SBMemoryRegionInfo region_info;
for (uint32_t i=0; i<num_regions; ++i)
{
    if (target.GetMemoryRegionAtIndex(i, region_info))
        //....
}

you don't know if num_regions is valid anymore since the process was continued. So any new API should actually do something like:

SBMemoryRegionInfoList SBTarget::GetMemoryRegions();

Then we have:

uint32_t SBMemoryRegionInfoList::GetSize();
bool SBMemoryRegionInfoList::GetRegionAtIndex(uint32_t idx, SBMemoryRegionInfo &region_info);

The data would be cached at the time of the SBTarget::GetMemoryRegions() call, so the list always contains the regions that were valid when it was called, even if the process has since continued.
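
A rough sketch of the snapshot semantics, using stand-in types rather than the real SB classes (RegionSnapshot, RegionInfoList and FakeTarget are all hypothetical and only model the behavior described above):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical region record; not the real SBMemoryRegionInfo.
struct RegionSnapshot {
    uint64_t base;
    uint64_t end;
};

// Models the proposed list: it owns a copy of the regions, so its size
// and contents cannot change after it has been created.
class RegionInfoList {
public:
    explicit RegionInfoList(std::vector<RegionSnapshot> regions)
        : m_regions(std::move(regions)) {}

    uint32_t GetSize() const {
        return static_cast<uint32_t>(m_regions.size());
    }

    bool GetRegionAtIndex(uint32_t idx, RegionSnapshot &info) const {
        if (idx >= m_regions.size())
            return false;
        info = m_regions[idx];
        return true;
    }

private:
    std::vector<RegionSnapshot> m_regions;
};

// Stand-in for a target whose memory map changes as the process runs.
struct FakeTarget {
    std::vector<RegionSnapshot> live_map;

    // Copies the current map; later changes to live_map are not seen.
    RegionInfoList GetMemoryRegions() const {
        return RegionInfoList(live_map);
    }
};
```

With this shape, the count and the indexed lookups come from the same object, so they can never disagree with each other the way a separate count and a later index call can.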

> 
> - I've found the only way to reliably filter out inaccessible memory is to do a test read and check the error return code. I'm pretty sure I've seen some that are readable via a memory read but marked read=false, write=false, execute=false. (I can't remember the exact case now off the top of my head, but I think it might have been allocated but uncommitted memory or similar.) 

MacOSX knows this, but each process plug-in can do what it needs to in order to determine this.

> 
> - Using GetMemoryRegionInfo over a remote connection on Linux and Mac worked well but seemed to coalesce some of the memory regions together.

You might want to add an extra parameter to not coalesce regions. Or, if we start adding names to the memory regions (".text", ".data") or types (stack, heap, section from a file, guard page), then we might just stop coalescing regions so these differences stay visible. Or we can add more options to the API:

SBMemoryRegionInfoList SBTarget::GetMemoryRegions(bool coalesce);
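
As a sketch of what the flag could mean, the following merges adjacent regions only when they touch and have identical permissions. Region and GetRegions are hypothetical names, and the perms bitmask is an assumption made for this example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical region record; perms is an assumed bitmask
// (for example 1 = read, 2 = write, 4 = execute).
struct Region {
    uint64_t base;
    uint64_t end;
    uint32_t perms;
};

// raw is assumed to be sorted by base address. With coalesce == false
// the regions are returned untouched; otherwise neighbors that touch
// and share the same permissions are merged into one region.
std::vector<Region> GetRegions(const std::vector<Region> &raw, bool coalesce) {
    if (!coalesce)
        return raw;
    std::vector<Region> merged;
    for (const Region &r : raw) {
        if (!merged.empty() && merged.back().end == r.base &&
            merged.back().perms == r.perms)
            merged.back().end = r.end;  // extend the previous region
        else
            merged.push_back(r);
    }
    return merged;
}
```

If names or types are added later, the merge test would also need to compare them, otherwise a stack region next to a heap region with the same permissions would be folded together.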

> It also only allows for read/write/exec attributes to be passed. That's a shame as a live process can often tell you more about what the region is for. The remote command memory map looks like it sends back XML so it might be possible to insert custom properties in there to give more information but I'm not sure how safe it is to do that, I don't know the project quite well enough to understand all the use cases for the protocol. 

We can easily add names to regions and it would be ok for a region to not have a name. We can also add types as an enumeration (stack, heap, section from a file, guard page).
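
Something along these lines, as a sketch only. MemoryType, RegionDescription and TypeFromMapsLabel are made-up names, and treating any path that starts with '/' as a file-backed section is an assumption for illustration. The "[stack]" and "[heap]" labels are the ones Linux prints in /proc/pid/maps.

```cpp
#include <cstdint>
#include <string>

// Hypothetical enumeration. Unknown is the default so that plug-ins
// which cannot determine the type still return something sensible.
enum class MemoryType { Unknown, Stack, Heap, FileSection, GuardPage };

// Hypothetical region description with an optional name.
struct RegionDescription {
    uint64_t base = 0;
    uint64_t end = 0;
    std::string name;                       // empty means "no name"
    MemoryType type = MemoryType::Unknown;  // default for missing info
};

// Map the last column of a /proc/pid/maps line to a type.
MemoryType TypeFromMapsLabel(const std::string &label) {
    if (label == "[stack]")
        return MemoryType::Stack;
    if (label == "[heap]")
        return MemoryType::Heap;
    if (!label.empty() && label[0] == '/')
        return MemoryType::FileSection;  // assumption: path means file-backed
    return MemoryType::Unknown;
}
```

A plug-in that only knows read/write/exec would leave name empty and type as Unknown, and clients would need to treat both as "not reported" rather than as a definite answer.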
> 
> - Extended information, such as /proc/pid/maps marking a region as [stack], would be lost. All you would get is read/write/exec info. (That said, supporting everything every platform can provide might be a bit much.) 
> 
> - LLDB's ReadMemory implementations seem to return 0's for missing memory that should be accessible. It might be nice to include whether the memory is backed by real data or not in the API. (For example Linux core files can be missing huge pages depending on the setting of /proc/PID/coredump_filter or files can simply be truncated.) 
> 
> I could implement the GetMemoryRegionInfo iteration mechanism pretty quickly and it would actually fit my purposes as far as providing all the addresses you can sensibly access. 
> 
> I'm quite keen to provide a patch but don't want to provide one that is at odds with how the rest of lldb works or provides data that's only useful to me so I'm quite keen to get a bit of feedback on what the preferred approach would be. It could be that providing both SBGetMemoryRegionInfo and the SBGetNumMemoryRegions/SBGetMemoryRegionAtIndex pair is the right solution. 
> 
> Would a patch also need to provide a command to dump this information as it can be awkward to have data that's only accessible via the API? 
> 
> Howard Hellyer
> IBM Runtime Technologies, IBM Systems	

Hopefully my comments have provided some insight. Let me know what you come up with.

Greg Clayton