[Lldb-commits] [PATCH] D55356: Add a method to get the "base" file address of an object file

Wed Dec 12 02:36:02 PST 2018

On 11/12/2018 23:54, Zachary Turner wrote:
> 
> 
> On Tue, Dec 11, 2018 at 11:57 AM Pavel Labath <pavel at labath.sk> wrote:
> 
>     The part I know nothing about is whether something similar could be
>     done for PE/COFF files (and I'll need something like that there too).
>     Adrian, Zachary, what is the relationship between the "image base" of
>     an object file and its sections? Is there any way we could arrange so
>     that the base address of a module always belongs to one of its
>     sections?
> 
> 
> Historically, an image base of N was used as a way to tell the loader 
> "map the file in so that byte 0 of the file is virtual address N in the 
> process's address space".  That is, *((char *)N) would be the first byte of 
> the file in a running process.  Then, everything else in the file is 
> written as an offset from N.  This includes section addresses.  So for 
> example, if we use dumpbin on a simple executable we can see something 
> like this:
> 
> Dump of file bin\count.exe
> 
> PE signature found
> 
> File Type: EXECUTABLE IMAGE
> 
> OPTIONAL HEADER VALUES
>                    ...
>         140000000 image base (0000000140000000 to 0000000140011FFF)
>                    ...
> SECTION HEADER #1
>     .text name
>      1000 virtual address (0000000140001000 to 00000001400089AE)
> 
> So under this scheme, the first byte of the first section would be at 
> virtual address 0000000140001000 in the running process.
> 
> Later, ASLR came along and threw a wrench in all of that, and so these 
> days the image base is mostly meaningless.  The loader will probably 
> never actually load your module at the address specified in image base.  
> But the rest of the rules still hold true.  Wherever it *does* load your 
> module, the first byte of .text will still be at offset 1000 from that.
> 
> So, if you want to return this value from the PE/COFF header, or even if 
> you want to return the actual address that the module was loaded at, 
> then no, it will never belong to any section (because the bytes at that 
> address will be the PE/COFF file header).
> 
> Does this make sense?

I think it does.

I am aware that this address is not going to represent a valid address 
in target memory (the same is true for ELF and Mach-O targets), but what 
we're trying to ensure is that when we take this address and ask the 
running target to give us the "load" address for it, it will return the 
actual place in memory (and conversely, if the target is not running, it 
should give us an invalid address instead of returning something bogus).
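
To illustrate the behaviour I am after, here is a rough Python sketch. 
It is not lldb code: SectionLoadList, set_loaded and resolve are names 
made up for this mail, None stands in for the "invalid address", and 
the load address used in the example is invented.

```python
from typing import Dict, Optional


class SectionLoadList:
    """Hypothetical map from a section name to where the target loaded it."""

    def __init__(self) -> None:
        # Empty until the target is running and reports loaded sections.
        self._load_addr: Dict[str, int] = {}

    def set_loaded(self, section: str, load_addr: int) -> None:
        self._load_addr[section] = load_addr

    def resolve(self, section: str, offset: int) -> Optional[int]:
        """Return the load address of section+offset, or None if not loaded."""
        base = self._load_addr.get(section)
        if base is None:
            return None  # invalid address rather than something bogus
        return base + offset
```

The point is only that an address expressed as "section + offset" can 
be answered honestly in both states: before the section is loaded the 
result is None, afterwards it is the real location in memory.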

So, if I understand correctly, the PE/COFF file will always be loaded 
into one contiguous chunk of memory, ranging from ImageBase (modulo 
ASLR) to ImageBase+SizeOfImage. Then various sections are mapped into 
that range (according to their RVAs).
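
In other words, the arithmetic would look like the Python sketch below, 
using the numbers from the dumpbin output quoted above. The Section 
type and helper names are made up for illustration. SizeOfImage 
(0x12000) and the .text size (0x79AF) are inferred from the printed 
ranges rather than read from a header, and the ASLR base in the usage 
note is invented.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Section:
    name: str
    rva: int           # offset of the section from the start of the image
    virtual_size: int  # size of the section once mapped


def image_range(load_base: int, size_of_image: int) -> Tuple[int, int]:
    """Half-open [start, end) range occupied by the whole image."""
    return load_base, load_base + size_of_image


def section_range(load_base: int, section: Section) -> Tuple[int, int]:
    """Half-open [start, end) range of one section inside the image."""
    start = load_base + section.rva
    return start, start + section.virtual_size


IMAGE_BASE = 0x140000000     # "image base" from the dumpbin output
SIZE_OF_IMAGE = 0x12000      # assumption: inferred from ...140011FFF
TEXT = Section(".text", 0x1000, 0x79AF)  # size inferred from ...1400089AE
```

With the preferred base, section_range(IMAGE_BASE, TEXT) starts at 
0x140001000. With a relocated base such as 0x7FF612340000 the same call 
starts at 0x7FF612341000, i.e. only the base changes and the RVA stays 
put.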

If that's the case, then we could model this as one big 
segment/container section/whatever, and the individual (loadable) 
sections would be sub-sections of that. Apart from solving my current 
problem, this should also improve the address lookup for these modules. 
E.g. right now if you ask lldb to look up the address corresponding to 
the memory image of the header, it will say it does not belong anywhere, 
but that address is clearly associated with the module.
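
A rough Python sketch of the container idea, just to show the lookup I 
have in mind. None of this is existing lldb code: the Section class, 
its resolve method and the "PE image" name are hypothetical, and the 
sizes are inferred from the dumpbin ranges quoted earlier.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Section:
    name: str
    file_addr: int
    size: int
    children: List["Section"] = field(default_factory=list)

    def contains(self, addr: int) -> bool:
        return self.file_addr <= addr < self.file_addr + self.size

    def resolve(self, addr: int) -> Optional[Tuple[str, int]]:
        """Return (name, offset) of the innermost section holding addr."""
        if not self.contains(addr):
            return None
        for child in self.children:
            hit = child.resolve(addr)
            if hit is not None:
                return hit
        # No sub-section claims it (e.g. the header): the container does.
        return self.name, addr - self.file_addr


# One container spanning the whole image, with .text as a sub-section.
image = Section(
    "PE image", 0x140000000, 0x12000,
    [Section(".text", 0x140001000, 0x79AF)],
)
```

Here image.resolve(0x140000000) lands in the container at offset 0, so 
the header address is attributed to the module, while an address inside 
.text still resolves to .text. Anything outside the image resolves to 
None.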

I'll try looking at what kind of changes are needed to make this happen. 
I'll start with the ELF case, as I am more familiar with that (and it'll 
probably be more complicated).

thanks,
pl