[Lldb-commits] [PATCH] D55356: Add a method to get the "base" file address of an object file

Tue Dec 11 11:39:57 PST 2018

Sections can have parents.  In MachO the text and data sections are actually contained in the TEXT and DATA segments respectively.  LLDB represents this by having an lldb_private::Section for the segment, and then all the sections in that segment are children of the parent Section (the MachO segment).  All the code that looks up addresses expects to potentially traverse this hierarchy.
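
As a rough illustration of that nesting, here is a minimal sketch. ToySection, AddChild and FindDeepest are hypothetical names invented for this example; this is not the real lldb_private::Section API, only a toy model of a segment that owns child sections and a lookup that walks the hierarchy.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model of a nested section; NOT lldb_private::Section.
struct ToySection {
  std::string name;
  uint64_t file_addr;  // start address as recorded in the object file
  uint64_t byte_size;
  const ToySection *parent = nullptr;
  std::vector<std::unique_ptr<ToySection>> children;

  ToySection(std::string n, uint64_t addr, uint64_t size)
      : name(std::move(n)), file_addr(addr), byte_size(size) {}

  // Adds a child section, which must lie entirely inside this one.
  ToySection *AddChild(std::string n, uint64_t addr, uint64_t size) {
    assert(addr >= file_addr && addr + size <= file_addr + byte_size);
    children.push_back(std::make_unique<ToySection>(std::move(n), addr, size));
    children.back()->parent = this;
    return children.back().get();
  }

  bool Contains(uint64_t addr) const {
    return addr >= file_addr && addr - file_addr < byte_size;
  }

  // Returns the deepest section containing addr, or nullptr if none does.
  // An address inside the segment but outside every child resolves to the
  // segment itself.
  const ToySection *FindDeepest(uint64_t addr) const {
    if (!Contains(addr))
      return nullptr;
    for (const auto &child : children)
      if (const ToySection *hit = child->FindDeepest(addr))
        return hit;
    return this;
  }
};
```

In this toy model, an address that falls in the segment but in none of its child sections still resolves to the segment, so it can still be written as "segment + offset".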

It seems to me that what you are describing should fit in the nesting model?

Jim

> On Dec 11, 2018, at 11:31 AM, Pavel Labath <pavel at labath.sk> wrote:
> 
> On 11/12/2018 20:10, Jim Ingham via Phabricator wrote:
>> jingham added a comment.
>> In D55356#1327280 <https://reviews.llvm.org/D55356#1327280>, @clayborg wrote:
>>> In D55356#1327242 <https://reviews.llvm.org/D55356#1327242>, @labath wrote:
>>> 
>>>> In D55356#1327224 <https://reviews.llvm.org/D55356#1327224>, @clayborg wrote:
>>>> 
>>>>> In D55356#1327099 <https://reviews.llvm.org/D55356#1327099>, @labath wrote:
>>>>> 
>>>>>> Actually, this now causes an lldb-mi test to fail, but it's not clear to me if the problem is in the test, or this patch. This issue happens when lldb-mi is printing the "library loaded" message after a module gets added to a not-yet-running target. It tries to print the load address by first getting the base address and then converting that to a load address.
>>>>>> 
>>>>>> Before this patch, that would always fail, because, well, ELF and PECOFF had this function unimplemented, and for MachO the base address was section-relative, and so it wasn't resolved to a load address without the section being loaded. However, with this patch, in the ELF (and presumably PECOFF) case, the base address is not section-relative and so the `GetLoadAddress` function happily returns the address.
>>>>>> 
>>>>>> Is this the expected behavior here? (i.e., object_file->GetBaseAddress().GetLoadAddress(target) returning a valid value even though the target is not running)
>>>>> 
>>>>> 
>>>>> Not unless someone has manually set the section load address in the test?
>>>> 
>>>> 
>>>> The test is not setting the load address manually. This simply falls out of how `Address::GetLoadAddress`  is implemented:
>>>> 
>>>>   addr_t Address::GetLoadAddress(Target *target) const {
>>>>     SectionSP section_sp(GetSection());
>>>>     if (section_sp) {
>>>>       ...
>>>>     } else {
>>>>       // We don't have a section so the offset is the load address
>>>>       return m_offset;
>>>>     }
>>>>   }
>>>> 
>>>> 
>>>> So, where's the bug here? It's not clear to me how to change `Address::GetLoadAddress` to do something different than what it is doing now.
>>> 
>>> 
>>> So this is a bug really. When there is no section, we should specify what the m_offset is using lldb_private::AddressType, maybe in a new ivar named "m_offset_type". Then GetBaseAddress() would specify eAddressTypeFile. And the above code would become:
>>> 
>>>  
>>>   addr_t Address::GetLoadAddress(Target *target) const {
>>>     SectionSP section_sp(GetSection());
>>>     if (section_sp) {
>>>       ...
>>>     } else if (m_offset_type == eAddressTypeLoad) {
>>>       // We don't have a section so the offset is the load address
>>>       return m_offset;
>>>     }
>>>     // No section and the offset is not a load address: cannot resolve it
>>>     return LLDB_INVALID_ADDRESS;
>>>   }
>>> 
>>>   We just need to be careful not to make lldb_private::Address any bigger, byte-size wise, if we can at all avoid it.
>> I must be missing something.  If there's a file around so that we can express this address relative to the file, why would it ever not be expressed as a section + offset?  If there's not a file around, then what does it mean to say this address is eAddressTypeFile but we don't know the file?
> 
> I think the issue here is the difference in how ELF and MachO files are loaded. ELF has this strange dual view of the file, where the same data is represented both as "sections" and as "segments". Sections are the thing we all know (.text, .data, .debug_info, etc.), and are used by most tools (lldb, linker, ...). However, this is *not* what the kernel uses when it loads a binary into memory. The loader uses "segments" instead.
> 
> It is the segments that describe where a piece of the file will be loaded into memory. Normally, segments span one or more sections, but this is not a requirement. And almost always segments will contain some additional data besides section content. This data is generally "junk", but it is there because it enables the linker to "pack" the ELF file more efficiently.
> 
> A typical elf file might look something like
> -------------------
> | elf header      |
> -------------------
> | segment headers |
> -------------------
> | .text           |
> -------------------
> | other sections  |
> -------------------
> | section headers |
> -------------------
> 
> The segment headers might say "load everything from the start of file to the end of .text section at address 0x40000". So the load address of this file (and the base for all kinds of offsets) would be "0x40000", but that does not correspond to any section (the file address of the .text section would probably be something like 0x40123).
> 
> So, it almost sounds to me like we would need some kind of a module-relative (in addition to section-relative) address. Then, this address would be represented as "module+0", and GetLoadAddress(target) could translate that the same way as it does section-relative addresses.
> 
> Though that may be overkill here. For my purposes it would be sufficient to just have a function which always returns a file address (regardless of whether it points to a section or not), and I can use it to compute offsets (kind of like my original patch did). We could keep Module.GetBaseAddress returning LLDB_INVALID_ADDRESS for cases when the base address cannot be represented as a section offset.
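
For reference, the ELF example quoted above can be put into a small sketch. All names here (ToySegment, ToySectionRange, BaseFileAddress, CoveredBySection, OffsetFromBase, kInvalidAddress) are hypothetical and exist only for this illustration. The sketch assumes that the base file address is the lowest address of any loadable segment, and it is not how LLDB's ObjectFile plugins are actually implemented.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an "invalid address" sentinel.
constexpr uint64_t kInvalidAddress = UINT64_MAX;

// Hypothetical toy types; not LLDB's or any ELF library's real structures.
struct ToySegment {
  uint64_t vaddr;     // file address where the segment asks to be loaded
  uint64_t mem_size;  // number of bytes the segment covers
  bool loadable;      // true for segments the loader maps into memory
};

struct ToySectionRange {
  uint64_t file_addr;
  uint64_t byte_size;
};

// Assumption: the base file address is the lowest loadable segment address.
uint64_t BaseFileAddress(const std::vector<ToySegment> &segments) {
  uint64_t base = kInvalidAddress;
  for (const ToySegment &seg : segments)
    if (seg.loadable && seg.vaddr < base)
      base = seg.vaddr;
  return base;
}

// True if some section contains addr, i.e. addr could be written as
// "section + offset".
bool CoveredBySection(uint64_t addr,
                      const std::vector<ToySectionRange> &sections) {
  for (const ToySectionRange &sec : sections)
    if (addr >= sec.file_addr && addr - sec.file_addr < sec.byte_size)
      return true;
  return false;
}

// Offset of a file address from the base, or kInvalidAddress if there is
// no base or the address lies below it.
uint64_t OffsetFromBase(uint64_t file_addr,
                        const std::vector<ToySegment> &segments) {
  const uint64_t base = BaseFileAddress(segments);
  if (base == kInvalidAddress || file_addr < base)
    return kInvalidAddress;
  return file_addr - base;
}
```

With one loadable segment at 0x40000 and a .text section at 0x40123, the base is 0x40000, no section covers it, and the offset of .text from the base is 0x123. That is the case where a plain file address is usable for offset arithmetic even though it has no section-relative form.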