[Lldb-commits] [PATCH] D55356: Add a method to get the "base" file address of an object file

Tue Dec 11 11:31:40 PST 2018

On 11/12/2018 20:10, Jim Ingham via Phabricator wrote:
> jingham added a comment.
> 
> In D55356#1327280 <https://reviews.llvm.org/D55356#1327280>, @clayborg wrote:
> 
>> In D55356#1327242 <https://reviews.llvm.org/D55356#1327242>, @labath wrote:
>>
>>> In D55356#1327224 <https://reviews.llvm.org/D55356#1327224>, @clayborg wrote:
>>>
>>>> In D55356#1327099 <https://reviews.llvm.org/D55356#1327099>, @labath wrote:
>>>>
>>>>> Actually, this now causes an lldb-mi test to fail, but it's not clear to me if the problem is in the test, or this patch. This issue happens when lldb-mi is printing the "library loaded" message after a module gets added to a not-yet-running target. It tries to print the load address by first getting the base address and then converting that to a load address.
>>>>>
>>>>> Before this patch, that would always fail, because well.. ELF and PECOFF had this function unimplemented, and for MachO the base address was section-relative, and so it wasn't resolved to a load address without the section being loaded. However, with this patch, in the ELF (and presumably PECOFF) case, the load address is not section-relative and so the `GetLoadAddress` function happily returns the address.
>>>>>
>>>>> Is this the expected behavior here? (i.e., object_file->GetLoadAddress().GetLoadAddress(target) returning a valid value even though the target is not running)
>>>>
>>>>
>>>> Not unless someone has manually set the section load address in the test?
>>>
>>>
>>> The test is not setting the load address manually. This simply falls out of how `Address::GetLoadAddress`  is implemented:
>>>
>>>    addr_t Address::GetLoadAddress(Target *target) const {
>>>      SectionSP section_sp(GetSection());
>>>      if (section_sp) {
>>>        ...
>>>      } else {
>>>        // We don't have a section so the offset is the load address
>>>        return m_offset;
>>>      }
>>>    }
>>>
>>>
>>> So, where's the bug here? It's not clear to me how to change `Address::GetLoadAddress` to do something different than what it is doing now.
>>
>>
>> So this is a bug really. When there is no section, we should specify what the m_offset is using lldb_private::AddressType in maybe a new ivar name "m_offset_type". Then GetBaseAddress() would specify eAddressTypeFile. And the above code would become:
>>
>>   
>>
>> addr_t Address::GetLoadAddress(Target *target) const {
>>
>>    SectionSP section_sp(GetSection());
>>    if (section_sp) {
>>      ...
>>    } else if (m_offset_type == eAddressTypeLoad) {
>>      // We don't have a section so the offset is the load address
>>      return m_offset;
>>    }
>>
>> }
>>
>>    We just need to be careful and see if we can not make lldb_private::Address get any bigger byte size wise if we can at all avoid it.
> 
> 
> I must be missing something.  If there's a file around so that we can express this address relative to the file, why would it ever not be expressed as a section + offset?  If there's not a file around, then what does it mean to say this address ie eAddressTypeFile but we don't know the file?
> 
> 

I think the issue here is the difference in how elf and MachO files are 
loaded. Elf has this strange dual view of the file, where the same data 
is represented both as "sections", and "segments". Sections are the 
thing we all know (.text, .data, .debug_info, etc.), and are used by 
most tools (lldb, linker, ...). However, this is *not* what the kernel 
uses when it loads a binary into memory. The loader uses "segments" instead.

It is segments who describe where will a piece of file be loaded into 
memory. Normally, segments span one or more sections, but this is not a 
requirement. And almost always segments will contain some additional 
data besides section content. This data is generally "junk" but it is 
there because it enables the linker to "pack" the elf file more efficiently.

A typical elf file might look something like
-------------------
| elf header      |
-------------------
| segment headers |
-------------------
| .text           |
-------------------
| other sections  |
-------------------
| section headers |
-------------------

The segment headers might say "load everything from the start of file to 
the end of .text section at address 0x40000". So the load address of 
this file (and the base for all kinds of offsets) would be "0x40000", 
but that does not correspond to any section (the file address of the 
.text section would probably be something like 0x40123).

So, it almost sounds to me like we would need some kind of a 
module-relative (in addition to section-relative) address. Then, this 
address would be represented as "module+0", and GetLoadAddress(target) 
could translate that the same way as it does section-relative addresses.

Though that may be overkill here. For my purposes it would be sufficient 
to just have a function which always returns an file address (regardless 
of whether it points to a section or not), and I can use it to compute 
offsets (kind of like my original patch did). We could keep the 
Module.GetBaseAddress returning LLDB_INVALID_ADDRESS for cases when the 
base address cannot be represented as a section offset.