[llvm-commits] PATCH: Fix ELFObjectFile::getSymbolAddress which make llvm-nm work incorrectly on executables

Alexey Samsonov samsonov at google.com
Mon Sep 10 23:13:55 PDT 2012


On Thu, Sep 6, 2012 at 6:17 PM, Alexey Samsonov <samsonov at google.com> wrote:

> Hi!
>
> On Mon, Jun 25, 2012 at 11:28 PM, Danil Malyshev <
> dmalyshev at accesssoftek.com> wrote:
>
>> **
>>
>> Hi,****
>>
>> ** **
>>
>> MCJIT uses only getFileOffset() and only for the relocatable file, so
>> this patch does not affect to MCJIT.****
>>
>> Nevertheless, it's looks like the getFileOffset(), and the getAddress()
>> contain errors and it's great that someone want to fix them.****
>>
>> ** **
>>
>> My vision of an ideal situation:****
>>
>> The getAddress() returns the address of the symbol for those file types,
>> where it makes sense (such as an executable file). In other cases the
>> result may be undefined.
>>
>
> As I understand it, symbol table (at least the one in ELF) contains a
> value, which means different things for different symbols / object types,
> so it's worth
> writing a non-trivial getAddress() function, that in case of ELF would
> return (a) symbol value for execs/shared libraries and (b) symbol value +
> section address
> for relocatable files. IIUC some libObject clients (like llvm-nm or
> llvm-objdump) depend on that function.
>
> I've fixed this function on ELF, and renamed it throughout codebase to
> getVirtualAddress() to better reflect its meaning. Could you please take a
> look at the patch?
>

ping


>
> Code review: http://codereview.appspot.com/6503082/
>
> ****
>>
>> The getFileOffset() returns the offset of the symbol from the beginning
>> of the file. For an executable file its can be calculated as something
>> like: symbol_address - section_address + section_offset.****
>>
>> ** **
>>
>> ** **
>>
>> > How can I easily distinguish between relocatable files and executables?
>> ****
>>
>> ** **
>>
>> Header->e_type?
>>
>
> Ghm. Indeed :)
>
>
>> ****
>>
>> ** **
>>
>> ** **
>>
>> Regards,****
>>
>> Danil****
>>
>> ** **
>>  ------------------------------
>>
>> *From:* Alexey Samsonov [mailto:samsonov at google.com]
>> *Sent:* Saturday, June 23, 2012 12:10 PM
>> *To:* Michael Spencer
>> *Cc:* **llvm-commits at cs.uiuc.edu**; Dmitry Vyukov;
>> eli.bendersky at intel.com; Danil Malyshev; Owen Anderson
>> *Subject:* Re: PATCH: Fix ELFObjectFile::getSymbolAddress which make
>> llvm-nm work incorrectly on executables****
>>
>> ** **
>>
>> ** **
>>
>> On Fri, Jun 22, 2012 at 11:49 PM, Michael Spencer <bigcheesegs at gmail.com>
>> wrote:****
>>
>> On Fri, Jun 22, 2012 at 3:11 AM, Alexey Samsonov <samsonov at google.com>
>> wrote:
>> > Hi!
>> >
>> > libObject seems to incorrectly implement
>> > ELFObjectFile::getSymbolAddress. See this reproducer:
>> > $ cat main.cc
>> > int main() {
>> >   return 0;
>> > }
>> > $ g++ main.cc -o main.out
>> > $ nm main.out | grep main
>> >                  U __libc_start_main@@GLIBC_2.2.5
>> > 00000000004004b4 T main
>> > $ llvm-nm main.out | grep main
>> >          U __libc_start_main@@GLIBC_2.2.5
>> > 00800884 T main
>> >
>> > Let's try to get what's wrong:
>> > 800884 - 4004b4 = 4003d0
>> > $ objdump -h main.out | grep .text
>> >  11 .text         000001c8  00000000004003d0  00000000004003d0  000003d0
>> >  2**4
>> >
>> > So, the symbol address is incorrectly incremented by the section
>> offset. To
>> > my understanding, attached patch should be applied to fix this. Please
>> check
>> > if this is ok to apply.
>> > getSymbolFileOffset in the same file seems to be fine, at least
>> according to
>> > this quote from ELF specs:
>> >
>> > Symbol table entries for different object file types have slightly
>> different
>> > interpretations for the st_value member.
>> > <...>
>> > * In relocatable files, st_value holds a section offset for a defined
>> > symbol. That is, st_value is an offset from the beginning of the section
>> > that st_shndx identifies.
>> > * In executable and shared object files, st_value holds a virtual
>> address.
>> > [...]
>> >
>> > --
>> > Alexey Samsonov, MSK
>> >****
>>
>> I agree that llvm-nm is incorrect here, but I'm not sure this is the
>> correct fix. The issue is that exactly what getSymbolAddress is
>> supposed to return is undocumented. There was quite a bit of
>> discussion about it in "[llvm-commits] MachOObjectFile fix functions",
>> but even after reading it I'm not 100% sure what it should do.  This
>> patch also doesn't seem to handle the difference between a relocatable
>> file and an executable.****
>>
>> ** **
>>
>> True. How can I easily distinguish between relocatable files and
>> executables?****
>>
>> Is it a bad idea to provide two different methods for different types of
>> files?****
>>
>>  ****
>>
>> I've CCed the people from the above thread. I would like to decide on
>> a well defined meaning for all of the Address/Offset functions and
>> document that in the code before we change anything, as I believe the
>> ELF MCJIT is relying on the current behavior.****
>>
>>  ** **
>>
>> Yes, I would really like the behavior to be documented, as it's a bit
>> confusing****
>>
>> that system nm and "objdump -t" provide different results than "llvm-nm"*
>> ***
>>
>> and "llvm-objdump -t". ****
>>
>> ** **
>>
>> What I was actually trying to achieve is to****
>>
>> to symbolize a given instruction address - get the name of function that
>> contains****
>>
>> this instruction. I thought that the easy and straightforward way to do
>> this is to****
>>
>> use libLLVMObject, iterate over all symbols from symbol table in
>> executable, get symbol name and size****
>>
>> and do a simple check. Well, it doesn't work this way :)****
>>
>> ** **
>>
>> --****
>>
>> Alexey Samsonov, MSK****
>>
>
>
>
> --
> Alexey Samsonov, MSK
>
>


-- 
Alexey Samsonov, MSK
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120911/f02acc4a/attachment.html>


More information about the llvm-commits mailing list