[llvm-commits] PATCH: Fix ELFObjectFile::getSymbolAddress which make llvm-nm work incorrectly on executables

Danil Malyshev dmalyshev at accesssoftek.com
Mon Jun 25 12:28:07 PDT 2012


Hi,

MCJIT uses only getFileOffset() and only for the relocatable file, so this patch does not affect to MCJIT.
Nevertheless, it's looks like the getFileOffset(), and the getAddress() contain errors and it's great that someone want to fix them.

My vision of an ideal situation:
The getAddress() returns the address of the symbol for those file types, where it makes sense (such as an executable file). In other cases the result may be undefined.
The getFileOffset() returns the offset of the symbol from the beginning of the file. For an executable file its can be calculated as something like: symbol_address - section_address + section_offset.


> How can I easily distinguish between relocatable files and executables?

Header->e_type?


Regards,
Danil

________________________________
From: Alexey Samsonov [mailto:samsonov at google.com]
Sent: Saturday, June 23, 2012 12:10 PM
To: Michael Spencer
Cc: llvm-commits at cs.uiuc.edu; Dmitry Vyukov; eli.bendersky at intel.com; Danil Malyshev; Owen Anderson
Subject: Re: PATCH: Fix ELFObjectFile::getSymbolAddress which make llvm-nm work incorrectly on executables


On Fri, Jun 22, 2012 at 11:49 PM, Michael Spencer <bigcheesegs at gmail.com<mailto:bigcheesegs at gmail.com>> wrote:
On Fri, Jun 22, 2012 at 3:11 AM, Alexey Samsonov <samsonov at google.com<mailto:samsonov at google.com>> wrote:
> Hi!
>
> libObject seems to incorrectly implement
> ELFObjectFile::getSymbolAddress. See this reproducer:
> $ cat main.cc
> int main() {
>   return 0;
> }
> $ g++ main.cc -o main.out
> $ nm main.out | grep main
>                  U __libc_start_main@@GLIBC_2.2.5
> 00000000004004b4 T main
> $ llvm-nm main.out | grep main
>          U __libc_start_main@@GLIBC_2.2.5
> 00800884 T main
>
> Let's try to get what's wrong:
> 800884 - 4004b4 = 4003d0
> $ objdump -h main.out | grep .text
>  11 .text         000001c8  00000000004003d0  00000000004003d0  000003d0
>  2**4
>
> So, the symbol address is incorrectly incremented by the section offset. To
> my understanding, attached patch should be applied to fix this. Please check
> if this is ok to apply.
> getSymbolFileOffset in the same file seems to be fine, at least according to
> this quote from ELF specs:
>
> Symbol table entries for different object file types have slightly different
> interpretations for the st_value member.
> <...>
> * In relocatable files, st_value holds a section offset for a defined
> symbol. That is, st_value is an offset from the beginning of the section
> that st_shndx identifies.
> * In executable and shared object files, st_value holds a virtual address.
> [...]
>
> --
> Alexey Samsonov, MSK
>
I agree that llvm-nm is incorrect here, but I'm not sure this is the
correct fix. The issue is that exactly what getSymbolAddress is
supposed to return is undocumented. There was quite a bit of
discussion about it in "[llvm-commits] MachOObjectFile fix functions",
but even after reading it I'm not 100% sure what it should do.  This
patch also doesn't seem to handle the difference between a relocatable
file and an executable.

True. How can I easily distinguish between relocatable files and executables?
Is it a bad idea to provide two different methods for different types of files?

I've CCed the people from the above thread. I would like to decide on
a well defined meaning for all of the Address/Offset functions and
document that in the code before we change anything, as I believe the
ELF MCJIT is relying on the current behavior.

Yes, I would really like the behavior to be documented, as it's a bit confusing
that system nm and "objdump -t" provide different results than "llvm-nm"
and "llvm-objdump -t".

What I was actually trying to achieve is to
to symbolize a given instruction address - get the name of function that contains
this instruction. I thought that the easy and straightforward way to do this is to
use libLLVMObject, iterate over all symbols from symbol table in executable, get symbol name and size
and do a simple check. Well, it doesn't work this way :)

--
Alexey Samsonov, MSK
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120625/164dedc2/attachment.html>


More information about the llvm-commits mailing list