[llvm-commits] PATCH: Fix ELFObjectFile::getSymbolAddress which make llvm-nm work incorrectly on executables

Alexey Samsonov samsonov at google.com
Sat Jun 23 01:09:52 PDT 2012


On Fri, Jun 22, 2012 at 11:49 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:

> On Fri, Jun 22, 2012 at 3:11 AM, Alexey Samsonov <samsonov at google.com>
> wrote:
> > Hi!
> >
> > libObject seems to incorrectly implement
> > ELFObjectFile::getSymbolAddress. See this reproducer:
> > $ cat main.cc
> > int main() {
> >   return 0;
> > }
> > $ g++ main.cc -o main.out
> > $ nm main.out | grep main
> >                  U __libc_start_main@@GLIBC_2.2.5
> > 00000000004004b4 T main
> > $ llvm-nm main.out | grep main
> >          U __libc_start_main@@GLIBC_2.2.5
> > 00800884 T main
> >
> > Let's try to get what's wrong:
> > 800884 - 4004b4 = 4003d0
> > $ objdump -h main.out | grep .text
> >  11 .text         000001c8  00000000004003d0  00000000004003d0  000003d0  2**4
> >
> > So, the symbol address is incorrectly incremented by the section offset. To
> > my understanding, attached patch should be applied to fix this. Please check
> > if this is ok to apply.
> > getSymbolFileOffset in the same file seems to be fine, at least according to
> > this quote from ELF specs:
> >
> > Symbol table entries for different object file types have slightly different
> > interpretations for the st_value member.
> > <...>
> > * In relocatable files, st_value holds a section offset for a defined
> > symbol. That is, st_value is an offset from the beginning of the section
> > that st_shndx identifies.
> > * In executable and shared object files, st_value holds a virtual address.
> > [...]
> >
> > --
> > Alexey Samsonov, MSK
> >
>
> I agree that llvm-nm is incorrect here, but I'm not sure this is the
> correct fix. The issue is that exactly what getSymbolAddress is
> supposed to return is undocumented. There was quite a bit of
> discussion about it in "[llvm-commits] MachOObjectFile fix functions",
> but even after reading it I'm not 100% sure what it should do.  This
> patch also doesn't seem to handle the difference between a relocatable
> file and an executable.
>

True. How can I easily distinguish between relocatable files and executables?
Is it a bad idea to provide two different methods for different types of files?
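
For reference, the ELF header itself records the file type in its e_type
field (1 = ET_REL, 2 = ET_EXEC, 3 = ET_DYN), so one possible way to tell the
cases apart is to look there. Below is a minimal sketch of that idea, combined
with the st_value rule quoted above. It is not libObject code: readElfType,
symbolVirtualAddress and the kEt* constants are hypothetical names made up for
this illustration, and only the header layout and e_type values come from the
ELF spec.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// e_type values from the ELF specification (names are local to this sketch).
constexpr std::uint16_t kEtRel = 1;   // relocatable file
constexpr std::uint16_t kEtExec = 2;  // executable file
constexpr std::uint16_t kEtDyn = 3;   // shared object file

// Hypothetical helper: reads e_type from a raw ELF header. e_type is the
// 2-byte field right after the 16-byte e_ident array; e_ident[5] (EI_DATA)
// gives the byte order (1 = little-endian, 2 = big-endian).
std::optional<std::uint16_t> readElfType(const unsigned char *hdr,
                                         std::size_t len) {
  if (len < 18 || hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' ||
      hdr[3] != 'F')
    return std::nullopt;  // too short or not an ELF file
  const bool littleEndian = (hdr[5] == 1);
  if (!littleEndian && hdr[5] != 2)
    return std::nullopt;  // unknown data encoding
  const unsigned lo = littleEndian ? hdr[16] : hdr[17];
  const unsigned hi = littleEndian ? hdr[17] : hdr[16];
  return static_cast<std::uint16_t>(lo | (hi << 8));
}

// Hypothetical helper: virtual address of a defined symbol.
// In executables and shared objects st_value already is a virtual address;
// in relocatable files it is an offset from the start of the section, so the
// section address (sh_addr, usually 0 there) is added.
std::uint64_t symbolVirtualAddress(std::uint16_t eType, std::uint64_t stValue,
                                   std::uint64_t shAddr) {
  if (eType == kEtExec || eType == kEtDyn)
    return stValue;
  return shAddr + stValue;  // ET_REL; other file types are not handled here
}
```

With the numbers from the reproducer, symbolVirtualAddress(kEtExec, 0x4004b4,
0x4003d0) gives 0x4004b4, which matches what system nm prints for main.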


> I've CCed the people from the above thread. I would like to decide on
> a well defined meaning for all of the Address/Offset functions and
> document that in the code before we change anything, as I believe the
> ELF MCJIT is relying on the current behavior.


Yes, I would really like the behavior to be documented, as it's a bit
confusing that system nm and "objdump -t" give different results from
"llvm-nm" and "llvm-objdump -t".

What I was actually trying to achieve is to symbolize a given instruction
address: get the name of the function that contains this instruction. I
thought that the easy and straightforward way to do this is to use
libLLVMObject, iterate over all symbols from the symbol table in the
executable, get each symbol's name, address and size, and do a simple range
check. Well, it doesn't work this way :)
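
To make the intended check concrete, here is a minimal sketch of it. It
assumes the symbol table has already been read into a plain vector and that
the addresses in it are correct virtual addresses. SymbolInfo and symbolize
are hypothetical names for this illustration, not libObject API, and the
symbol size used in the usage note is made up.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record for one symbol table entry.
struct SymbolInfo {
  std::uint64_t address;  // virtual address of the symbol
  std::uint64_t size;     // size in bytes
  std::string name;
};

// Returns the name of the first symbol whose [address, address + size)
// range contains pc, or nullopt if no symbol covers it.
std::optional<std::string> symbolize(const std::vector<SymbolInfo> &symbols,
                                     std::uint64_t pc) {
  for (const SymbolInfo &sym : symbols) {
    // Written as a subtraction so that address + size cannot overflow.
    if (pc >= sym.address && pc - sym.address < sym.size)
      return sym.name;
  }
  return std::nullopt;
}
```

If main is recorded at 0x4004b4 with an assumed size of 0x10, looking up
0x4004b8 finds "main". If it is recorded at 0x800884, as llvm-nm currently
reports, the same lookup finds nothing, which is the failure I ran into.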

--
Alexey Samsonov, MSK