[llvm-commits] PATCH: Fix ELFObjectFile::getSymbolAddress which make llvm-nm work incorrectly on executables

Alexey Samsonov samsonov at google.com
Thu Sep 6 07:17:09 PDT 2012


Hi!

On Mon, Jun 25, 2012 at 11:28 PM, Danil Malyshev <dmalyshev at accesssoftek.com
> wrote:

> **
>
> Hi,****
>
> ** **
>
> MCJIT uses only getFileOffset() and only for the relocatable file, so this
> patch does not affect to MCJIT.****
>
> Nevertheless, it's looks like the getFileOffset(), and the getAddress()
> contain errors and it's great that someone want to fix them.****
>
> ** **
>
> My vision of an ideal situation:****
>
> The getAddress() returns the address of the symbol for those file types,
> where it makes sense (such as an executable file). In other cases the
> result may be undefined.
>

As I understand it, symbol table (at least the one in ELF) contains a
value, which means different things for different symbols / object types,
so it's worth
writing a non-trivial getAddress() function, that in case of ELF would
return (a) symbol value for execs/shared libraries and (b) symbol value +
section address
for relocatable files. IIUC some libObject clients (like llvm-nm or
llvm-objdump) depend on that function.

I've fixed this function on ELF, and renamed it throughout codebase to
getVirtualAddress() to better reflect its meaning. Could you please take a
look at the patch?

Code review: http://codereview.appspot.com/6503082/

****
>
> The getFileOffset() returns the offset of the symbol from the beginning of
> the file. For an executable file its can be calculated as something like:
> symbol_address - section_address + section_offset.****
>
> ** **
>
> ** **
>
> > How can I easily distinguish between relocatable files and executables?*
> ***
>
> ** **
>
> Header->e_type?
>

Ghm. Indeed :)


> ****
>
> ** **
>
> ** **
>
> Regards,****
>
> Danil****
>
> ** **
>  ------------------------------
>
> *From:* Alexey Samsonov [mailto:samsonov at google.com]
> *Sent:* Saturday, June 23, 2012 12:10 PM
> *To:* Michael Spencer
> *Cc:* **llvm-commits at cs.uiuc.edu**; Dmitry Vyukov; eli.bendersky at intel.com;
> Danil Malyshev; Owen Anderson
> *Subject:* Re: PATCH: Fix ELFObjectFile::getSymbolAddress which make
> llvm-nm work incorrectly on executables****
>
> ** **
>
> ** **
>
> On Fri, Jun 22, 2012 at 11:49 PM, Michael Spencer <bigcheesegs at gmail.com>
> wrote:****
>
> On Fri, Jun 22, 2012 at 3:11 AM, Alexey Samsonov <samsonov at google.com>
> wrote:
> > Hi!
> >
> > libObject seems to incorrectly implement
> > ELFObjectFile::getSymbolAddress. See this reproducer:
> > $ cat main.cc
> > int main() {
> >   return 0;
> > }
> > $ g++ main.cc -o main.out
> > $ nm main.out | grep main
> >                  U __libc_start_main@@GLIBC_2.2.5
> > 00000000004004b4 T main
> > $ llvm-nm main.out | grep main
> >          U __libc_start_main@@GLIBC_2.2.5
> > 00800884 T main
> >
> > Let's try to get what's wrong:
> > 800884 - 4004b4 = 4003d0
> > $ objdump -h main.out | grep .text
> >  11 .text         000001c8  00000000004003d0  00000000004003d0  000003d0
> >  2**4
> >
> > So, the symbol address is incorrectly incremented by the section offset.
> To
> > my understanding, attached patch should be applied to fix this. Please
> check
> > if this is ok to apply.
> > getSymbolFileOffset in the same file seems to be fine, at least
> according to
> > this quote from ELF specs:
> >
> > Symbol table entries for different object file types have slightly
> different
> > interpretations for the st_value member.
> > <...>
> > * In relocatable files, st_value holds a section offset for a defined
> > symbol. That is, st_value is an offset from the beginning of the section
> > that st_shndx identifies.
> > * In executable and shared object files, st_value holds a virtual
> address.
> > [...]
> >
> > --
> > Alexey Samsonov, MSK
> >****
>
> I agree that llvm-nm is incorrect here, but I'm not sure this is the
> correct fix. The issue is that exactly what getSymbolAddress is
> supposed to return is undocumented. There was quite a bit of
> discussion about it in "[llvm-commits] MachOObjectFile fix functions",
> but even after reading it I'm not 100% sure what it should do.  This
> patch also doesn't seem to handle the difference between a relocatable
> file and an executable.****
>
> ** **
>
> True. How can I easily distinguish between relocatable files and
> executables?****
>
> Is it a bad idea to provide two different methods for different types of
> files?****
>
>  ****
>
> I've CCed the people from the above thread. I would like to decide on
> a well defined meaning for all of the Address/Offset functions and
> document that in the code before we change anything, as I believe the
> ELF MCJIT is relying on the current behavior.****
>
>  ** **
>
> Yes, I would really like the behavior to be documented, as it's a bit
> confusing****
>
> that system nm and "objdump -t" provide different results than "llvm-nm"**
> **
>
> and "llvm-objdump -t". ****
>
> ** **
>
> What I was actually trying to achieve is to****
>
> to symbolize a given instruction address - get the name of function that
> contains****
>
> this instruction. I thought that the easy and straightforward way to do
> this is to****
>
> use libLLVMObject, iterate over all symbols from symbol table in
> executable, get symbol name and size****
>
> and do a simple check. Well, it doesn't work this way :)****
>
> ** **
>
> --****
>
> Alexey Samsonov, MSK****
>



-- 
Alexey Samsonov, MSK
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120906/bdd7d501/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: issue6503082_2001.diff
Type: application/octet-stream
Size: 14542 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120906/bdd7d501/attachment.obj>


More information about the llvm-commits mailing list