[LLVMdev] ELFObjectFile changes, llvm-objdump showing 'wrong' values?

Will Dietz willdtz at gmail.com
Sun Jan 22 23:17:14 PST 2012


2012/1/23 Bendersky, Eli <eli.bendersky at intel.com>:
> Hi,
>
> I would like to examine the implications you mention in more detail.
>

Thank you!

> (1) Symbol address
> According to the ELF standard, in a symbol table entry st_value means: "In relocatable files, st_value holds a section offset for a defined symbol. That is,
> st_value is an offset from the beginning of the section that st_shndx identifies." (*)
>
> Therefore, when queried about a symbol's address what would the right answer be? In ELFObjectFile::getSymbolAddress, previously, it was simply symb->st_value (which is the relative offset to the section). Now, Section->sh_addr is added to reflect the actual address of the symbol.
>
> Ignoring for the moment the change this imposes on objdump & nm (which can be amended), what would the expected address be for clients of getSymbolAddress?

I trust your interpretation and implementation of the relevant specs,
and don't mean to suggest a mistake there.  I apologize if I did so
previously.

What I do know is that now ELFObjectFile doesn't seem to work on
executables, as it did before.  Accordingly the tools that use
ELFObjectFile (llvm-objdump, llvm-nm) no longer accurately display
symbol information on such files (and my project, using code from
these tools, doesn't either).  Since these tools used to do this
"correctly", as do their non-llvm counterparts, and because they made
use of ELFObjectFile for this purpose, I assumed that was a supported
use case.  It appears that's incorrect, and the output working for
executables was always a coincidence.  I wish this weren't the case,
but I understand things change and will update my project accordingly
(or move away from MC if that's not possible, I suppose). I assume
there's no somewhat-equivalent class/etc that will enable a client to
reason about non-relocatable ELF files now that ELFObjectFile doesn't
support them?

>
> (2) Symbol offset
> Again, referring to the definition of the "st_value" field above, the file offset of the symbol is the section offset plus the symbol's offset in the section, which is reflected in the new code:
>
>    Result = symb->st_value +
>             (Section ? Section->sh_offset : 0);
>
> The old code subtracted Section->sh_addr from that for reasons that are not entirely clear to me.
>
> I'm not sure where this creates a problem for you? AFAICS, neither llvm-objdump nor llvm-nm uses the symbol's file offset. It's also not clear from your pastes of llvm-objdump and objdump what the significant differences are.
>

The difference in the pastes, and my apologies for not explicitly
pointing this out originally, is that the symbol addresses (see
'main') now seem to double-include the section address in their value.
Notice how llvm-objdump gives an address of 00800850 for main while
objdump shows 004004a0.  Note that before your changes llvm-objdump's
output was aligned with that of normal objdump in this regard.
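
To make the double-counting concrete, here is a minimal sketch of the
distinction as I understand it from the ELF spec: in relocatable files
st_value is a section offset, while in executables and shared objects it
is already a virtual address.  The helper names below are hypothetical
and are not part of ELFObjectFile or any LLVM API.  The section address
0x4003b0 is only inferred as the difference between the two values
shown above, not read from the actual binary.

```cpp
#include <cstdint>

// e_type values as defined by the ELF specification.
enum ElfType : uint16_t { ET_REL = 1, ET_EXEC = 2, ET_DYN = 3 };

// Hypothetical helper (not an LLVM API): the address a client would
// expect for a defined symbol, given the file type.
uint64_t symbolAddress(ElfType type, uint64_t st_value, uint64_t sh_addr) {
  // Relocatable: st_value is an offset from the start of the section.
  // Executable / shared object: st_value is already a virtual address,
  // so adding sh_addr again would count the section address twice.
  return type == ET_REL ? sh_addr + st_value : st_value;
}

// Hypothetical helper: file offset of the symbol's contents.
uint64_t symbolFileOffset(ElfType type, uint64_t st_value,
                          uint64_t sh_addr, uint64_t sh_offset) {
  // Convert to an offset within the section first, then add the
  // section's position in the file.
  uint64_t offsetInSection =
      type == ET_REL ? st_value : st_value - sh_addr;
  return sh_offset + offsetInSection;
}
```

With these assumed values, symbolAddress(ET_EXEC, 0x4004a0, 0x4003b0)
stays at 0x4004a0, whereas unconditionally adding sh_addr yields
0x800850, which matches the value llvm-objdump now prints for main.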

> Eli
>
> (*) ELFObjectFile represents a relocatable file
>

It appears 100% of my problem is thinking ELFObjectFile was
suitable for use on non-relocatable files such as executables.  Since
this appears to be wrong (it gives the wrong results for such files as
detailed above, and probably others), and because this is by design,
not by mistake, might I suggest something like updating
Binary::createBinary (in lib/Object/Binary.cpp) to reflect this and
avoid future confusion (it presently uses ELFObjectFile for all ELF
file types, not just relocatables)?  I don't know who the correct
person to bug about this is; hopefully addressing llvmdev@ is sufficient
here.
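
As a rough illustration of the kind of check I have in mind, the sketch
below classifies a buffer by the e_type field of the ELF header (a
16-bit value at byte offset 16, with byte order given by
e_ident[EI_DATA]).  The ElfKind enum and classifyElf function are
hypothetical names for this example only; this is not how
Binary::createBinary is implemented.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical classification, for illustration only.
enum class ElfKind { NotElf, Relocatable, Executable, SharedObject, Other };

ElfKind classifyElf(const unsigned char *buf, std::size_t size) {
  // Need the 16-byte e_ident plus the 2-byte e_type that follows it.
  if (size < 18 || buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' ||
      buf[3] != 'F')
    return ElfKind::NotElf;

  // e_ident[EI_DATA] is byte 5: 1 = little-endian, 2 = big-endian.
  if (buf[5] != 1 && buf[5] != 2)
    return ElfKind::NotElf;
  bool little = buf[5] == 1;
  uint16_t e_type = little
      ? static_cast<uint16_t>(buf[16] | (buf[17] << 8))
      : static_cast<uint16_t>((buf[16] << 8) | buf[17]);

  switch (e_type) {
  case 1: return ElfKind::Relocatable;   // ET_REL
  case 2: return ElfKind::Executable;    // ET_EXEC
  case 3: return ElfKind::SharedObject;  // ET_DYN
  default: return ElfKind::Other;        // ET_NONE, ET_CORE, etc.
  }
}
```

A caller could use a result other than Relocatable to reject the file
up front, or to route it to different handling, rather than silently
producing doubled addresses.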

Thank you for your time Eli, your detailed explanation, and your
continued work.  Have a good one :)

~Will



