[LLVMdev] ELFObjectFile changes, llvm-objdump showing 'wrong' values?

Bendersky, Eli eli.bendersky at intel.com
Mon Jan 23 00:07:38 PST 2012


> > (1) Symbol address
> > According to the ELF standard, in a symbol table entry st_value means:
> > "In relocatable files, st_value holds a section offset for a defined
> > symbol. That is, st_value is an offset from the beginning of the
> > section that st_shndx identifies." (*)
> >
> > Therefore, when queried about a symbol's address what would the right
> > answer be? In ELFObjectFile::getSymbolAddress, previously, it was simply
> > symb->st_value (which is the relative offset to the section). Now,
> > Section->sh_addr is added to reflect the actual address of the symbol.
> >
> > Ignoring for the moment the change this imposes on objdump & nm (which
> > can be amended), what would the expected address be for clients of
> > getSymbolAddress?
> 
> I trust your interpretation and implementation of the relevant spec's, and
> don't mean to suggest a mistake there.  I apologize if I did so previously.
> 
> What I do know is that now ELFObjectFile doesn't seem to work on
> executables, as it did before.  Accordingly the tools that use ELFObjectFile
> (llvm-objdump, llvm-nm) no longer accurately display symbol information on
> such files (and my project, using code from these tools, doesn't either).
> Since these tools used to do this "correctly", as do their non-llvm
> counterparts, and because they made use of ELFObjectFile for this purpose, I
> assumed that was a supported use case.  It appears that's incorrect, and the
> output working for executables was always a coincidence.  I wish this wasn't
> the case, but I understand things change and will update my project
> accordingly (or move away from MC if that's not possible, I suppose). I
> assume there's no somewhat-equivalent class/etc that will enable a client to
> reason about non-relocatable ELF files now that ELFObjectFile doesn't
> support them?
> 

I did not mean to make a sweeping claim that ELFObjectFile doesn't support anything but relocatable files. ELF is flexible enough to allow the same class to support several types of objects, but I don't know if ELFObjectFile actually attempts this. I *assumed* it was meant to mainly support relocatable files, due to the intentions of the libObject library (linker, etc).

In any case, if the intention is to support both relocatable and executable files, then more sophistication is probably required. Take st_value, for example. For relocatable files, it is an offset from the beginning of the section the symbol belongs to (the one identified by st_shndx). For executables (and .so), however, it is simply the virtual address. So, for ELFObjectFile::getSymbolAddress to support both, it would probably first need to decide which kind of object it is dealing with (information ELF makes available in the e_type field of the header).

Looking at it this way, the old code (r148652) assumed executable (since it simply returns st_value for the address), and the new code (r148653) assumes relocatable (since it adds st_value to the section address).

At this point, I would really like to hear more from others at @llvmdev. What would the best approach be? I have no problem moving our new address calculation into DyldELFObject, where we really need it for dynamic loading in MC-JIT, but maybe something can be done to accommodate both directions (e.g. going the old way for e_type = ET_EXEC or ET_DYN and the new way for ET_REL?).
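
To make the e_type idea concrete, here is a rough sketch of what such a dispatch could look like. It is only an illustration: SectionHeader, Symbol, ElfType and symbolAddress are hypothetical simplifications, not the actual ELFObjectFile interface, and only the three e_type values discussed here are modeled.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for the ELF structures.
// These are NOT the real ELFObjectFile types.
enum ElfType : uint16_t { ET_REL = 1, ET_EXEC = 2, ET_DYN = 3 };

struct SectionHeader {
  uint64_t sh_addr;   // address of the section in memory (0 if not loaded)
  uint64_t sh_offset; // offset of the section in the file
};

struct Symbol {
  uint64_t st_value;  // meaning depends on the object type, see below
};

// Return a symbol's address, choosing the interpretation of st_value
// from the file's e_type.
uint64_t symbolAddress(ElfType type, const Symbol &sym,
                       const SectionHeader *sec) {
  if (type == ET_REL) {
    // Relocatable: st_value is an offset from the start of the section,
    // so the section address has to be added.
    return sym.st_value + (sec ? sec->sh_addr : 0);
  }
  // ET_EXEC / ET_DYN: st_value already holds the virtual address.
  return sym.st_value;
}
```

With the 'main' values reported in this thread (objdump shows 004004a0, llvm-objdump shows 00800850), a section address of 0x4003b0 would account for the doubled result when an executable is treated as relocatable. That section address is inferred from the two numbers, not stated anywhere in the thread.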


> >
> > (2) Symbol offset
> > Again, referring to the definition of the "st_value" field above, the file
> > offset of the symbol is the section offset plus the symbol's offset in the
> > section, which is reflected in the new code:
> >
> >    Result = symb->st_value +
> >             (Section ? Section->sh_offset : 0);
> >
> > The old code subtracted Section->sh_addr from that for reasons that are
> > not entirely clear to me.
> >
> > I'm not sure where this creates a problem for you? AFAICS, neither
> > llvm-objdump nor llvm-nm use the symbol's file offset. It's also not clear
> > from your pastes of llvm-objdump and objdump what the significant
> > differences are.
> >
> 
> The difference in the pastes, and my apologies for not explicitly pointing this
> out originally, is that the symbol addresses (see
> 'main') now seem to double-include the section address in their value.
>  Notice how llvm-objdump gives address of 00800850 for main while
> objdump shows 004004a0.  Note that before your changes llvm-objdump's
> output was aligned with that of normal objdump in this regard.

Now I see it, thanks. However, I still don't see where llvm-objdump uses the file offset at all. It prints the symbol address in the first column, calling SymbolRef::getAddress, which delegates to ELFObjectFile::getSymbolAddress. Is the file offset an actual problem for you, or only the address?

Nor do I understand the computation the old code used to obtain the offset:

  case ELF::STT_FUNC:
  case ELF::STT_OBJECT:
  case ELF::STT_NOTYPE:
    Result = symb->st_value +
             (Section ? Section->sh_offset - Section->sh_addr : 0);
 
Why is the address subtracted?
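
One possible reading, following the relocatable vs. executable distinction above: if st_value holds a virtual address (the executable case), subtracting sh_addr turns it into a section-relative offset before sh_offset is added. That is an assumption about the old code's intent, not something confirmed in this thread. A minimal sketch of the two formulas side by side, with hypothetical struct and function names:

```cpp
#include <cstdint>

// Hypothetical, simplified section header; not the real ELFObjectFile type.
struct SectionHeader {
  uint64_t sh_addr;   // address of the section in memory
  uint64_t sh_offset; // offset of the section in the file
};

// New code's formula: st_value is taken to be section-relative,
// as in a relocatable file.
uint64_t fileOffsetRelocatable(uint64_t st_value, const SectionHeader &sec) {
  return st_value + sec.sh_offset;
}

// Old code's formula: st_value is taken to be a virtual address,
// as in an executable, so sh_addr is subtracted first.
uint64_t fileOffsetExecutable(uint64_t st_value, const SectionHeader &sec) {
  return st_value + (sec.sh_offset - sec.sh_addr);
}
```

The unsigned subtraction may wrap when sh_offset is smaller than sh_addr, but the final sum still comes out right modulo 2^64 as long as the symbol really lies inside the section.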
 
Eli
