[lldb-dev] File offset usage on ObjectFileELF

Greg Clayton gclayton at apple.com
Thu May 16 13:15:32 PDT 2013

This is fallout from changes over the years.

If indeed ObjectFileELF creates its own data that is zero based, then the file offset should be ignored in other areas. It used to be each object file would lazily parse data from the file, but that led to crashes when someone would modify the file during a debug session. We would go to a file offset and read bytes that didn't make sense based off of the data we parsed when the first opened the file (like program and section headers). 

To fix this correctly, we now always mmap the entire object file when it first gets parsed so that we don't ever run into this case. During these changes, I switched all object file parsers over to always make their file data correct and zero based (go to file_offset, and read entire file up front).

More answers below.
On May 16, 2013, at 11:59 AM, Filipe Cabecinhas <filcab at gmail.com> wrote:

> Hi all,
> Why is ObjectFileELF using the file offset to parse the ELF headers?
It shouldn't if the m_data member is filled with the ELF file data starting at file_offset to begin with.

> For the changes I'm suggesting, skip to the end of the message (3 last
> paragraphs).
> From what I can tell, ObjectFileELF's constructor will get a
> FileSpec+file_offset and a DataBuffer+data_offset. CreateInstance made
> sure the DataBuffer had all the data we needed.

Indeed, changes which happened slowly over the years as LLDB evolved.
> Then ObjectFileELF's constructor calls ObjectFile's constructor with
> those arguments too. ObjectFile's constructor sets its private m_data
> field to point to the argument DataBuffer's contents (+ offset), and
> simply stores the FileSpec and file_offset.
> When ObjectFileELF::ParseHeader() is called, it gets the file_offset
> and then parses the header from m_data with that offset. This makes no
> sense, for me, since the offset is only for the file, not the
> DataBuffer.

It probably shouldn't. All ELF files start at file offset zero currently as we have no clients that are parsing ELF files from within .a files (BSD archives). I am sure things will fail badly when and if we put ELF files into .a files.

> I couldn't find any other reference to ObjectFile::GetFileOffset on
> either ObjectFile, ObjectFileELF, or ObjectFileMachO. There is one
> reference to m_file_offset in ObjectFileMachO, though. Which seems
> legit, but is weird since CreateInstance makes sure that the
> DataBuffer has all the file, before calling the constructor.
> I'm proposing changing ObjectFileELF::ParseHeader() to just create a
> variable that is 0 as the offset and use that as the offset for the
> ELFHeader::Parse() method.

That is correct as long as m_data is filled in to have the ELF data start at offset zero which I currently believe it is.

> Either ObjectFileMachO calls the GetFileOffset() or it's not doing
> anything and we can delete it. Otherwise bug can appear out of nowhere
> if anything is changed in one of those.

> Is it possible that ObjectFileMachO actually needs to read the file
> again? When can this happen? (ObjectFileMachO's constructor is not
> private, but I think it's only called from CreateInstance)

GetFileOffset() needs to stay for universal files and for BSD archives. It is mainly needed right now for comm page binaries (see DynamicLoaderMacOSXDYLD::AddModulesUsingImageInfos()). These binaries are mach-o files that are embedded inside a mach-o file section. So please do not remove it.

> Should I prepare a patch, or did I miss something?

The flexibility is currently needed for the universal mach-o files and BSD archives. Even worse you can have a universal BSD archive, or a BSD archive full of universal mach-o files. The m_file_offset is useless for ELF as it is always zero, unless you have a BSD archive full of ELF files.
> Thank you,
>  Filipe
> P.S: I found this while trying to make a reader for a format based on
> ELF which has an extra header. I just bumped the offsets by that
> header's size, on CreateInstance, and created an ObjectFileELF, that's
> why I bumped into this bug.

Gotcha. Please do fix the ELF parser to not take the file offset into account more than once.


> _______________________________________________
> lldb-dev mailing list
> lldb-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/lldb-dev

More information about the lldb-dev mailing list