[llvm-commits] LLD PATCH: add ELF reader basic skeleton code
Sidney Manning
sidneym at codeaurora.org
Thu Jul 5 11:59:59 PDT 2012
> -------- Original Message --------
> Subject: Re: [llvm-commits] LLD PATCH: add ELF reader basic skeleton
> code
> Date: Wed, 13 Jun 2012 22:45:45 -0700
> From: Michael Spencer <bigcheesegs at gmail.com>
> To: Hemant Kulkarni <khemant at codeaurora.org>, Nick Kledzik
> <kledzik at apple.com>
> CC: llvm-commits at cs.uiuc.edu
>
> On Wed, Jun 13, 2012 at 5:07 PM, Hemant Kulkarni
> <khemant at codeaurora.org> wrote:
> > Hello Everyone ,
> >
> >
> >
> > I am working to get ELF port in LLD linker. I am attaching a patch
> > that fills in basic sub-classes in the ReaderELF. I request feedback
> > for this code. I also want suggestions for ways to test this Reader
> > as I notice there is no way right now to test COFF reader as well.
> >
> >
> >
> > The ReaderELF will read an object file and instantiate an
> > ELFObjectFile object and fill in the atom subclasses that correspond
> for ELF.
> >
> >
> >
> >
> >
> > --
> >
> > Hemant Kulkarni
> >
> > khemant at codeaurora.org
> >
> > Qualcomm Innovation Center, Inc is a member of Code Aurora Forum
>
> The major issue I see is that the generic object API is too high level.
> You should be using the ELF specific API that gives you access to the
> raw symbol and section structures as the COFF reader does. You should
> also take a look at the llvm style guide.
I've been going through this file and I'd like to clarify the comments
regarding use of the lower-level ELF api. The FileElf method operates on
the object file given using the ELFObjectFile class.
It would be helpful for me if you could highlight a couple of examples of
where we are off base.
>
> As for testing we need a yaml2obj for ELF. Although I'm fine with
> testing with binary files temporarily.
>
> Additional comments inline.
>
> diff --git a/lib/ReaderWriter/ELF/ReaderELF.cpp
> b/lib/ReaderWriter/ELF/ReaderELF.cpp
> index ba23990..f4bd52d 100644
> --- a/lib/ReaderWriter/ELF/ReaderELF.cpp
> +++ b/lib/ReaderWriter/ELF/ReaderELF.cpp
> @@ -1,41 +1,456 @@
> -//===- lib/ReaderWriter/ELF/ReaderELF.cpp
> --------------------------------===//
> +//===- lib/ReaderWriter/ELF/ReaderELF.cpp
> ---------------------------------===//
> //
> // The LLVM Linker
> //
> // This file is distributed under the University of Illinois Open
> Source
> // License. See LICENSE.TXT for details.
> //
> -//===-----------------------------------------------------------------
> -----===//
> +//===-----------------------------------------------------------------
> -
> +---===//
> +//
> +// This file contains the ELF Reader and all helper sub classes // to
> +consume an ELF file and produces atoms out of it.
> +//
> +//===-----------------------------------------------------------------
> -
> +---===//
>
> #include "lld/ReaderWriter/ReaderELF.h"
> #include "lld/Core/File.h"
>
> +#include "llvm/ADT/ArrayRef.h"
> +#include "llvm/ADT/StringRef.h"
> +#include "llvm/Support/Allocator.h"
> #include "llvm/Support/Casting.h"
> #include "llvm/Support/ErrorHandling.h"
> #include "llvm/Support/Memory.h"
> #include "llvm/Support/MemoryBuffer.h"
> #include "llvm/Support/raw_ostream.h"
> #include "llvm/Support/system_error.h"
> +#include "llvm/Support/Endian.h"
> +#include "llvm/Support/MathExtras.h"
> +#include "llvm/Object/ObjectFile.h"
> +#include "llvm/Object/ELF.h"
> +#include "llvm/Support/ELF.h"
>
> #include <map>
> #include <vector>
>
> +using namespace lld;
>
> -namespace lld {
> +// This atom class corresponds to absolute symbol
>
> -ReaderOptionsELF::ReaderOptionsELF() {
> -}
> +class ELFAbsoluteAtom: public AbsoluteAtom {
> +public:
> + ELFAbsoluteAtom(const File &F, llvm::StringRef N, uint64_t V) :
> + OwningFile(F), Name(N), Value(V) {
> + }
>
> -ReaderOptionsELF::~ReaderOptionsELF() { -}
> + virtual const class File& file() const {
> + return OwningFile;
> + }
>
> + virtual llvm::StringRef name() const {
> + return Name;
> + }
>
> + virtual uint64_t value() const {
> + return Value;
> + }
>
> -Reader* createReaderELF(const ReaderOptionsELF &options) {
> - assert(0 && "ELF Reader not yet implemented");
> - return nullptr;
> -}
> +private:
> + const File &OwningFile;
> + llvm::StringRef Name;
> + uint64_t Value;
> +};
> +
> +// This atom corresponds to undefined symbols.
> +class ELFUndefinedAtom: public UndefinedAtom {
> +public:
> + ELFUndefinedAtom(const File &F, llvm::StringRef N,
> + llvm::object::symbol_iterator Symb) :
> + OwningFile(F), Name(N), Symbol(Symb) {
> + }
> +
> + virtual const class File& file() const {
> + return OwningFile;
> + }
> +
> + virtual llvm::StringRef name() const {
> + return Name;
> + }
> +
> + // FIXME What distinguishes a symbol in ELF that can help
> + // decide if the symbol is undefined only during build and
> not
> + // runtime? This will make us choose canBeNullAtBuildtime and
> + // canBeNullAtRuntime
> + //
> + virtual CanBeNull canBeNull() const {
> + uint32_t Result;
> +
> + Symbol->getFlags(Result);
> + if (Result == llvm::object::SymbolRef::SF_Weak)
> + return CanBeNull::canBeNullAtBuildtime;
> + else
> + return CanBeNull::canBeNullNever;
> + }
> +
> +private:
> + const File &OwningFile;
> + llvm::StringRef Name;
> + llvm::object::symbol_iterator Symbol;
> +};
> +
> +template<llvm::support::endianness target_endianness, bool is64Bits>
> +class ELFDefinedAtom: public DefinedAtom {
> +public:
> + ELFDefinedAtom(const File &F, llvm::StringRef N,
> + llvm::object::symbol_iterator Symb,
> + llvm::object::section_iterator Sec,
> llvm::ArrayRef<uint8_t> D) :
>
> This should be using the Object/ELF.h interface instead of the generic
> interface. This would solve a lot of the ambiguity issues below.
Can you give some more specific information about what needs to change?
Should we try to use Elf_Sym_Base and Elf_Shdr_Base instead of the
symbol/section iterators.
>
> + OwningFile(F), Name(N), Symbol(Symb), Section(Sec), Data(D)
> {
> + static uint64_t ordernumber = 0;
> + _ordinal = ++ordernumber;
> + }
> +
> + virtual const class File& file() const {
> + return OwningFile;
> + }
> +
> + virtual llvm::StringRef name() const {
> + return Name;
> + }
> +
> + virtual uint64_t ordinal() const {
> + return _ordinal;
> + }
> +
> + virtual uint64_t size() const {
> + uint64_t size;
> + Symbol->getSize(size);
>
> This might not always be true. We need to make sure that all the data
> in a section is covered by an atom.
>
> + return size;
> + }
> +
> + // FIXME We also need to find visibility
> + // This will help us determine scopeLinkageUnit
> + // we also need to incorporate other visibilities
> +
> +
> + virtual Scope scope() const {
> + uint32_t Result;
> + Symbol->getFlags(Result);
> + if (Result == llvm::object::SymbolRef::SF_Global)
> + return scopeGlobal;
> + else
> + return scopeTranslationUnit;
> + }
> +
> + // FIXME need to revisit this in future
> +
> + virtual Interposable interposable() const {
> + return interposeNo;
> + }
> +
> + // FIXME What ways can we determine this in ELF?
> +
> + virtual Merge merge() const {
> + return mergeNo;
> + }
> +
> + virtual ContentType contentType() const {
> + ContentType type;
> + bool Result = false;
> + llvm::error_code ec;
> + ec = Section->isText(Result);
> + if (!ec && Result)
> + return typeCode;
> +
> + ec = Section->isData(Result);
> + if (!ec && Result)
> + return typeData;
> +
> +// The ZeroFill type is a bit ambiguous in case of ELF
> +// Should this mean it occupies no space in the disk image?
> +// A .bss is such section
>
> This would be easier to figure out using the Object/ELF.h interface.
>
> +
> +
> + ec = Section->isBSS(Result);
> + if (!ec && Result)
> + return typeZeroFill;
> +
> + return typeUnknown;
> + }
> +
> + virtual Alignment alignment() const {
> + uint64_t Result;
> + llvm::error_code ec;
> + ec = Section->getAlignment(Result);
> + if (ec)
> + return Alignment(-1);
> + return Alignment(llvm::Log2_64(Result));
> +
> + }
> +
> + virtual SectionChoice sectionChoice() const {
> + return sectionBasedOnContent;
> + }
> +
> + virtual llvm::StringRef customSectionName() const {
> + return "";
> + }
> +
> + virtual DeadStripKind deadStrip() const {
> + return deadStripNormal;
> + }
> +
> + // This does not cover the possibility of the initial
> + // write permissions for PLT stubs
> + // The API available right now is an ugly way of doing it.
> + // We need a way to explicitly
> + // query the SHF_* attributes of a section.
>
> Same issue as above.
>
> +
> + virtual ContentPermissions permissions() const {
> + char Result;
> + Symbol->getNMTypeChar(Result);
> + if (Result == 'r')
> + return permR__;
> + else if (Result == 'd')
> + return permRW_;
> + else if (Result == 't')
> + return permR_X;
> + else
> + return perm___;
> +
> + }
> +
> + // Many non ARM architectures use ELF file format
> + // This not really a place to put a architecture
> + // specific method in an atom. A better approach is
> + // needed.
> + //
>
> Agreed.
>
> + virtual bool isThumb() const {
> + return false;
> + }
> +
> + // FIXME Not Sure if ELF supports alias atoms. Find out more.
> + virtual bool isAlias() const {
> + return false;
> + }
> +
> + virtual llvm::ArrayRef<uint8_t> rawContent() const {
> + return Data;
> + }
>
> + virtual reference_iterator begin() const {
> + return reference_iterator(*this, nullptr);
> + }
>
> -} // namespace
> + virtual reference_iterator end() const {
> + return reference_iterator(*this, nullptr);
> + }
>
> +private:
> + virtual const Reference* derefIterator(const void* iter) const {
> + return nullptr;
> + }
> + virtual void incrementIterator(const void*& iter) const {
> + }
> +
> + const File &OwningFile;
> + llvm::StringRef Name;
> + const llvm::object::symbol_iterator Symbol;
> + const llvm::object::section_iterator Section;
> + llvm::ArrayRef<uint8_t> Data;
> + uint64_t _ordinal;
> +};
> +
> +// FileELF will read a binary, find out based on the symbol table
> contents
> +// what kind of symbol it is and create corresponding atoms for it
> +
> +template<llvm::support::endianness target_endianness, bool is64Bits>
> +class FileELF: public File {
> +public:
> + FileELF(std::unique_ptr<llvm::MemoryBuffer> MB, llvm::error_code
> &EC) :
> + File(MB->getBufferIdentifier()) {
> + llvm::OwningPtr<llvm::object::Binary> Bin;
> + EC = llvm::object::createBinary(MB.release(), Bin);
> + if (EC)
> + return;
> + // Point Obj to correct class and bitwidth ELF
> object
> +
> +
> +
> Obj.reset(llvm::dyn_cast<llvm::object::ELFObjectFile<target_endia
> nness,
> + is64Bits> >(Bin.get()));
> +
> + if (!Obj) {
> + EC =
> make_error_code(llvm::object::object_error::invalid_file_type);
> + return;
> + }
> + Bin.take();
> +
> + // Now we need to iterate over symtab and
> + // get all the symbols
> + // convert these symbols to atoms
> + // FIXME what about dynamic symbols?
>
> I believe the Object/ELF.h interface has a way to do this. It was added
> for RuntimeDyld.
>
> +
> +
> + llvm::object::symbol_iterator si(Obj->begin_symbols());
> + llvm::object::symbol_iterator se(Obj->end_symbols());
> +
> + for (; si != se; si.increment(EC)) {
> + if (EC)
> + llvm::report_fatal_error("Could not read all
> symbols");
>
> Not that your patch should address this, but we need proper
> diagnostics.
>
> + llvm::StringRef Name;
> + llvm::object::SymbolRef::Type type;
> + uint32_t flags = 0;
> +
> + EC = si->getName(Name);
> + if (EC)
> + llvm::report_fatal_error("Could not get
symbol
> name");
> +
> + EC = si->getType(type);
> + if (EC)
> + llvm::report_fatal_error("Could not get
symbol
> type");
> +
> + EC = si->getFlags(flags);
> + if (EC)
> + llvm::report_fatal_error("Could not get
symbol
> flags");
> +
> + if (flags & llvm::object::SymbolRef::SF_Absolute
> + ==
llvm::object::SymbolRef::SF_Absolute)
> {
> + uint64_t AbsValue;
> +
> + EC = si->getAddress(AbsValue);
> + if (EC)
> + llvm::report_fatal_error(
> + "Could not get
absolute
> symbol st_value");
> +
> + AbsoluteAtoms._atoms.push_back(new
> (AtomStorage.Allocate<
> +
> ELFAbsoluteAtom>())ELFAbsoluteAtom(*this, Name,
> + AbsValue));
> + } else if (flags &
> llvm::object::SymbolRef::SF_Undefined
> + ==
llvm::object::SymbolRef::SF_Undefined)
> {
> + UndefinedAtoms._atoms.push_back(new
> (AtomStorage.Allocate<
> + ELFUndefinedAtom>()) ELFUndefinedAtom(*this,
> Name, si));
> +
> + } else {
> + bool Result;
> + //FIXME need to figure out a way to handle
> common type
> + //symbols. For now we abort
> +
> + EC = si->isCommon(Result);
> + assert(!Result && "STT_COMMON type is not
yet
> implemented");
> + //TODO We need to think about how to put
other
> + //symbol types into the defined atom model.
>
> This is a major point. We need a better way to handle different types
> of atoms then just the predefined 4.
>
> +
> +
> + if (type == llvm::object::SymbolRef::ST_Data
||
> type
> + ==
> llvm::object::SymbolRef::ST_Function) {
> +
> + llvm::object::SectionRef SR;
> + llvm::object::section_iterator
> section(SR);
> +
> + EC = si->getSection(section);
> + if (EC)
> + return;
> + //FIXME We need an API to get the
actual
> st_value
> + //for now shorting this array
reference
> to
> + //suppress errors during build time.
> + //Need to fix this
>
> We have this in Object/ELF.h.
>
> +
> + llvm::ArrayRef<uint8_t> val;
> + DefinedAtoms._atoms.push_back(
> + new
> (AtomStorage.Allocate<ELFDefinedAtom<
> + target_endianness,
is64Bits>
> >()) ELFDefinedAtom<
> + target_endianness,
> is64Bits>(*this, Name,
> + si, section, val));
> +
> + }
> + }
> + }
> + }
> +
> + virtual void addAtom(const Atom&) {
> + llvm_unreachable("cannot add atoms to native .o files");
> + }
> +
> + virtual const atom_collection<DefinedAtom> &defined() const {
> + return DefinedAtoms;
> + }
> +
> + virtual const atom_collection<UndefinedAtom> &undefined() const {
> + return UndefinedAtoms;
> + }
> +
> + virtual const atom_collection<SharedLibraryAtom> &sharedLibrary()
> const {
> + return SharedLibraryAtoms;
> + }
> +
> + virtual const atom_collection<AbsoluteAtom> &absolute() const {
> + return AbsoluteAtoms;
> + }
> +
> +private:
> + std::unique_ptr<llvm::object::ELFObjectFile<target_endianness,
> is64Bits> >
> + Obj;
> + atom_collection_vector<DefinedAtom> DefinedAtoms;
> + atom_collection_vector<UndefinedAtom> UndefinedAtoms;
> + atom_collection_vector<SharedLibraryAtom> SharedLibraryAtoms;
> + atom_collection_vector<AbsoluteAtom> AbsoluteAtoms;
> + llvm::BumpPtrAllocator AtomStorage;
> +
> +};
> +
> +// ReaderELF is reader object that will instantiate correct FileELF
> +// by examining the memory buffer for ELF class and bitwidth
> +
> +class ReaderELF: public Reader {
> +public:
> + ReaderELF(const ReaderOptionsELF &options) :
> + _options(options) {
> + }
> + error_code parseFile(std::unique_ptr<MemoryBuffer> mb,
> std::vector<
> + std::unique_ptr<File> > &result) {
> +
> + std::pair<unsigned char, unsigned char> Ident =
> + llvm::object::getElfArchType(&*mb);
> + llvm::error_code ec;
> + // Instantiate the correct FileELF template
> instance
> + // based on the Ident pair. Once the File is
> created
> + // we push the file to the vector of files
> already
> + // created during parser's life.
> +
> + std::unique_ptr<File> f;
> + if (Ident.first == llvm::ELF::ELFCLASS32 && Ident.second
> + == llvm::ELF::ELFDATA2LSB) {
> + f.reset(new FileELF<llvm::support::little, false>
> (std::move(mb),
> + ec));
> + } else if (Ident.first == llvm::ELF::ELFCLASS32 &&
> Ident.second
> + == llvm::ELF::ELFDATA2MSB) {
> + f.reset(new FileELF<llvm::support::big, false>
> (std::move(mb), ec));
> + } else if (Ident.first == llvm::ELF::ELFCLASS64 &&
> Ident.second
> + == llvm::ELF::ELFDATA2MSB) {
> + f.reset(new FileELF<llvm::support::big, true>
> (std::move(mb), ec));
> + } else if (Ident.first == llvm::ELF::ELFCLASS64 &&
> Ident.second
> + == llvm::ELF::ELFDATA2LSB) {
> + f.reset(new FileELF<llvm::support::little,
> true>(std::move(mb),
> + ec));
> + }
> + if (ec)
> + return ec;
> + result.push_back(std::move(f));
> + return error_code::success();
> + }
> +private:
> + const ReaderOptionsELF &_options;
> +};
> +
> +ReaderOptionsELF::ReaderOptionsELF() {
> +}
> +
> +ReaderOptionsELF::~ReaderOptionsELF() { }
> +
> +namespace lld {
> +//This reader is not complete so I wont "enable" the knob
> + Reader* createReaderELF(const ReaderOptionsELF &options) {
> + assert(0 && "ELF Reader not yet implemented");
>
> It's fine to enable it. And we use llvm_unreachable instead of assert
> 0.
>
> + return new ReaderELF(options);
> + }
> +} // namespace lld
>
> Thanks for working on this. I think with using the ELF api and style
> fixes this would be good.
>
> - Michael Spencer
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
More information about the llvm-commits
mailing list