[llvm-commits] [lld] Support reading archive libraries

Fri Nov 2 14:24:54 PDT 2012

On Thu, Nov 1, 2012 at 12:33 PM, Shankar Easwaran
<shankare at codeaurora.org> wrote:
> Hi,
>
> Please review the patch to add reading archive libraries in lld.
>
> added support in lld core to read archive libraries
> changed llvm lib to support reading GNU style archives
> added a new flag in llvm-nm to display archive index
> added testsuite for llvm-nm
>
> Thanks
>
> Shankar Easwaran
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
> the Linux Foundation

Here's a review of the llvm part.

> Index: include/llvm/Object/Archive.h
> ===================================================================
> --- include/llvm/Object/Archive.h	(revision 166683)
> +++ include/llvm/Object/Archive.h	(working copy)
> @@ -122,9 +122,21 @@
>
>    Archive(MemoryBuffer *source, error_code &ec);
>
> +  enum Kind {
> +      kindGNUArchive,
> +      kindBSDArchive,
> +      kindCOFFArchive
> +  };

Needs to follow llvm naming standard, and Archive is redundant.
http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

K_GNU,
K_BSD,
K_COFF

> +
> +  Kind kind() const {
> +      return _kind;
> +  }
> +
>    child_iterator begin_children(bool skip_internal = true) const;
>    child_iterator end_children() const;
>
> +  child_iterator begin_archiveidx() const;

This is never defined.

> +
>    symbol_iterator begin_symbols() const;
>    symbol_iterator end_symbols() const;
>
> @@ -136,6 +148,7 @@
>  private:
>    child_iterator SymbolTable;
>    child_iterator StringTable;
> +  Kind _kind;

Follow llvm naming conventions.

>  };
>
>  }
> Index: tools/llvm-nm/llvm-nm.cpp
> ===================================================================
> --- tools/llvm-nm/llvm-nm.cpp	(revision 166683)
> +++ tools/llvm-nm/llvm-nm.cpp	(working copy)
> @@ -113,6 +113,10 @@
>    cl::opt<bool> WithoutAliases("without-aliases", cl::Hidden,
>                                 cl::desc("Exclude aliases from output"));
>
> +  cl::opt<bool> ArchiveIndex("archive-index",
> +    cl::desc("Print the archive index"));

My version of nm names this --print-armap.

> +  cl::alias ArchiveIndexs("s", cl::desc("Alias for --archive-index"),
> +                              cl::aliasopt(ArchiveIndex));
>    bool PrintAddress = true;
>
>    bool MultipleFiles = false;
> @@ -346,6 +350,23 @@
>        return;
>
>      if (object::Archive *a = dyn_cast<object::Archive>(arch.get())) {
> +      if (ArchiveIndex) {
> +        outs() << "archive index" << "\n";

My version of nm prints this with a capital A.

> +        for (object::Archive::symbol_iterator i = a->begin_symbols(), e = a->end_symbols(); i != e; ++i) {

80 col.

> +          object::Archive::child_iterator c;
> +          StringRef symname;
> +          StringRef filename;
> +          if (error(i->getMember(c)))
> +              return;
> +          if (error(i->getName(symname)))
> +              return;
> +          if (error(c->getName(filename)))
> +              return;
> +          outs() << symname << " in " << filename << "\n";
> +        }
> +        outs() << "\n";
> +      }
> +
>        for (object::Archive::child_iterator i = a->begin_children(),
>                                             e = a->end_children(); i != e; ++i) {
>          OwningPtr<Binary> child;
> Index: lib/Object/Archive.cpp
> ===================================================================
> --- lib/Object/Archive.cpp	(revision 166683)
> +++ lib/Object/Archive.cpp	(working copy)
> @@ -122,7 +122,17 @@
>                     + sizeof(ArchiveMemberHeader)
>                     + Parent->StringTable->getSize()))
>        return object_error::parse_failed;
> -    Result = addr;
> +
> +    // If the archive is a GNU archive, then the string table would have
> +    // the name and /, if the filename exists in the string table. it would only
> +    // exist in the string table if the filename is a long filename

I don't understand what this comment is trying to say. Are you refering to all
names in a GNU archive ending with /?

> +    if (Parent->kind() == kindGNUArchive) {
> +      StringRef::size_type end = StringRef(addr).find('/');
> +      Result = StringRef(addr, end);
> +    }
> +    else {

No newline before else.

> +      Result = addr;
> +    }
>      return object_error::success;
>    } else if (name.startswith("#1/")) {
>      APInt name_size;
> @@ -187,15 +197,36 @@
>    child_iterator i = begin_children(false);
>    child_iterator e = end_children();
>
> -  if (i != e) ++i; // Nobody cares about the first member.
> -  if (i != e) {

There should be a comment here explaining that there are different formats and
how we detect the difference.

> +  StringRef name;
> +  if ((ec = i->getName(name)))
> +    return;
> +  if (name == "/") {
> +    // If the archive format is a GNU archive
> +    // the first member is important, it would be the symbol table

This is wrong. The first member in COFF is also /.

> +    _kind = kindGNUArchive;
>      SymbolTable = i;
> -    ++i;
> -  }
> -  if (i != e) {
>      StringTable = i;

Why is StringTable set here? The string table and SymbolTable aren't the same
in GNU ar.

> +    if (i != e) ++i;
> +    if ((ec = i->getName(name)))
> +      return;
> +    if (name == "//")
> +      StringTable = i;

As far as I can tell, GNU ar does not require / and // to be the first and
second members. It would be great if we could find some form of spec for this.

> +  } else if (name == "__.SYMDEF") {
> +    _kind = kindBSDArchive;
> +    SymbolTable = i;
> +    StringTable = i;

StringTable should just be set to e for BSD.

> +  } else {
> +    _kind = kindCOFFArchive;
> +
> +    if (i != e) ++i; // Nobody cares about the first member.
> +    if (i != e) {
> +      SymbolTable = i;
> +      ++i;
> +    }
> +    if (i != e) {
> +      StringTable = i;
> +    }
>    }
> -
>    ec = object_error::success;
>  }
>
> @@ -222,17 +253,37 @@
>
>  error_code Archive::Symbol::getMember(child_iterator &Result) const {
>    const char *buf = Parent->SymbolTable->getBuffer()->getBufferStart();
> -  uint32_t member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
>    const char *offsets = buf + 4;
> -  buf += 4 + (member_count * 4); // Skip offsets.
> -  const char *indicies = buf + 4;
>
> -  uint16_t offsetindex =
> -    *(reinterpret_cast<const support::ulittle16_t*>(indicies)
> -      + SymbolIndex);
> +  uint32_t member_count = 0;
> +  uint16_t offsetindex = 0;
> +  uint32_t offset = 0;

These should follow the llvm naming convention. I realize quite a bit of the
local variables in this file don't, but added stuff should.

Also, these should be declared where they are used.

>
> -  uint32_t offset = *(reinterpret_cast<const support::ulittle32_t*>(offsets)
> +  // If the archive type is a GNU archive
> +  // the fields would be
> +  // member count
> +  // symbol offset
> +  // symbol offset
> +  // ...
> +  // symbol name\0
> +  // symbol name\0
> +  // ...

This comment doesn't really help with understanding the code.

> +  if (Parent->kind() == kindGNUArchive) {
> +    member_count = *reinterpret_cast<const support::ubig32_t*>(buf);

member_count is never used.

> +    offsetindex = SymbolIndex;

offset index is only used once.

> +    offset = *(reinterpret_cast<const support::ubig32_t*>(offsets)
> +                          + offsetindex);
> +  }
> +  else {

No newline before else.

> +    member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> +    buf += 4 + (member_count * 4); // Skip offsets.

Use sizeof(uint32_t) instead to make it clear what 4 means.

> +    const char *indicies = buf + 4;
> +    offsetindex =
> +      *(reinterpret_cast<const support::ulittle16_t*>(indicies)
> +        + SymbolIndex);
> +    offset = *(reinterpret_cast<const support::ulittle32_t*>(offsets)
>                        + (offsetindex - 1));

You need to verify that these pointers are within the bounds of the
symbol table.

> +  }

What about BSD?

>
>    const char *Loc = Parent->getData().begin() + offset;
>    size_t Size = sizeof(ArchiveMemberHeader) +
> @@ -253,10 +304,21 @@
>
>  Archive::symbol_iterator Archive::begin_symbols() const {
>    const char *buf = SymbolTable->getBuffer()->getBufferStart();
> -  uint32_t member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> -  buf += 4 + (member_count * 4); // Skip offsets.
> -  uint32_t symbol_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> -  buf += 4 + (symbol_count * 2); // Skip indices.
> +  uint32_t member_count = 0;
> +  uint32_t symbol_count = 0;

Naming convention and declare when used.

> +  // If the archive type is a GNU archive
> +  // symbol count is equal to member count
> +  if (kind() == kindGNUArchive) {
> +    member_count = *reinterpret_cast<const support::ubig32_t*>(buf);
> +    symbol_count = *reinterpret_cast<const support::ubig32_t*>(buf);

Don't read the same thing twice. And member_count isn't used.

> +    buf += 4 + (symbol_count * 4);
> +  }
> +  else {

No newline before else.

> +    member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> +    buf += 4 + (member_count * 4); // Skip offsets.
> +    symbol_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> +    buf += 4 + (symbol_count * 2); // Skip indices.
> +  }
>    uint32_t string_start_offset =
>      buf - SymbolTable->getBuffer()->getBufferStart();
>    return symbol_iterator(Symbol(this, 0, string_start_offset));
> @@ -264,9 +326,20 @@
>
>  Archive::symbol_iterator Archive::end_symbols() const {
>    const char *buf = SymbolTable->getBuffer()->getBufferStart();
> -  uint32_t member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> -  buf += 4 + (member_count * 4); // Skip offsets.
> -  uint32_t symbol_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> +  uint32_t member_count = 0;
> +  uint32_t symbol_count = 0;
> +  // If the archive type is a GNU archive
> +  // symbol count is equal to member count
> +  if (kind() == kindGNUArchive) {
> +    member_count = *reinterpret_cast<const support::ubig32_t*>(buf);
> +    symbol_count = *reinterpret_cast<const support::ubig32_t*>(buf);
> +    buf += 4 + (symbol_count * 4);
> +  }
> +  else {
> +    member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> +    buf += 4 + (member_count * 4); // Skip offsets.
> +    symbol_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> +  }

Same issues as above.

>    return symbol_iterator(
>      Symbol(this, symbol_count, 0));
>  }
>

You don't need to duplicate all the test infrastructure for llvm-nm. Add your
tests to test/Object.

- Michael Spencer