[llvm-commits] [lld] Support reading archive libraries
Michael Spencer
bigcheesegs at gmail.com
Fri Nov 2 14:24:54 PDT 2012
On Thu, Nov 1, 2012 at 12:33 PM, Shankar Easwaran
<shankare at codeaurora.org> wrote:
> Hi,
>
> Please review the patch to add reading archive libraries in lld.
>
> added support in lld core to read archive libraries
> changed llvm lib to support reading GNU style archives
> added a new flag in llvm-nm to display archive index
> added testsuite for llvm-nm
>
> Thanks
>
> Shankar Easwaran
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
> the Linux Foundation
Here's a review of the llvm part.
> Index: include/llvm/Object/Archive.h
> ===================================================================
> --- include/llvm/Object/Archive.h (revision 166683)
> +++ include/llvm/Object/Archive.h (working copy)
> @@ -122,9 +122,21 @@
>
> Archive(MemoryBuffer *source, error_code &ec);
>
> + enum Kind {
> + kindGNUArchive,
> + kindBSDArchive,
> + kindCOFFArchive
> + };
Needs to follow llvm naming standard, and Archive is redundant.
http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly
K_GNU,
K_BSD,
K_COFF
> +
> + Kind kind() const {
> + return _kind;
> + }
> +
> child_iterator begin_children(bool skip_internal = true) const;
> child_iterator end_children() const;
>
> + child_iterator begin_archiveidx() const;
This is never defined.
> +
> symbol_iterator begin_symbols() const;
> symbol_iterator end_symbols() const;
>
> @@ -136,6 +148,7 @@
> private:
> child_iterator SymbolTable;
> child_iterator StringTable;
> + Kind _kind;
Follow llvm naming conventions.
> };
>
> }
> Index: tools/llvm-nm/llvm-nm.cpp
> ===================================================================
> --- tools/llvm-nm/llvm-nm.cpp (revision 166683)
> +++ tools/llvm-nm/llvm-nm.cpp (working copy)
> @@ -113,6 +113,10 @@
> cl::opt<bool> WithoutAliases("without-aliases", cl::Hidden,
> cl::desc("Exclude aliases from output"));
>
> + cl::opt<bool> ArchiveIndex("archive-index",
> + cl::desc("Print the archive index"));
My version of nm names this --print-armap.
> + cl::alias ArchiveIndexs("s", cl::desc("Alias for --archive-index"),
> + cl::aliasopt(ArchiveIndex));
> bool PrintAddress = true;
>
> bool MultipleFiles = false;
> @@ -346,6 +350,23 @@
> return;
>
> if (object::Archive *a = dyn_cast<object::Archive>(arch.get())) {
> + if (ArchiveIndex) {
> + outs() << "archive index" << "\n";
My version of nm prints this with a capital A.
> + for (object::Archive::symbol_iterator i = a->begin_symbols(), e = a->end_symbols(); i != e; ++i) {
80 col.
> + object::Archive::child_iterator c;
> + StringRef symname;
> + StringRef filename;
> + if (error(i->getMember(c)))
> + return;
> + if (error(i->getName(symname)))
> + return;
> + if (error(c->getName(filename)))
> + return;
> + outs() << symname << " in " << filename << "\n";
> + }
> + outs() << "\n";
> + }
> +
> for (object::Archive::child_iterator i = a->begin_children(),
> e = a->end_children(); i != e; ++i) {
> OwningPtr<Binary> child;
> Index: lib/Object/Archive.cpp
> ===================================================================
> --- lib/Object/Archive.cpp (revision 166683)
> +++ lib/Object/Archive.cpp (working copy)
> @@ -122,7 +122,17 @@
> + sizeof(ArchiveMemberHeader)
> + Parent->StringTable->getSize()))
> return object_error::parse_failed;
> - Result = addr;
> +
> + // If the archive is a GNU archive, then the string table would have
> + // the name and /, if the filename exists in the string table. it would only
> + // exist in the string table if the filename is a long filename
I don't understand what this comment is trying to say. Are you refering to all
names in a GNU archive ending with /?
> + if (Parent->kind() == kindGNUArchive) {
> + StringRef::size_type end = StringRef(addr).find('/');
> + Result = StringRef(addr, end);
> + }
> + else {
No newline before else.
> + Result = addr;
> + }
> return object_error::success;
> } else if (name.startswith("#1/")) {
> APInt name_size;
> @@ -187,15 +197,36 @@
> child_iterator i = begin_children(false);
> child_iterator e = end_children();
>
> - if (i != e) ++i; // Nobody cares about the first member.
> - if (i != e) {
There should be a comment here explaining that there are different formats and
how we detect the difference.
> + StringRef name;
> + if ((ec = i->getName(name)))
> + return;
> + if (name == "/") {
> + // If the archive format is a GNU archive
> + // the first member is important, it would be the symbol table
This is wrong. The first member in COFF is also /.
> + _kind = kindGNUArchive;
> SymbolTable = i;
> - ++i;
> - }
> - if (i != e) {
> StringTable = i;
Why is StringTable set here? The string table and SymbolTable aren't the same
in GNU ar.
> + if (i != e) ++i;
> + if ((ec = i->getName(name)))
> + return;
> + if (name == "//")
> + StringTable = i;
As far as I can tell, GNU ar does not require / and // to be the first and
second members. It would be great if we could find some form of spec for this.
> + } else if (name == "__.SYMDEF") {
> + _kind = kindBSDArchive;
> + SymbolTable = i;
> + StringTable = i;
StringTable should just be set to e for BSD.
> + } else {
> + _kind = kindCOFFArchive;
> +
> + if (i != e) ++i; // Nobody cares about the first member.
> + if (i != e) {
> + SymbolTable = i;
> + ++i;
> + }
> + if (i != e) {
> + StringTable = i;
> + }
> }
> -
> ec = object_error::success;
> }
>
> @@ -222,17 +253,37 @@
>
> error_code Archive::Symbol::getMember(child_iterator &Result) const {
> const char *buf = Parent->SymbolTable->getBuffer()->getBufferStart();
> - uint32_t member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> const char *offsets = buf + 4;
> - buf += 4 + (member_count * 4); // Skip offsets.
> - const char *indicies = buf + 4;
>
> - uint16_t offsetindex =
> - *(reinterpret_cast<const support::ulittle16_t*>(indicies)
> - + SymbolIndex);
> + uint32_t member_count = 0;
> + uint16_t offsetindex = 0;
> + uint32_t offset = 0;
These should follow the llvm naming convention. I realize quite a bit of the
local variables in this file don't, but added stuff should.
Also, these should be declared where they are used.
>
> - uint32_t offset = *(reinterpret_cast<const support::ulittle32_t*>(offsets)
> + // If the archive type is a GNU archive
> + // the fields would be
> + // member count
> + // symbol offset
> + // symbol offset
> + // ...
> + // symbol name\0
> + // symbol name\0
> + // ...
This comment doesn't really help with understanding the code.
> + if (Parent->kind() == kindGNUArchive) {
> + member_count = *reinterpret_cast<const support::ubig32_t*>(buf);
member_count is never used.
> + offsetindex = SymbolIndex;
offset index is only used once.
> + offset = *(reinterpret_cast<const support::ubig32_t*>(offsets)
> + + offsetindex);
> + }
> + else {
No newline before else.
> + member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> + buf += 4 + (member_count * 4); // Skip offsets.
Use sizeof(uint32_t) instead to make it clear what 4 means.
> + const char *indicies = buf + 4;
> + offsetindex =
> + *(reinterpret_cast<const support::ulittle16_t*>(indicies)
> + + SymbolIndex);
> + offset = *(reinterpret_cast<const support::ulittle32_t*>(offsets)
> + (offsetindex - 1));
You need to verify that these pointers are within the bounds of the
symbol table.
> + }
What about BSD?
>
> const char *Loc = Parent->getData().begin() + offset;
> size_t Size = sizeof(ArchiveMemberHeader) +
> @@ -253,10 +304,21 @@
>
> Archive::symbol_iterator Archive::begin_symbols() const {
> const char *buf = SymbolTable->getBuffer()->getBufferStart();
> - uint32_t member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> - buf += 4 + (member_count * 4); // Skip offsets.
> - uint32_t symbol_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> - buf += 4 + (symbol_count * 2); // Skip indices.
> + uint32_t member_count = 0;
> + uint32_t symbol_count = 0;
Naming convention and declare when used.
> + // If the archive type is a GNU archive
> + // symbol count is equal to member count
> + if (kind() == kindGNUArchive) {
> + member_count = *reinterpret_cast<const support::ubig32_t*>(buf);
> + symbol_count = *reinterpret_cast<const support::ubig32_t*>(buf);
Don't read the same thing twice. And member_count isn't used.
> + buf += 4 + (symbol_count * 4);
> + }
> + else {
No newline before else.
> + member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> + buf += 4 + (member_count * 4); // Skip offsets.
> + symbol_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> + buf += 4 + (symbol_count * 2); // Skip indices.
> + }
> uint32_t string_start_offset =
> buf - SymbolTable->getBuffer()->getBufferStart();
> return symbol_iterator(Symbol(this, 0, string_start_offset));
> @@ -264,9 +326,20 @@
>
> Archive::symbol_iterator Archive::end_symbols() const {
> const char *buf = SymbolTable->getBuffer()->getBufferStart();
> - uint32_t member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> - buf += 4 + (member_count * 4); // Skip offsets.
> - uint32_t symbol_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> + uint32_t member_count = 0;
> + uint32_t symbol_count = 0;
> + // If the archive type is a GNU archive
> + // symbol count is equal to member count
> + if (kind() == kindGNUArchive) {
> + member_count = *reinterpret_cast<const support::ubig32_t*>(buf);
> + symbol_count = *reinterpret_cast<const support::ubig32_t*>(buf);
> + buf += 4 + (symbol_count * 4);
> + }
> + else {
> + member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> + buf += 4 + (member_count * 4); // Skip offsets.
> + symbol_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
> + }
Same issues as above.
> return symbol_iterator(
> Symbol(this, symbol_count, 0));
> }
>
You don't need to duplicate all the test infrastructure for llvm-nm. Add your
tests to test/Object.
- Michael Spencer
More information about the llvm-commits
mailing list