[llvm-commits] [lld] Support reading archive libraries
Michael Spencer
bigcheesegs at gmail.com
Fri Nov 2 16:24:15 PDT 2012
On Fri, Nov 2, 2012 at 3:22 PM, Shankar Easwaran
<shankare at codeaurora.org> wrote:
> Hi Michael,
>
> Thanks for the review.
>
>
>> Needs to follow llvm naming standard, and Archive is redundant.
>>
>> http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly
>>
>> K_GNU,
>> K_BSD,
>> K_COFF
>
>
> Will change it accordingly.
>
>> + child_iterator begin_archiveidx() const;
>> This is never defined.
>
> Will remove this.
>
>>> +
>>> symbol_iterator begin_symbols() const;
>>> symbol_iterator end_symbols() const;
>>>
>>> @@ -136,6 +148,7 @@
>>> private:
>>> child_iterator SymbolTable;
>>> child_iterator StringTable;
>>> + Kind _kind;
>>
>> Follow llvm naming conventions.
>
> Sure.
>
>
>>
>>
>> + cl::opt<bool> ArchiveIndex("archive-index",
>> + cl::desc("Print the archive index"));
>> My version of nm names this --print-armap.
>
> Will change this.
>
>
> + if (ArchiveIndex) {
> + outs() << "archive index" << "\n";
>
>> My version of nm prints this with a capital A.
>>
>>> + for (object::Archive::symbol_iterator i = a->begin_symbols(), e
>>> = a->end_symbols(); i != e; ++i) {
>>
>> 80 col.
>
>
> Will change to capital A, and align to 80 columns.
>
>> + + // If the archive is a GNU archive, then the string table would have +
>> // the name and /, if the filename exists in the string table. it would only
>> + // exist in the string table if the filename is a long filename
>> I don't understand what this comment is trying to say. Are you refering to
>> all
>> names in a GNU archive ending with /?
>
> Yes.
>
>
>>> + if (Parent->kind() == kindGNUArchive) {
>>> + StringRef::size_type end = StringRef(addr).find('/');
>>> + Result = StringRef(addr, end);
>>> + }
>>> + else {
>>
>> No newline before else.
>
> Sure.
>
>>
>>> + Result = addr;
>>> + }
>>> return object_error::success;
>>> } else if (name.startswith("#1/")) {
>>> APInt name_size;
>>> @@ -187,15 +197,36 @@
>>> child_iterator i = begin_children(false);
>>> child_iterator e = end_children();
>>>
>>> - if (i != e) ++i; // Nobody cares about the first member.
>>> - if (i != e) {
>>
>> There should be a comment here explaining that there are different formats
>> and
>> how we detect the difference.
>
> Ok.
>
>>> + StringRef name;
>>> + if ((ec = i->getName(name)))
>>> + return;
>>> + if (name == "/") {
>>> + // If the archive format is a GNU archive
>>> + // the first member is important, it would be the symbol table
>>
>> This is wrong. The first member in COFF is also /.
>
> The format which Archive was handling, was already handling COFF I believe.
> I have added changes mainly
> for reading GNU archives.
This change will recognize COFF archives as GNU archives.
>
>>
>>> + _kind = kindGNUArchive;
>>> SymbolTable = i;
>>> - ++i;
>>> - }
>>> - if (i != e) {
>>> StringTable = i;
>>
>> Why is StringTable set here? The string table and SymbolTable aren't the
>> same
>> in GNU ar.
>
> It is, there is a seperate string table, only if there are long filenames.
StringTable should be set to e if one is not present.
>
>>
>>> + if (i != e) ++i;
>>> + if ((ec = i->getName(name)))
>>> + return;
>>> + if (name == "//")
>>> + StringTable = i;
>>
>> As far as I can tell, GNU ar does not require / and // to be the first and
>> second members. It would be great if we could find some form of spec for
>> this.
>
> I cant find the spec for the GNU ar, but as far as I tested, they appeared
> consecutively.
I suppose it's fine to assume this. But we should give an error if
this is not the case.
>
>>> + } else if (name == "__.SYMDEF") {
>>> + _kind = kindBSDArchive;
>>> + SymbolTable = i;
>>> + StringTable = i;
>>
>> StringTable should just be set to e for BSD.
>
> Ok.
>
>>
>>> + } else {
>>> + _kind = kindCOFFArchive;
>>> +
>>> + if (i != e) ++i; // Nobody cares about the first member.
>>> + if (i != e) {
>>> + SymbolTable = i;
>>> + ++i;
>>> + }
>>> + if (i != e) {
>>> + StringTable = i;
>>> + }
>>> }
>>> -
>>> ec = object_error::success;
>>> }
>>>
>>> @@ -222,17 +253,37 @@
>>>
>>> error_code Archive::Symbol::getMember(child_iterator &Result) const {
>>> const char *buf = Parent->SymbolTable->getBuffer()->getBufferStart();
>>> - uint32_t member_count = *reinterpret_cast<const
>>> support::ulittle32_t*>(buf);
>>> const char *offsets = buf + 4;
>>> - buf += 4 + (member_count * 4); // Skip offsets.
>>> - const char *indicies = buf + 4;
>>>
>>> - uint16_t offsetindex =
>>> - *(reinterpret_cast<const support::ulittle16_t*>(indicies)
>>> - + SymbolIndex);
>>> + uint32_t member_count = 0;
>>> + uint16_t offsetindex = 0;
>>> + uint32_t offset = 0;
>>
>> These should follow the llvm naming convention. I realize quite a bit of
>> the
>> local variables in this file don't, but added stuff should.
>>
>> Also, these should be declared where they are used.
>>
>>> - uint32_t offset = *(reinterpret_cast<const
>>> support::ulittle32_t*>(offsets)
>>> + // If the archive type is a GNU archive
>>> + // the fields would be
>>> + // member count
>>> + // symbol offset
>>> + // symbol offset
>>> + // ...
>>> + // symbol name\0
>>> + // symbol name\0
>>> + // ...
>>
>> This comment doesn't really help with understanding the code.
>
> its describing the format, maybe I will add more text to that.
>
>>
>>> + if (Parent->kind() == kindGNUArchive) {
>>> + member_count = *reinterpret_cast<const support::ubig32_t*>(buf);
>>
>> member_count is never used.
>>
>>> + offsetindex = SymbolIndex;
>>
>> offset index is only used once.
>
>
> Will remove member_count and replace offset index with SymbolIndex.
>
>
>>
>>> + offset = *(reinterpret_cast<const support::ubig32_t*>(offsets)
>>> + + offsetindex);
>>> + }
>>> + else {
>>
>> No newline before else.
>
> Ok.
>
>>
>>> + member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
>>> + buf += 4 + (member_count * 4); // Skip offsets.
>>
>> Use sizeof(uint32_t) instead to make it clear what 4 means.
>
> Ok.
>
>>> + const char *indicies = buf + 4;
>>> + offsetindex =
>>> + *(reinterpret_cast<const support::ulittle16_t*>(indicies)
>>> + + SymbolIndex);
>>> + offset = *(reinterpret_cast<const support::ulittle32_t*>(offsets)
>>> + (offsetindex - 1));
>>
>> You need to verify that these pointers are within the bounds of the
>> symbol table.
>
>
> Ok.
>
>>
>>> + }
>>
>> What about BSD?
>
> BSD is not complete, do you want me to go ahead and remove the part of the
> functionality that is there for BSD ?
Leave it but add llvm_unreachable where we don't handle BSD.
>
>>
>>> const char *Loc = Parent->getData().begin() + offset;
>>> size_t Size = sizeof(ArchiveMemberHeader) +
>>> @@ -253,10 +304,21 @@
>>>
>>> Archive::symbol_iterator Archive::begin_symbols() const {
>>> const char *buf = SymbolTable->getBuffer()->getBufferStart();
>>> - uint32_t member_count = *reinterpret_cast<const
>>> support::ulittle32_t*>(buf);
>>> - buf += 4 + (member_count * 4); // Skip offsets.
>>> - uint32_t symbol_count = *reinterpret_cast<const
>>> support::ulittle32_t*>(buf);
>>> - buf += 4 + (symbol_count * 2); // Skip indices.
>>> + uint32_t member_count = 0;
>>> + uint32_t symbol_count = 0;
>>
>> Naming convention and declare when used.
>
> Ok.
>
>>
>>> + // If the archive type is a GNU archive
>>> + // symbol count is equal to member count
>>> + if (kind() == kindGNUArchive) {
>>> + member_count = *reinterpret_cast<const support::ubig32_t*>(buf);
>>> + symbol_count = *reinterpret_cast<const support::ubig32_t*>(buf);
>>
>> Don't read the same thing twice. And member_count isn't used.
>
>
> Ok.
>
>>
>>> + buf += 4 + (symbol_count * 4);
>>> + }
>>> + else {
>>
>> No newline before else.
>
> Ok.
>
>>
>> You don't need to duplicate all the test infrastructure for llvm-nm. Add
>> your
>> tests to test/Object.
>
>
> Sure, will remove the test framework for nm.
>
>
> Thanks
>
> Shankar Easwaran
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
> the Linux Foundation
>
- Michael Spencer
More information about the llvm-commits
mailing list