[llvm-commits] [lld] Support reading archive libraries

Fri Nov 2 16:24:15 PDT 2012

On Fri, Nov 2, 2012 at 3:22 PM, Shankar Easwaran
<shankare at codeaurora.org> wrote:
> Hi Michael,
>
> Thanks for the review.
>
>
>> Needs to follow llvm naming standard, and Archive is redundant.
>>
>> http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly
>>
>> K_GNU,
>> K_BSD,
>> K_COFF
>
>
> Will change it accordingly.
>
>> + child_iterator begin_archiveidx() const;
>> This is never defined.
>
> Will remove this.
>
>>> +
>>>     symbol_iterator begin_symbols() const;
>>>     symbol_iterator end_symbols() const;
>>>
>>> @@ -136,6 +148,7 @@
>>>   private:
>>>     child_iterator SymbolTable;
>>>     child_iterator StringTable;
>>> +  Kind _kind;
>>
>> Follow llvm naming conventions.
>
> Sure.
>
>
>>
>>
>> +  cl::opt<bool> ArchiveIndex("archive-index",
>> +    cl::desc("Print the archive index"));
>> My version of nm names this --print-armap.
>
> Will change this.
>
>
> +      if (ArchiveIndex) {
> +        outs() << "archive index" << "\n";
>
>> My version of nm prints this with a capital A.
>>
>>> +        for (object::Archive::symbol_iterator i = a->begin_symbols(), e
>>> = a->end_symbols(); i != e; ++i) {
>>
>> 80 col.
>
>
> Will change to capital A, and align to 80 columns.
>
>> + + // If the archive is a GNU archive, then the string table would have +
>> // the name and /, if the filename exists in the string table. it would only
>> + // exist in the string table if the filename is a long filename
>> I don't understand what this comment is trying to say. Are you refering to
>> all
>> names in a GNU archive ending with /?
>
> Yes.
>
>
>>> +    if (Parent->kind() == kindGNUArchive) {
>>> +      StringRef::size_type end = StringRef(addr).find('/');
>>> +      Result = StringRef(addr, end);
>>> +    }
>>> +    else {
>>
>> No newline before else.
>
> Sure.
>
>>
>>> +      Result = addr;
>>> +    }
>>>       return object_error::success;
>>>     } else if (name.startswith("#1/")) {
>>>       APInt name_size;
>>> @@ -187,15 +197,36 @@
>>>     child_iterator i = begin_children(false);
>>>     child_iterator e = end_children();
>>>
>>> -  if (i != e) ++i; // Nobody cares about the first member.
>>> -  if (i != e) {
>>
>> There should be a comment here explaining that there are different formats
>> and
>> how we detect the difference.
>
> Ok.
>
>>> +  StringRef name;
>>> +  if ((ec = i->getName(name)))
>>> +    return;
>>> +  if (name == "/") {
>>> +    // If the archive format is a GNU archive
>>> +    // the first member is important, it would be the symbol table
>>
>> This is wrong. The first member in COFF is also /.
>
> The format which Archive was handling, was already handling COFF I believe.
> I have added changes mainly
> for reading GNU archives.

This change will recognize COFF archives as GNU archives.

>
>>
>>> +    _kind = kindGNUArchive;
>>>       SymbolTable = i;
>>> -    ++i;
>>> -  }
>>> -  if (i != e) {
>>>       StringTable = i;
>>
>> Why is StringTable set here? The string table and SymbolTable aren't the
>> same
>> in GNU ar.
>
> It is, there is a seperate string table, only if there are long filenames.

StringTable should be set to e if one is not present.

>
>>
>>> +    if (i != e) ++i;
>>> +    if ((ec = i->getName(name)))
>>> +      return;
>>> +    if (name == "//")
>>> +      StringTable = i;
>>
>> As far as I can tell, GNU ar does not require / and // to be the first and
>> second members. It would be great if we could find some form of spec for
>> this.
>
> I cant find the spec for the GNU ar, but as far as I tested, they appeared
> consecutively.

I suppose it's fine to assume this. But we should give an error if
this is not the case.

>
>>> +  } else if (name == "__.SYMDEF") {
>>> +    _kind = kindBSDArchive;
>>> +    SymbolTable = i;
>>> +    StringTable = i;
>>
>> StringTable should just be set to e for BSD.
>
> Ok.
>
>>
>>> +  } else {
>>> +    _kind = kindCOFFArchive;
>>> +
>>> +    if (i != e) ++i; // Nobody cares about the first member.
>>> +    if (i != e) {
>>> +      SymbolTable = i;
>>> +      ++i;
>>> +    }
>>> +    if (i != e) {
>>> +      StringTable = i;
>>> +    }
>>>     }
>>> -
>>>     ec = object_error::success;
>>>   }
>>>
>>> @@ -222,17 +253,37 @@
>>>
>>>   error_code Archive::Symbol::getMember(child_iterator &Result) const {
>>>     const char *buf = Parent->SymbolTable->getBuffer()->getBufferStart();
>>> -  uint32_t member_count = *reinterpret_cast<const
>>> support::ulittle32_t*>(buf);
>>>     const char *offsets = buf + 4;
>>> -  buf += 4 + (member_count * 4); // Skip offsets.
>>> -  const char *indicies = buf + 4;
>>>
>>> -  uint16_t offsetindex =
>>> -    *(reinterpret_cast<const support::ulittle16_t*>(indicies)
>>> -      + SymbolIndex);
>>> +  uint32_t member_count = 0;
>>> +  uint16_t offsetindex = 0;
>>> +  uint32_t offset = 0;
>>
>> These should follow the llvm naming convention. I realize quite a bit of
>> the
>> local variables in this file don't, but added stuff should.
>>
>> Also, these should be declared where they are used.
>>
>>> -  uint32_t offset = *(reinterpret_cast<const
>>> support::ulittle32_t*>(offsets)
>>> +  // If the archive type is a GNU archive
>>> +  // the fields would be
>>> +  // member count
>>> +  // symbol offset
>>> +  // symbol offset
>>> +  // ...
>>> +  // symbol name\0
>>> +  // symbol name\0
>>> +  // ...
>>
>> This comment doesn't really help with understanding the code.
>
> its describing the format, maybe I will add more text to that.
>
>>
>>> +  if (Parent->kind() == kindGNUArchive) {
>>> +    member_count = *reinterpret_cast<const support::ubig32_t*>(buf);
>>
>> member_count is never used.
>>
>>> +    offsetindex = SymbolIndex;
>>
>> offset index is only used once.
>
>
> Will remove member_count and replace offset index with SymbolIndex.
>
>
>>
>>> +    offset = *(reinterpret_cast<const support::ubig32_t*>(offsets)
>>> +                          + offsetindex);
>>> +  }
>>> +  else {
>>
>> No newline before else.
>
> Ok.
>
>>
>>> +    member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
>>> +    buf += 4 + (member_count * 4); // Skip offsets.
>>
>> Use sizeof(uint32_t) instead to make it clear what 4 means.
>
> Ok.
>
>>> +    const char *indicies = buf + 4;
>>> +    offsetindex =
>>> +      *(reinterpret_cast<const support::ulittle16_t*>(indicies)
>>> +        + SymbolIndex);
>>> +    offset = *(reinterpret_cast<const support::ulittle32_t*>(offsets)
>>>                         + (offsetindex - 1));
>>
>> You need to verify that these pointers are within the bounds of the
>> symbol table.
>
>
> Ok.
>
>>
>>> +  }
>>
>> What about BSD?
>
> BSD is not complete, do you want me to go ahead and remove the part of the
> functionality that is there for BSD ?

Leave it but add llvm_unreachable where we don't handle BSD.

>
>>
>>>     const char *Loc = Parent->getData().begin() + offset;
>>>     size_t Size = sizeof(ArchiveMemberHeader) +
>>> @@ -253,10 +304,21 @@
>>>
>>>   Archive::symbol_iterator Archive::begin_symbols() const {
>>>     const char *buf = SymbolTable->getBuffer()->getBufferStart();
>>> -  uint32_t member_count = *reinterpret_cast<const
>>> support::ulittle32_t*>(buf);
>>> -  buf += 4 + (member_count * 4); // Skip offsets.
>>> -  uint32_t symbol_count = *reinterpret_cast<const
>>> support::ulittle32_t*>(buf);
>>> -  buf += 4 + (symbol_count * 2); // Skip indices.
>>> +  uint32_t member_count = 0;
>>> +  uint32_t symbol_count = 0;
>>
>> Naming convention and declare when used.
>
> Ok.
>
>>
>>> +  // If the archive type is a GNU archive
>>> +  // symbol count is equal to member count
>>> +  if (kind() == kindGNUArchive) {
>>> +    member_count = *reinterpret_cast<const support::ubig32_t*>(buf);
>>> +    symbol_count = *reinterpret_cast<const support::ubig32_t*>(buf);
>>
>> Don't read the same thing twice. And member_count isn't used.
>
>
> Ok.
>
>>
>>> +    buf += 4 + (symbol_count * 4);
>>> +  }
>>> +  else {
>>
>> No newline before else.
>
> Ok.
>
>>
>> You don't need to duplicate all the test infrastructure for llvm-nm. Add
>> your
>> tests to test/Object.
>
>
> Sure, will remove the test framework for nm.
>
>
> Thanks
>
> Shankar Easwaran
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by
> the Linux Foundation
>

- Michael Spencer