[llvm-commits] [lld] Support reading archive libraries

Fri Nov 2 15:22:43 PDT 2012

Hi Michael,

Thanks for the review.

> Needs to follow llvm naming standard, and Archive is redundant.
> http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly
>
> K_GNU,
> K_BSD,
> K_COFF

Will change it accordingly.
> + child_iterator begin_archiveidx() const;
> This is never defined.
Will remove this.
>> +
>>     symbol_iterator begin_symbols() const;
>>     symbol_iterator end_symbols() const;
>>
>> @@ -136,6 +148,7 @@
>>   private:
>>     child_iterator SymbolTable;
>>     child_iterator StringTable;
>> +  Kind _kind;
> Follow llvm naming conventions.
Sure.

>
>
> +  cl::opt<bool> ArchiveIndex("archive-index",
> +    cl::desc("Print the archive index"));
> My version of nm names this --print-armap.
Will change this.

+      if (ArchiveIndex) {
+        outs() << "archive index" << "\n";

> My version of nm prints this with a capital A.
>
>> +        for (object::Archive::symbol_iterator i = a->begin_symbols(), e = a->end_symbols(); i != e; ++i) {
> 80 col.

Will change to capital A, and align to 80 columns.
> + + // If the archive is a GNU archive, then the string table would 
> have + // the name and /, if the filename exists in the string table. 
> it would only + // exist in the string table if the filename is a long 
> filename
> I don't understand what this comment is trying to say. Are you refering to all
> names in a GNU archive ending with /?
Yes.

>> +    if (Parent->kind() == kindGNUArchive) {
>> +      StringRef::size_type end = StringRef(addr).find('/');
>> +      Result = StringRef(addr, end);
>> +    }
>> +    else {
> No newline before else.
Sure.
>
>> +      Result = addr;
>> +    }
>>       return object_error::success;
>>     } else if (name.startswith("#1/")) {
>>       APInt name_size;
>> @@ -187,15 +197,36 @@
>>     child_iterator i = begin_children(false);
>>     child_iterator e = end_children();
>>
>> -  if (i != e) ++i; // Nobody cares about the first member.
>> -  if (i != e) {
> There should be a comment here explaining that there are different formats and
> how we detect the difference.
Ok.
>> +  StringRef name;
>> +  if ((ec = i->getName(name)))
>> +    return;
>> +  if (name == "/") {
>> +    // If the archive format is a GNU archive
>> +    // the first member is important, it would be the symbol table
> This is wrong. The first member in COFF is also /.
The format which Archive was handling, was already handling COFF I 
believe. I have added changes mainly
for reading GNU archives.
>
>> +    _kind = kindGNUArchive;
>>       SymbolTable = i;
>> -    ++i;
>> -  }
>> -  if (i != e) {
>>       StringTable = i;
> Why is StringTable set here? The string table and SymbolTable aren't the same
> in GNU ar.
It is, there is a seperate string table, only if there are long filenames.
>
>> +    if (i != e) ++i;
>> +    if ((ec = i->getName(name)))
>> +      return;
>> +    if (name == "//")
>> +      StringTable = i;
> As far as I can tell, GNU ar does not require / and // to be the first and
> second members. It would be great if we could find some form of spec for this.
I cant find the spec for the GNU ar, but as far as I tested, they 
appeared consecutively.
>> +  } else if (name == "__.SYMDEF") {
>> +    _kind = kindBSDArchive;
>> +    SymbolTable = i;
>> +    StringTable = i;
> StringTable should just be set to e for BSD.
Ok.
>
>> +  } else {
>> +    _kind = kindCOFFArchive;
>> +
>> +    if (i != e) ++i; // Nobody cares about the first member.
>> +    if (i != e) {
>> +      SymbolTable = i;
>> +      ++i;
>> +    }
>> +    if (i != e) {
>> +      StringTable = i;
>> +    }
>>     }
>> -
>>     ec = object_error::success;
>>   }
>>
>> @@ -222,17 +253,37 @@
>>
>>   error_code Archive::Symbol::getMember(child_iterator &Result) const {
>>     const char *buf = Parent->SymbolTable->getBuffer()->getBufferStart();
>> -  uint32_t member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
>>     const char *offsets = buf + 4;
>> -  buf += 4 + (member_count * 4); // Skip offsets.
>> -  const char *indicies = buf + 4;
>>
>> -  uint16_t offsetindex =
>> -    *(reinterpret_cast<const support::ulittle16_t*>(indicies)
>> -      + SymbolIndex);
>> +  uint32_t member_count = 0;
>> +  uint16_t offsetindex = 0;
>> +  uint32_t offset = 0;
> These should follow the llvm naming convention. I realize quite a bit of the
> local variables in this file don't, but added stuff should.
>
> Also, these should be declared where they are used.
>
>> -  uint32_t offset = *(reinterpret_cast<const support::ulittle32_t*>(offsets)
>> +  // If the archive type is a GNU archive
>> +  // the fields would be
>> +  // member count
>> +  // symbol offset
>> +  // symbol offset
>> +  // ...
>> +  // symbol name\0
>> +  // symbol name\0
>> +  // ...
> This comment doesn't really help with understanding the code.
its describing the format, maybe I will add more text to that.
>
>> +  if (Parent->kind() == kindGNUArchive) {
>> +    member_count = *reinterpret_cast<const support::ubig32_t*>(buf);
> member_count is never used.
>
>> +    offsetindex = SymbolIndex;
> offset index is only used once.

Will remove member_count and replace offset index with SymbolIndex.

>
>> +    offset = *(reinterpret_cast<const support::ubig32_t*>(offsets)
>> +                          + offsetindex);
>> +  }
>> +  else {
> No newline before else.
Ok.
>
>> +    member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
>> +    buf += 4 + (member_count * 4); // Skip offsets.
> Use sizeof(uint32_t) instead to make it clear what 4 means.
Ok.
>> +    const char *indicies = buf + 4;
>> +    offsetindex =
>> +      *(reinterpret_cast<const support::ulittle16_t*>(indicies)
>> +        + SymbolIndex);
>> +    offset = *(reinterpret_cast<const support::ulittle32_t*>(offsets)
>>                         + (offsetindex - 1));
> You need to verify that these pointers are within the bounds of the
> symbol table.

Ok.

>
>> +  }
> What about BSD?
BSD is not complete, do you want me to go ahead and remove the part of 
the functionality that is there for BSD ?
>
>>     const char *Loc = Parent->getData().begin() + offset;
>>     size_t Size = sizeof(ArchiveMemberHeader) +
>> @@ -253,10 +304,21 @@
>>
>>   Archive::symbol_iterator Archive::begin_symbols() const {
>>     const char *buf = SymbolTable->getBuffer()->getBufferStart();
>> -  uint32_t member_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
>> -  buf += 4 + (member_count * 4); // Skip offsets.
>> -  uint32_t symbol_count = *reinterpret_cast<const support::ulittle32_t*>(buf);
>> -  buf += 4 + (symbol_count * 2); // Skip indices.
>> +  uint32_t member_count = 0;
>> +  uint32_t symbol_count = 0;
> Naming convention and declare when used.
Ok.
>
>> +  // If the archive type is a GNU archive
>> +  // symbol count is equal to member count
>> +  if (kind() == kindGNUArchive) {
>> +    member_count = *reinterpret_cast<const support::ubig32_t*>(buf);
>> +    symbol_count = *reinterpret_cast<const support::ubig32_t*>(buf);
> Don't read the same thing twice. And member_count isn't used.

Ok.
>
>> +    buf += 4 + (symbol_count * 4);
>> +  }
>> +  else {
> No newline before else.
Ok.
>
> You don't need to duplicate all the test infrastructure for llvm-nm. Add your
> tests to test/Object.

Sure, will remove the test framework for nm.

Thanks

Shankar Easwaran

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation