[PATCH] D111889: [AIX] Support of Big archive (read)

Fri Nov 12 01:31:22 PST 2021

jhenderson added inline comments.

================
Comment at: llvm/lib/Object/Archive.cpp:1195
+  if (RawStringRef.rtrim(' ').getAsInteger(10, FirstArOffset)) {
+    report_fatal_error("Malformed AIX Big Archive: First Archive Offset \"" +
+                       RawStringRef + "\" is not a number.");
----------------
DiggerLin wrote:
> jhenderson wrote:
> > 1) Ifs that contain only a single statement shouldn't have braces.
> > 2) Why is this a report_fatal_error call rather than setting the input `Err` variable?
> > 3) Please follow the LLVM standards for error messages (https://llvm.org/docs/CodingStandards.html#error-and-warning-messages).
> > 4) Don't capitalize "First Archive Offset". Use either the actual field name as per the spec, or just make it all lower-case.
> > 5) Don't use "Archive" here in the "First Archive Offset". Use "member" or "child" instead. The archive doesn't consist of other archives (normally!). I think the spec has used a misleading name, so let's not follow it unless we're using the term as literally stated in the spec (`fl_fstmoff`).
> thanks
There are still unnecessary braces in this `if`.

================
Comment at: llvm/lib/Object/Archive.cpp:294
+  uint64_t Size;
+  uint64_t NameLen;
+  StringRef RawSize =
----------------
Put variable declarations close to their first usage.

I'm a little confused what the name length has to do wtih the size? Add a comment to explain this near the end of this function, I suggest.

================
Comment at: llvm/lib/Object/Archive.cpp:295
+  uint64_t NameLen;
+  StringRef RawSize =
+      StringRef(ArMemHdr->Size, sizeof(ArMemHdr->Size)).rtrim(" ");
----------------
The logic for calculating and checking `RawSize` is identical to the regular archive logic. Pull it into a common function.

================
Comment at: llvm/lib/Object/Archive.cpp:308-320
+  StringRef RawNameLen =
+      StringRef(ArMemHdr->NameLen, sizeof(ArMemHdr->NameLen)).rtrim(" ");
+  if (RawNameLen.getAsInteger(10, NameLen)) {
+    uint64_t Offset =
+        reinterpret_cast<const char *>(ArMemHdr) - Parent->getData().data();
+    return malformedError(
+        "characters in name length field in archive header are not "
----------------
Seems like this should be a `getNameLen` method?

================
Comment at: llvm/lib/Object/Archive.cpp:319-322
-    std::string Buf;
-    raw_string_ostream OS(Buf);
-    OS.write_escaped(Group);
-    OS.flush();
----------------
I'm confused why you've bothered to change the logic here, but didn't with getUID. Are you sure you want to change it here? If so, why not in getUID too?

Also, have you actually changed the behaviour with this? It looks suspicious to me.

================
Comment at: llvm/lib/Object/Archive.cpp:324
+
+Expected<uint64_t> BigArchiveMemberHeader::getNextOffset() const {
+  uint64_t NextOffset;
----------------
The logic in this and related functions is very similar. I wonder if you could pull it into a function or callable class, parameterised on things like the field to check and the size and name of the field?

================
Comment at: llvm/lib/Object/Archive.cpp:326
+  uint64_t NextOffset;
+  StringRef RawStringRef =
+      StringRef(ArMemHdr->NextOffset, sizeof(ArMemHdr->NextOffset)).rtrim(" ");
----------------
Why `RawStringRef`? That doesn't sound like it's got anything to do with an offset...

================
Comment at: llvm/lib/Object/Archive.cpp:332
+                      Parent->getData().data();
+    return malformedError("characters in size field in archive header are not "
+                          "all decimal numbers: '" +
----------------
Should this be "offset field"?

================
Comment at: llvm/lib/Object/Archive.cpp:342-352
+uint64_t ArchiveMemberHeader::getOffset() const {
+  uint64_t Offset =
+      reinterpret_cast<const char *>(ArMemHdr) - Parent->getData().data();
+  return Offset;
+}
+
+uint64_t BigArchiveMemberHeader::getOffset() const {
----------------
These two functions are identical. Could they be pushed into the base class? You could pass in the pointer to get the offset from.

================
Comment at: llvm/lib/Object/Archive.cpp:354-363
+// This gets the raw access mode from the ArMemHdr->AccessMode field.
+StringRef ArchiveMemberHeader::getRawAccessMode() const {
+  return StringRef(ArMemHdr->AccessMode, sizeof(ArMemHdr->AccessMode))
+      .rtrim(' ');
+}
+
+StringRef BigArchiveMemberHeader::getRawAccessMode() const {
----------------
Same as above. Don't duplicate logic - use functions! In this case, a function that takes an AccessMode field (whatever the type of that field is) should be plenty sufficient.

================
Comment at: llvm/lib/Object/Archive.cpp:371
                           "are not all decimal numbers: '" +
-                          Buf + "' for the archive member header at offset " +
+                          getRawAccessMode() +
+                          "' for the archive member header at offset " +
----------------
Here and in similar functions, don't repeat this call: do it once and store in a variable (like you already do for the `RawSize` in `getSize`

================
Comment at: llvm/lib/Object/Archive.cpp:378

+// This gets ArMemHdr->LastModified field.
+StringRef ArchiveMemberHeader::getRawLastModified() const {
----------------
Yes, I can see that this does that, since it's in the function name. Don't bother with comments that essentially describe the same thing as the function name.

================
Comment at: llvm/lib/Object/Archive.cpp:379-387
+StringRef ArchiveMemberHeader::getRawLastModified() const {
+  return StringRef(ArMemHdr->LastModified, sizeof(ArMemHdr->LastModified))
+      .rtrim(' ');
+}
+
+StringRef BigArchiveMemberHeader::getRawLastModified() const {
+  return StringRef(ArMemHdr->LastModified, sizeof(ArMemHdr->LastModified))
----------------
As above - pull into a common function. This looks basically identical to the getRawAccessMode function, so you should be able to do something to avoid duplicating the logic (maybe use a template for the LastModified's/AccessMode's type if needed.

================
Comment at: llvm/lib/Object/Archive.cpp:390
 Expected<sys::TimePoint<std::chrono::seconds>>
-ArchiveMemberHeader::getLastModified() const {
+AbstractArchiveMemberHeader::getLastModified() const {
   unsigned Seconds;
----------------
Another very similar function to earlier functions. Consider sharing the logic.

================
Comment at: llvm/lib/Object/Archive.cpp:404-411
+// This gets the raw UID from the ArMemHdr->UID field.
+StringRef ArchiveMemberHeader::getRawUID() const {
+  return StringRef(ArMemHdr->UID, sizeof(ArMemHdr->UID)).rtrim(' ');
+}
+
+StringRef BigArchiveMemberHeader::getRawUID() const {
+  return StringRef(ArMemHdr->UID, sizeof(ArMemHdr->UID)).rtrim(' ');
----------------
Same comments as above.

================
Comment at: llvm/lib/Object/Archive.cpp:432-439
+// This gets the raw GID from the ArMemHdr->GID field.
+StringRef ArchiveMemberHeader::getRawGID() const {
+  return StringRef(ArMemHdr->GID, sizeof(ArMemHdr->GID)).rtrim(' ');
+}
+
+StringRef BigArchiveMemberHeader::getRawGID() const {
+  return StringRef(ArMemHdr->GID, sizeof(ArMemHdr->GID)).rtrim(' ');
----------------
As above.

================
Comment at: llvm/lib/Object/Archive.cpp:459
+    : Parent(Parent), Data(Data), StartOfFile(StartOfFile) {
+  // Create the right concrete archive member as a function of Kind.
+  if (Parent->kind() != K_AIXBIG) {
----------------
This block is actually another good reason why you should change the parent class to be two separate concrete classes - each class could have a function which returns the appropriate member header type using the Factory pattern.

================
Comment at: llvm/lib/Object/Archive.cpp:476-489
+  // Create the right concrete archive member as a function of Kind.
+  if (Parent->kind() != K_AIXBIG) {
+    Header = std::make_unique<ArchiveMemberHeader>(ArchiveMemberHeader(
+        Parent, Start,
+        Parent ? Parent->getData().size() - (Start - Parent->getData().data())
+               : 0,
+        Err));
----------------
See above re. moving this logic into the parent class, but also I wonder if you could share this logic with the other constructor somehow (it may not be that simple).

================
Comment at: llvm/lib/Object/Archive.cpp:533-534
+  if (Parent->kind() == Archive::K_AIXBIG) {
+    // Add name to found the real start of child object, the real start is even
+    // alignment.
+    StartOfFile += ((Name.size() + 1) >> 1) << 1;
----------------
I'm struggling a little to understand the grammar with this sentence. I //think// you're trying to say something like "The actual start of the file is after the name and any necessary even-alignment padding." or something to that effect.

================
Comment at: llvm/lib/Object/Archive.cpp:611
 Expected<Archive::Child> Archive::Child::getNext() const {
-  size_t SpaceToSkip = Data.size();
-  // If it's odd, add 1 to make it even.
-  if (SpaceToSkip & 1)
-    ++SpaceToSkip;

+  const char *NextLoc = nullptr;
----------------
Don't have blank lines at start of functions.

================
Comment at: llvm/lib/Object/Archive.cpp:614
+
+  if (Parent->kind() != K_AIXBIG) {
+    size_t SpaceToSkip = Data.size();
----------------
I'd flip this logic: put the Big Archive special case first, and then the common case, so that the conditional doesn't need to be a negative.

That being said, this may be pointing to the fact that Child needs to be a class hierarchy too, similar to my other points.

================
Comment at: llvm/lib/Object/Archive.cpp:626-627
+  } else {
+    // If current child offset is equal to the last ar offset, current child is
+    // the last one.
+    if (Header->getOffset() ==
----------------
I'm not sure you need this comment. I think the code is pretty self explanatory.

================
Comment at: llvm/lib/Object/Archive.cpp:632
+    else {
+      Expected<uint64_t> NextOffSetOrErr =
+          dyn_cast<BigArchiveMemberHeader>(Header.get())->getNextOffset();
----------------

================
Comment at: llvm/lib/Object/Archive.cpp:633
+      Expected<uint64_t> NextOffSetOrErr =
+          dyn_cast<BigArchiveMemberHeader>(Header.get())->getNextOffset();
+      if (NextOffSetOrErr)
----------------
You can use `cast` when you know the types will match. This will result in an assertion if they don't (which is fine), rather than a null pointer.

================
Comment at: llvm/lib/Object/Archive.cpp:634-637
+      if (NextOffSetOrErr)
+        NextLoc = Parent->getData().data() + NextOffSetOrErr.get();
+      else
+        return NextOffSetOrErr.takeError();
----------------
Flip this so you can avoid the `else`.

================
Comment at: llvm/lib/Object/Archive.cpp:705-710
   Error Err = Error::success();
-  std::unique_ptr<Archive> Ret(new Archive(Source, Err));
+
+  Archive *Ret;
+
+  StringRef Buffer = Source.getBuffer();
+
----------------
Get rid of the blank lines between these local variable declarations.

================
Comment at: llvm/lib/Object/Archive.cpp:712
+  if (Buffer.startswith(BigMagic))
+    Ret = new BigArchive(Source, Err);
+  else
----------------
Make `Ret` a `unique_ptr<Archive>` and use `std::make_unique` here and below. Otherwise, you have a memory leak if `Err` is reported.

================
Comment at: llvm/lib/Object/Archive.cpp:745-750
   // Make sure Format is initialized before any call to
-  // ArchiveMemberHeader::getName() is made.  This could be a valid empty
-  // archive which is the same in all formats.  So claiming it to be gnu to is
-  // fine if not totally correct before we look for a string table or table of
-  // contents.
+  // ArchiveMemberHeader::getName() is made. Read Magic to deal with AIX Big
+  // Archive. Else, this could be a valid empty archive which is the same in all
+  // formats. So claiming it to be gnu to is fine if not totally correct before
+  // we look for a string table or table of contents.
   Format = K_GNU;
----------------
Stare at this code a minute and see if you can spot the bug... (hint, what is the value of `Format` before and after if this is a BigArchive?)

================
Comment at: llvm/lib/Object/Archive.cpp:1191
+
+  StringRef RawStringRef = StringRef(ArFixLenHdr->FirstMemberOffset,
+                                     sizeof(ArFixLenHdr->FirstMemberOffset))
----------------
Change this name. Do you understand what the variable is supposed to represent, because this name makes me think you don't...?

================
Comment at: llvm/test/Object/archive-big-extract.test:3
+# RUN: rm -rf %t && mkdir -p %t/extracted/
+# RUN:  cd %t/extracted/  &&  llvm-ar x %p/Inputs/aix-big-archive.a
+# RUN: diff %t/extracted/empty.o %p/Inputs/aix-big-archive-member.o
----------------
Get rid of the extra spaces, although actually I think you should do the cd on the same line as the directory creation.

================
Comment at: llvm/test/Object/archive-big-print.test:2-3
+## Test printing an archive created by AIX ar (Big Archive).
+RUN: llvm-ar p %p/Inputs/aix-big-archive.a evenlen | FileCheck %s --check-prefix=AIXBIG --implicit-check-not={{.}}
+AIXBIG: content_of_evenlen
----------------
The comment markers aren't necessary yet, but in a future version, where you create the inputs on the fly via yaml2obj etc, you will need them.

No need for --check-prefix when there's only one FileCheck run in a file: use just the default instead.

================
Comment at: llvm/test/Object/archive-big-read.test:1
+## Test reading an archive created by AIX ar (Big Archive).
+RUN: env TZ=GMT llvm-ar tv %p/Inputs/aix-big-archive.a | FileCheck %s --strict-whitespace --check-prefix=AIXBIG --implicit-check-not={{.}}
----------------
In the future, the input will hopefully be generated on the fly rather than using a canned binary.

Also, add comment markers and don't use check-prefix, as above.

================
Comment at: llvm/test/tools/llvm-objdump/XCOFF/archive-headers.test:1
+;; Test llvm-objdump supporting the decode the archive header for xcoff.
+; RUN: llvm-objdump %p/Inputs/big-archive-libtest.a --archive-headers  | FileCheck %s
----------------
See my earlier comment re. let's not add this test just yet (we don't want to add binaries to the git repo if we can avoid it, and I think testing it should be in all tools that can take a Big Archive, once we have write support).

================
Comment at: llvm/tools/llvm-ar/llvm-ar.cpp:1013
+    std::unique_ptr<object::Archive> Archive = std::move(ArchiveOrError.get());
+
+    if (Archive->isThin())
----------------
I think you can get rid of this blank line.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111889/new/

https://reviews.llvm.org/D111889