[PATCH] D123949: [AIX] support write operation of big archive.

Fri Apr 22 04:18:02 PDT 2022

jhenderson added a comment.

Could you link to the spec again for Big Archive, so I can try to check the code against it, please?

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:42
 using namespace llvm;
+using namespace llvm::object;

----------------
Do you need this? The rest of the code in this file managed fine without it.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:204-207
+  // Big Archive has a different size for element of Member Header
+  // We cannot use printRestOfMemberHeader.
+  // Name is written at the end of the archive,
+  // followed by the cosmetic terminator "`\n".
----------------
I'm not sure this comment adds much. Could you explain why you feel the need for it?

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:211-212
+  printWithSpacePadding(Out, Size, 20);           // File member size
+  printWithSpacePadding(Out, NextOffset, 20);     // Next member offset
+  printWithSpacePadding(Out, PrevOffset, 20);     // Previous member header
+  printWithSpacePadding(Out, sys::toTimeT(ModTime), 12); // File member date
----------------

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:221
+    if (NameLen % 2)
+      Out << (char)0; // Padding with null caracter
+  }
----------------
`printBSDMemberHeader` uses `Out.write(uint8_t(0));` to write the padding. I'd probably do the same here for consistency.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:223
+  }
+  printWithSpacePadding(Out, "\x60\x0a", 2); // Terminator
+}
----------------
`printRestOfMemberHeader` just writes the terminator directly to the stream. Since no padding is needed for it, I think that's what you should do here too.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:341
   // We do this for all bsd formats because it simplifies aligning members.
-  uint32_t Pad = offsetToAlignment(Size, Align(isBSDLike(Kind) ? 8 : 2));
-  Size += Pad;
-  if (Padding)
-    *Padding = Pad;
+  // For Big archive , the symbol table is the last table, do not to align.
+  if (!isAIXBigArchive(Kind)) {
----------------
And reflow the comment to 80 characters.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:353
+                                   bool Deterministic, uint64_t Size,
+                                   uint64_t PreOffset = 0) {
   if (isBSDLike(Kind)) {
----------------
And similar for the same variable elsewhere. I think this more clearly indicates what the variable is for.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:359
+  } else if (isAIXBigArchive(Kind)) {
+    const char *Name = "";
+    printBigArchiveMemberHeader(Out, Out.tell(), Name, now(Deterministic), 0, 0,
----------------
I don't think you need to declare the name as a separate variable. Just use the string literal directly at the call site.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:370
                              bool Deterministic, ArrayRef<MemberData> Members,
-                             StringRef StringTable) {
+                             StringRef StringTable, uint64_t PreOffset = 0) {
   // We don't write a symbol table on an archive with no members -- except on
----------------
Do you actually need `PreOffset` to be passed in here? This function calculates the `Pos` variable below already, and the `PreOffset` value could be calculated as part of this.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:647-649
+  uint64_t MaxOffset =
+      isAIXBigArchive(Kind) ? sizeof(BigArchive::FixLenHdr) : 8;
+  uint64_t LastOffset = MaxOffset;
----------------
Previously, the scope of these two variables was quite limited. Now that it is used in a wider range of code, I'd recommend renaming them to clarify which offsets they're referring to. I think it should be `LastMemberHeaderOffset` and `LastMemberEndOffset` or something to that effect. Is that right?

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:659

+  if (WriteSymtab && !(isAIXBigArchive(Kind))) {
     // We assume 32-bit offsets to see if 32-bit symbols are possible or not.
----------------

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:662
+    uint64_t SymtabSize = computeSymbolTableSize(
+        Kind, NumSyms, isAIXBigArchive(Kind) ? 8 : 4, SymNamesBuf);
     auto computeSymbolTableHeaderSize =
----------------
I think think you need to modify this call? This code is already guarded by `!isAIXBigArchive(Kind)`.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:671
         };
+
     LastOffset += computeSymbolTableHeaderSize() + SymtabSize;
----------------
Any particular reason you've added this blank line?

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:689
     // member starts before the 4GB offset.
+    // Big Archive not supported yet.
     if (LastOffset >= Sym64Threshold) {
----------------
Given that this piece of code isn't hit by Big Archive anyway, this comment probably doesn't belong here?

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:711
+  } else {
+    // For BigArchive (AIX), compute a table of member names and offset,
+    // used in Member Table.
----------------
You've used "Big Archive" here but "Big archive" elsewhere. Which is the correct format? Is the name of the format "Big" or "Big Archive"? If the former, then you should use "Big archive" everywhere, including here. Otherwise, you should use "Big Archive" everywhere.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:712
+    // For BigArchive (AIX), compute a table of member names and offset,
+    // used in Member Table.
+    uint64_t MembeTableNameStrTblSize = 0;
----------------
Member Table shouldn't be capitalized as it's not a proper noun.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:713
+    // used in Member Table.
+    uint64_t MembeTableNameStrTblSize = 0;
+    std::vector<int> OffsetArray;
----------------

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:714-715
+    uint64_t MembeTableNameStrTblSize = 0;
+    std::vector<int> OffsetArray;
+    std::vector<StringRef> NameArray;
+    // Loop across object to find offset and names.
----------------
`int` is a signed type, which doesn't make sense for offsets.
Don't call things arrays when they aren't.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:716
+    std::vector<StringRef> NameArray;
+    // Loop across object to find offset and names.
+    for (size_t it = 0, Size = NewMembers.size(); it != Size; ++it) {
----------------
Could you avoid the need to loop through all members twice (i.e. above where `MaxOffset` and `LastOffset` are calculated, and here), by moving the first loop inside the first half of the `if` above (like it was before), and then calculating equivalent values as you go below?

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:717
+    // Loop across object to find offset and names.
+    for (size_t it = 0, Size = NewMembers.size(); it != Size; ++it) {
+      MembeTableNameStrTblSize += NewMembers[it].MemberName.size() + 1;
----------------
`it` -> `Index` or `I`. `it` is not correct capitalization and could be confused with an iterator, which this isn't.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:717-729
+    for (size_t it = 0, Size = NewMembers.size(); it != Size; ++it) {
+      MembeTableNameStrTblSize += NewMembers[it].MemberName.size() + 1;
+      if (it == 0)
+        OffsetArray.push_back(sizeof(BigArchive::FixLenHdr));
+      else
+        // If current member is not last, fulfill offset of next member
+        // Curent offset + Header size  + Name length +
----------------
jhenderson wrote:
> `it` -> `Index` or `I`. `it` is not correct capitalization and could be confused with an iterator, which this isn't.
Let's reorganise this loop a little, as I find it a little hard to follow.

Is there an off-by-one error here? The `MembeTableNameStrTblSize` increases by the name length plus a null terminator, but the offset only increases by the name length (possibly with a padding byte, but not always). If that's the case, I'd suggest adding the following line:
```
size_t NameSize = Member.MemberName.size() + 1;
```
and then using it in place of the name length lookups.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:736
+
+    // Fix Sized Header
+    unsigned GlobalSymbolOffset =
----------------
This comment looks to be in the wrong place? I think it needs to be at the start of the following `printWithSpacePadding` block that actually writes it.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:752
+    printWithSpacePadding(Out, NewMembers.size() ? LastOffset : 0,
+                          20); // Offset to last Archive member
+    printWithSpacePadding(
----------------

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:760
+      if (M.Data.size() % 2)
+        Out << (char)0;
+    }
----------------
Same below.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:764
+    if (NewMembers.size()) {
+      // Membet table.
+      printBigArchiveMemberHeader(Out, MaxOffset, "", sys::toTimePoint(0), 0, 0,
----------------

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:769
+      printWithSpacePadding(Out, OffsetArray.size(), 20); // Number of member
+      for (const int MemberOffset : OffsetArray)
+        printWithSpacePadding(Out, MemberOffset, 20); // All offset
----------------

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:771
+        printWithSpacePadding(Out, MemberOffset, 20); // All offset
+      for (const auto MemberName : NameArray) {
+        Out << MemberName; // All names
----------------

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:772
+      for (const auto MemberName : NameArray) {
+        Out << MemberName; // All names
+        Out << (char)0;    // And a null charcter
----------------
This isn't "All names" though. This is just the current name.

Also, why not do:

```
Out << MemberName << '\0';
```

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:776-778
+        Out << (char)0; // Padding in names;
+                        // final name has a null character included, so it ends
+                        // by one or two null caracters
----------------
I don't think you need to say that there must be one or two trailing null characters.

================
Comment at: llvm/test/tools/llvm-ar/bigarchive.test:3
+; RUN: echo "test big archive" > %t.txt
+; RUN: llvm-ar cr --format=bigarchive %t.ar %t.txt 2>&1
+; RUN: grep -q bigaf %t.ar
----------------

================
Comment at: llvm/test/tools/llvm-ar/bigarchive.test:4
+; RUN: llvm-ar cr --format=bigarchive %t.ar %t.txt 2>&1
+; RUN: grep -q bigaf %t.ar
----------------
As noted below, `grep` is deprecated in lit tests. Use `FileCheck` instead.

================
Comment at: llvm/test/tools/llvm-ar/default-aix-host.test:2
+; REQUIRES: system-aix
+;; Test llvm-ar use --format=bigarchive by default
+; RUN: echo "test big archive" > %t.txt
----------------

================
Comment at: llvm/test/tools/llvm-ar/default-aix-host.test:4
+; RUN: echo "test big archive" > %t.txt
+; RUN: llvm-ar cr %t.ar %t.txt 2>&1
+; RUN: grep -q bigaf %t.ar 
----------------
You're not using the stderr or stdout.

================
Comment at: llvm/test/tools/llvm-ar/default-aix-host.test:5
+; RUN: llvm-ar cr %t.ar %t.txt 2>&1
+; RUN: grep -q bigaf %t.ar 
----------------
As noted below, `grep` is deprecated in tests. Use FileCheck instead.

================
Comment at: llvm/test/tools/llvm-ar/default-xcoff.test:1
-; REQUIRES: system-aix
-;; Test llvm-ar does not support Big AIX archive write operation.
-
+;; Test llvm-ar use big archive format for the XCOFF object file by default. 
 ; RUN: yaml2obj %S/Inputs/xcoff.yaml -o %t.obj
----------------

================
Comment at: llvm/test/tools/llvm-ar/default-xcoff.test:4
 ; RUN: rm -f %t.ar
-; RUN: not llvm-ar cr %t.ar %t.obj 2>&1 | FileCheck %s
-; RUN: echo "test big archive" > %t.txt
-; RUN: not llvm-ar cr %t.ar %t.txt 2>&1 | FileCheck %s
-
-; CHECK: big archive writer operation on AIX not yet supported
+; RUN: llvm-ar cr %t.ar %t.obj 2>&1
+; RUN: grep -q bigaf %t.ar
----------------

================
Comment at: llvm/test/tools/llvm-ar/default-xcoff.test:5
+; RUN: llvm-ar cr %t.ar %t.obj 2>&1
+; RUN: grep -q bigaf %t.ar
----------------
`grep` shouldn't be used in llvm lit tests (see https://llvm.org/docs/TestingGuide.html#writing-new-regression-tests). You should use `FileCheck --input-file=%t.ar` instead, with a check pattern as typical.

================
Comment at: llvm/test/tools/llvm-ar/thin-to-full-archive.test:1
-# XFAIL: system-aix
+#UNSUPPORTED: system-aix
 ## Test thin archives do not siletly convert to full archives on write.
----------------
Might be worth a comment saying why it is unsupported here.

================
Comment at: llvm/tools/llvm-ar/llvm-ar.cpp:958-959
   case DARWIN:
     if (Thin)
       fail("only the gnu format has a thin mode");
     Kind = object::Archive::K_DARWIN;
----------------
Do you need this sort of check for the Big format too?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123949/new/

https://reviews.llvm.org/D123949