[PATCH] D111889: [AIX] Support of Big archive (read)

Tue Dec 14 03:42:43 PST 2021

jhenderson added inline comments.

================
Comment at: llvm/include/llvm/Object/Archive.h:35-39
+namespace archive {
+const char Magic[] = "!<arch>\n";
+const char ThinMagic[] = "!<thin>\n";
+const char BigMagic[] = "<bigaf>\n";
+} // namespace archive
----------------
Rather than putting these in a new archive namespace (and then not putting the rest of the archive code in that namespace...), I'd suggest renaming the variables, and leaving them in the object namespace directly. Suggested names would be "ArchiveMagic", "ThinArchiveMagic", and "BigArchiveMagic" (optionally using "Ar" instead of "Archive", if you want something more succinct).

================
Comment at: llvm/include/llvm/Object/Archive.h:81
+template <typename T>
+class CommonMemHeaderFieldAPI : public AbstractArchiveMemberHeader {
+public:
----------------
Thanks for the class restructuring. I hope you agree that it looks better not having the duplicated code.

I think the class name needs changing. Apart from anything else, the name says nothing about archives. I also don't think "FieldAPI" makes a huge amount of sense to me (I see what you're trying to do though). I have two possible alternatives:
# My preferred approach would be to rename `ArchiveMemberHeader` to `UnixArchiveMemberHeader`, and then use `ArchiveMemberHeader` for this class's name.
# If renaming the `ArchiveMemberHeader` touches too much code, I'd suggest renaming this class to `CommonArchiveMemberHeader`.

Abbreviating "Archive" to "Ar" and "Member" to "Mem" is okay, in either case.

Additionally, you haven't got it quite right: the `T` parameter here should be the current `T::ArMemHdrType` instead, with the `ArMemHdr` being passed into this class's constructor (and potentially the instance in the derived clas isn't needed). That'll avoid the need for a) the `friend` declarations (I believe), and b) the need to repeatedly do `const T &DerivedMemberHeader = static_cast<const T &>(*this);` in the get* methods.

================
Comment at: llvm/include/llvm/Object/Archive.h:115
+  Expected<const char *> getNextChildLoc() const override;
+  virtual Expected<bool> isThin() const override;

----------------
No need for `virtual` here. Just delete it.

================
Comment at: llvm/include/llvm/Object/Archive.h:414
+    char MemOffset[20];          ///< Offset to member table.
+    char GlobSymOffset[20];      ///< Offset to global symbol table.
+    char
----------------
Still need to clang-foramt some of this content, it looks like.

================
Comment at: llvm/lib/Object/Archive.cpp:65
+
+#define getFieldRawString(T) StringRef(T, sizeof(T)).rtrim(" ")
+
----------------
Continuing our discussion that has now moved too far from the relevant site. As these fields are char arrays, you should be able to do something like:
```
template <class T, std::size_t N>
StringRef getFieldRawString(const T (&Field)[N]) {
  return StringRef(Field, N).rtrim(" ");
}
```
This will populate the template parameter N with the char array size, using template auto-deduction, avoiding the need for the sizeof entirely. See how the `std::size` signature works in C++17 for an example of this (note that obviously we don't want to use `std::size` itself here).

================
Comment at: llvm/lib/Object/Archive.cpp:214-215
+
+  // If name length is odd, padding '\0' to even length, after that there is
+  // name terminator "\0x60\0x0a".
+  uint64_t NameLenWithPadding = (NameLen + 1) >> 1 << 1;
----------------
I've fixed some grammar issues, and also suggested a slight improvement to the name terminator, to use the actual ASCII representations, for ease of understanding.

================
Comment at: llvm/lib/Object/Archive.cpp:219
+      StringRef(ArMemHdr->Name, NameLenWithPadding + 2);
+  const char NameTerminator[3] = {0x60, 0x0a, 0x00};
+  if (!NameStringWithNameTerminator.endswith(NameTerminator)) {
----------------
Any reason this can't be `StringRef NameTerminator = "`\n`?

I'd also put it before the `NameStringWithNameTerminator`, and you can then use `NameTerminator.size()` to set the length for `NameStringWithNameTerminator`

================
Comment at: llvm/lib/Object/Archive.cpp:225-227
+        "name have not name terminator \"0X60\\0X0a\" for archive member "
+        "header at offset " +
+        Twine(Offset));
----------------
Could you not use `createMemberHeaderParseError` rather than calling `malformedError` directly here?

================
Comment at: llvm/lib/Object/Archive.cpp:424-426
+  // If Size is odd, add 1 to make it even.
+  const char *NextLoc =
+      reinterpret_cast<const char *>(ArMemHdr) + (((Size + 1) >> 1) << 1);
----------------
This padding is in a couple of different places now. I think it would be good to move it into a small helper function, called e.g. `makeEven`.

That being said, there are some functions elsewhere within LLVM that align things (see "alignTo"). Could you use that instead. here and in similar places?

================
Comment at: llvm/lib/Object/Archive.cpp:435-436
+Expected<const char *> BigArchiveMemberHeader::getNextChildLoc() const {
+  // If current child offset is equal to the last ar offset, current child is
+  // the last one.
+  if (getOffset() ==
----------------
I think this comment is superfluous: the code is fairly clear as to what it does.

================
Comment at: llvm/lib/Object/Archive.cpp:681-686
+    return strlen(ThinMagic);
+
+  if (Kind() == K_AIXBIG)
+    return strlen(BigMagic);
+
+  return strlen(Magic);
----------------
I believe you can use `sizeof` instead of `strlen` here, as the magic strings are char arrays, rather than pointers. Please double-check though.

================
Comment at: llvm/lib/Object/Archive.cpp:1163
+  if (RawOffset.getAsInteger(10, FirstChildOffset))
+    Err = malformedError("malformed aix big archive: first member offset \"" +
+                         RawOffset + "\" is not a number");
----------------
Here and below, should this be "AIX" in the error message?

================
Comment at: llvm/test/Object/archive-big-extract.test:3
+# RUN: rm -rf %t && mkdir -p %t/extracted/
+# RUN:  cd %t/extracted/  &&  llvm-ar x %p/Inputs/aix-big-archive.a
+# RUN: diff %t/extracted/empty.o %p/Inputs/aix-big-archive-member.o
----------------
jhenderson wrote:
> Get rid of the extra spaces, although actually I think you should do the cd on the same line as the directory creation.
Why hasn't this been addressed yet?

================
Comment at: llvm/test/Object/archive-big-extract.test:2
+## Test printing an archive created by AIX ar (Big Archive).
+RUN: llvm-ar p %p/Inputs/bigFormat.a evenlen | FileCheck %s --check-prefix=AIXBIG --implicit-check-not={{.}}
+AIXBIG: content_of_evenlen
----------------
DiggerLin wrote:
> DiggerLin wrote:
> > sfertile wrote:
> > > I think James is suggesting that the 'extract' in the test name is misplaced since we are not in fact extracting anything. Instead how about naming the test something along the lines of `big-archive-print.test`.
> > > 
> > > > I think we should add "x" option in the llvm/test/tools/llvm-ar/extract.test after implement write big archive.
> > > 
> > > It is nice to reuse the existing tests for the new format, but as you pointed out we can't until we implement writing big archive files first. If you add an extract test in this directory (or even as a second runstep in this file), with the binary file already being added in this test then we get coverage now, and don't have to wait until we implement big archive writing. 
> > if I added the -x to extract test the archive file Input/libtest.a (the archive file also is used in https://reviews.llvm.org/D112097 ) in the test case.  I will use "diff command" to diff the extracting object file with original object file. I think I have to add two original object files into directory llvm/test/Object/Input for compare . Adding a extra object file is not a good idea unless we need it. if you strong suggestion I add the -x option with Input/libtest.a. I will add it.
> I think I can use bigFormat.a for option -x , I will add -x option test. thanks
As noted already, this test, in its current form, isn't about extracting the archive contents, so it should be renamed, and comments/names updated accordingly, e.g. `archive-big-print.test` (note that the -p option is for printing a member, not extracting it). Adding a -x test case is a tangential thing, which can be added either as part of this test case, or separately, I don't mind.

================
Comment at: llvm/test/Object/archive-big-print.test:2-3
+## Test printing an archive created by AIX ar (Big Archive).
+RUN: llvm-ar p %p/Inputs/aix-big-archive.a evenlen | FileCheck %s --check-prefix=AIXBIG --implicit-check-not={{.}}
+AIXBIG: content_of_evenlen
----------------
jhenderson wrote:
> The comment markers aren't necessary yet, but in a future version, where you create the inputs on the fly via yaml2obj etc, you will need them.
> 
> No need for --check-prefix when there's only one FileCheck run in a file: use just the default instead.
Any reason this hasn't been addressed yet?

================
Comment at: llvm/test/Object/archive-big-read.test:1
+## Test reading an archive created by AIX ar (Big Archive).
+RUN: env TZ=GMT llvm-ar tv %p/Inputs/aix-big-archive.a | FileCheck %s --strict-whitespace --check-prefix=AIXBIG --implicit-check-not={{.}}
----------------
jhenderson wrote:
> In the future, the input will hopefully be generated on the fly rather than using a canned binary.
> 
> Also, add comment markers and don't use check-prefix, as above.
> Also, add comment markers and don't use check-prefix, as above.

This hasn't been addressed.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111889/new/

https://reviews.llvm.org/D111889