[PATCH] D100651: [AIX] Support of Big archive (read)

Tue Jul 13 08:44:55 PDT 2021

EGuesnet marked 14 inline comments as done.
EGuesnet added a comment.

> You should get some people who are more familiar with XCOFF/AIX to help review correctness. There has been a lot of development for this in other tools (e.g. llvm-readobj).

The IBM team that has sent PR about llvm-readobj is aware of this PR. I will recontact them.

I think I have respond to most of your comment. The main issue is the `static` variable. I need to know the current location to distinguish Fix length header and Object headers, and I have not found cleaner solution.
Note that it has been checked about multiple read: Rust loads libLLVM.so once, and can read multiple archive. It works, as the static variable is initialized correctly before being used. But it is sequential read.

================
Comment at: llvm/include/llvm/Object/Archive.h:371-373
+  // On Big Archive, total length is needed
+  // because end of file is member table and global symbol table.
+  uint32_t Length = 0;
----------------
jhenderson wrote:
> Do you really need this length? Why is it `uint32_t`?
Maximal offset (in fixed length header) is stored as 20 bytes as decimal digit. So, maximal length is approximately 10^10. It is the total length of the archive. So, it must be uint64-t, not uint32_t.

================
Comment at: llvm/include/llvm/Object/Archive.h:391
+  // so we need to memorize position.
+  static uint64_t CurrentLocation;
+
----------------
jhenderson wrote:
> EGuesnet wrote:
> > rupprecht wrote:
> > > Does this need to be static? e.g. this could break if any tool needed to read two separate big archives
> > As far as I know, static variable is needed. Location must be modified by Child and must be stored in Parent. As Child cannot modify Parent, it is the easiest way to deal with.
> > I agree it could break other tools that read two separate big archive, but I have tested this case: Rust reads lot of separate archives, and probably read them twice in some cases. And it works. The first version was broken due to static variables, but it is now correctly implemented.
> This is completely unthread safe - if you have a tool parsing two archives simultaneously, this will fail horribly. Do you really need it though? See my comments further down.
> 
> From reading the spec, the archive uses offsets everywhere and it should be simple to use the values stored in the member headers.
Main trouble is with first read, to distinguish Fixed length header and other one.
We must know the current location, and modify it during read. This cannot be done in Parent (because of const) or Header (not access when it is required). Static variable in archive is the least bad solution I have found.

================
Comment at: llvm/lib/Object/Archive.cpp:648

 Expected<Archive::Child> Archive::Child::getNext() const {
   size_t SpaceToSkip = Data.size();
----------------
jhenderson wrote:
> It seems to me like the logic you've added to this function is a little odd. I think you may be trying too hard to match the existing logic, when it doesn't really make sense. In traditional UNIX ar format, each child is immediately followed by the next one (barring a possible even-alignment padding byte), stopping when you get to the end. My reading of the Big Archive spec is that the next child's offset is defined by the ar_nxtmem member of its header. You should just be using that and fl_lstmoff to identify if the current child is the last one or not. You can then check to see if the child goes past the buffer end, as per the existing check in this file.
> I think you may be trying too hard to match the existing logic
Right.

> My reading of the Big Archive spec is that the next child's offset is defined by the ar_nxtmem member of its header. You should just be using that and fl_lstmoff to identify if the current child is the last one or not. You can then check to see if the child goes past the buffer end, as per the existing check in this file.

But using offset needs to know the current location. You say in another comment this his highly not thread-safe and might be avoid. 
Moreover, if I use fl_lstmoff  instead of Length, I must have an access to Fixed length header. It is currently read and forbidden as length of object in archive in the only information we need.

================
Comment at: llvm/lib/Object/Archive.cpp:1016

 Expected<Archive::Child> Archive::Symbol::getMember() const {
   const char *Buf = Parent->getSymbolTable().begin();
----------------
jhenderson wrote:
> Do you actually need to change this method in the first instance? You don't have any testing for symbol printing (and I don't think you should at this point...).
I have removed code related to symbol read.

================
Comment at: llvm/lib/Object/ArchiveWriter.cpp:141
+  case object::Archive::K_XCOFF:
+      llvm_unreachable("Not handled yet");
   case object::Archive::K_COFF:
----------------
jhenderson wrote:
> Make sure to clang-format all your modified code.
> 
> I'd actually use `report_fatal_error()` rather than `llvm_unreachable()`, if the code could be reached by a tool receiving a file of this wrong kind, with the corresponding options. If it is genuinely unreachable of course, there is no need to change.
> 
> Same below.
It can be reached by external tools (not llvm-ar), so I modifie it to use `report_fatal_error()`.

================
Comment at: llvm/test/Object/archive-big-read.test:1
+# Test reading an archive created by AIX ar (Big Archive)
+RUN: env TZ=GMT llvm-ar tv %p/Inputs/Big.a | FileCheck %s -strict-whitespace
----------------
jhenderson wrote:
> Two related nits:
> 1) Although strictly not necessary, I find it slightly disorientating when seeing a lit test without comment markers for RUN and CHECK lines. I'd add them.
> 2) For true comments, I encourage in newer tests to use `##` to disitnguish from CHECK/RUN lines.
> 
> Also, comments should end with a full stop ('.') - applies below too.
I think it is OK now.
I have first done the test inspiring from other one, and they do not use theses convention and flag.

================
Comment at: llvm/test/Object/archive-big-read.test:3
+
+Test reading an archive created by AIX ar (Big Archive)
+RUN: env TZ=GMT llvm-ar tv Inputs/Big.a | FileCheck %s -strict-whitespace
----------------
rupprecht wrote:
> nit: it isn't technically required, but this should start with a comment (i.e. `# Test reading ...`) to separate it more clearly from run/check commands
Done. See `archive-big-extract.test`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100651/new/

https://reviews.llvm.org/D100651