[PATCH] D100651: [AIX] Support of Big archive (read)

Wed Jul 14 03:39:17 PDT 2021

jhenderson added a comment.

> I need to know the current location to distinguish Fix length header and Object headers, and I have not found cleaner solution.

I still don't understand why you need to know it to make this distinction. The two header types are completely different things as far as I understand the spec, used in different contexts. Could you read the Fixed Length header as the first part of reading the archive, and the object headers when reading individual children, then store them as members of the respective classes?

> Note that it has been checked about multiple read: Rust loads libLLVM.so once, and can read multiple archive. It works, as the static variable is initialized correctly before being used. But it is sequential read.

Just because this works in one single case does not make this an appropriate design - in fact you haven't even tested the case where the issue is, as I bet that the thing you ran read the archives sequentially, not in parallel. What if I wrote a tool that wanted to read multiple Big-format archives in parallel, with one thread per archive? What would happen to the static variable on each individual thread?

Pseudo-code example reading algorithm that doesn't have the issues mentioned:

  class Archive {
      FixedLengthHeader Header;
      std::vector<Child> Children;
      ArrayRef<uint8_t> Data;

      Archive(ArrayRef<uint8_t> Data) : Data(Data) {
        Header = reinterpret_cast<FixedLengthHeader> (Data.data());
        uint64_t Offset = Header->FirstChildOffset;
        // Check if archive is empty.
        Child C(Offset, this);
        while (C != nullptr) {
          Children.push_back(C);
          C = C.nextChild();
        }
      }
  };

  class Child {
      ObjectHeader Header;
      uint8_t *Payload;
      Archive &Parent;
      Child(uint64_t Offset, Archive &Archive) {
        // Check Archive data is large enough for offset + header size + payload size.
        Parent = Archive;
        Header = reinterpret_cast<ObjectHeader>(Parent->Data.data() + Offset);
        Payload = Parent->Data.data() + Offset + sizeof(Header);
      }
      Child *nextChild() {
        if (Header->nextOffset == 0)
          return nullptr;
        return Child(Header->nextOffset, Parent);
      }
  };

================
Comment at: llvm/lib/Object/Archive.cpp:648

 Expected<Archive::Child> Archive::Child::getNext() const {
   size_t SpaceToSkip = Data.size();
----------------
EGuesnet wrote:
> jhenderson wrote:
> > It seems to me like the logic you've added to this function is a little odd. I think you may be trying too hard to match the existing logic, when it doesn't really make sense. In traditional UNIX ar format, each child is immediately followed by the next one (barring a possible even-alignment padding byte), stopping when you get to the end. My reading of the Big Archive spec is that the next child's offset is defined by the ar_nxtmem member of its header. You should just be using that and fl_lstmoff to identify if the current child is the last one or not. You can then check to see if the child goes past the buffer end, as per the existing check in this file.
> > I think you may be trying too hard to match the existing logic
> Right.
> 
> > My reading of the Big Archive spec is that the next child's offset is defined by the ar_nxtmem member of its header. You should just be using that and fl_lstmoff to identify if the current child is the last one or not. You can then check to see if the child goes past the buffer end, as per the existing check in this file.
> 
> But using offset needs to know the current location. You say in another comment this his highly not thread-safe and might be avoid. 
> Moreover, if I use fl_lstmoff  instead of Length, I must have an access to Fixed length header. It is currently read and forbidden as length of object in archive in the only information we need.
> It is currently read and forbidden as length of object in archive in the only information we need.

I don't understand this comment at all. Do you mean forgotten, not forbidden? Perhaps you could change how things are read and stored?

================
Comment at: llvm/test/Object/archive-big-read.test:1
+# Test reading an archive created by AIX ar (Big Archive)
+RUN: env TZ=GMT llvm-ar tv %p/Inputs/Big.a | FileCheck %s -strict-whitespace
----------------
EGuesnet wrote:
> jhenderson wrote:
> > Two related nits:
> > 1) Although strictly not necessary, I find it slightly disorientating when seeing a lit test without comment markers for RUN and CHECK lines. I'd add them.
> > 2) For true comments, I encourage in newer tests to use `##` to disitnguish from CHECK/RUN lines.
> > 
> > Also, comments should end with a full stop ('.') - applies below too.
> I think it is OK now.
> I have first done the test inspiring from other one, and they do not use theses convention and flag.
Older tests use older styles. We have developed better styles in the meantime which are preferred for newer tests.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100651/new/

https://reviews.llvm.org/D100651