[PATCH] D111889: [AIX] Support of Big archive (read)

James Henderson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 11 08:03:15 PST 2021


jhenderson added a comment.

I've made a start on continuing this review. Will try to get through the rest of the file tomorrow.



================
Comment at: llvm/include/llvm/Object/Archive.h:62
 
-  // This returns the size of the private struct ArMemHdrType
-  uint64_t getSizeOf() const { return sizeof(ArMemHdrType); }
+  // Returns the size in bytes of the format-defined header of derived class.
+  virtual uint64_t getSizeOf() const { return 0; }
----------------
Conceptually, the kind of archive is important, not the class it is represented in.


================
Comment at: llvm/include/llvm/Object/Archive.h:63
+  // Returns the size in bytes of the format-defined header of derived class.
+  virtual uint64_t getSizeOf() const { return 0; }
+
----------------
It seems to me like this should be a pure virtual function, rather than returning 0 here - another archive kind in the future should explicitly say what the size of its member headers are (even if it is zero). By providing a default implementation, there's a real risk some future implementation won't override this, and will then have problems.


================
Comment at: llvm/include/llvm/Object/Archive.h:65
+
+  Archive const *Parent;
+  AbstractArchiveMemberHeader(const Archive *Parent) : Parent(Parent){};
----------------
Please move data to one place, rather than sandwiched between class methods.

Also, does this need to be public? It wasn't before...


================
Comment at: llvm/include/llvm/Object/Archive.h:66
+  Archive const *Parent;
+  AbstractArchiveMemberHeader(const Archive *Parent) : Parent(Parent){};
+};
----------------
Make this `protected`, and move it to near the top of the class, by the destructor, where constructors usually live.


================
Comment at: llvm/include/llvm/Object/Archive.h:104
+
+class BigArchiveMemberHeader : public AbstractArchiveMemberHeader {
+public:
----------------
Perhaps add a comment for this class indicating what this archive format is used for.


================
Comment at: llvm/include/llvm/Object/Archive.h:108
+                         uint64_t Size, Error *Err);
+  std::unique_ptr<AbstractArchiveMemberHeader> clone() const override {
+    return std::unique_ptr<AbstractArchiveMemberHeader>(
----------------
Add a blank line before this function.


================
Comment at: llvm/include/llvm/Object/Archive.h:125
+  Expected<uint64_t> getNextOffset() const;
+  static bool classof(AbstractArchiveMemberHeader const *Header);
+
----------------
Separate unrelated functions with blank lines.


================
Comment at: llvm/include/llvm/Object/Archive.h:379
+  /// Fixed-Length Header.
+  struct BigArFixLenHdrType {
+    char Magic[ArchiveMagicLen]; ///< Big archive magic string.
----------------
I think the earlier name is fine for this. I'd call it "FixLenHdrType" or ideally just "FixLenHdr" in fact ("Type" on the end of a class name is ugly, and doesn't really add anything. No need to prefix "BigAr", since you're already nested in `BigArchive` at this point.


================
Comment at: llvm/include/llvm/Object/Archive.h:391-392
+  const BigArFixLenHdrType *ArFixLenHdr;
+  uint64_t FirstMemberOffset = 0;
+  uint64_t LastMemberOffset = 0;
+
----------------
Let's make these consistent with the getters: `FirstChildOffset` and `LastChildOffset`.


================
Comment at: llvm/include/llvm/Object/Archive.h:35
 
+constexpr int ArchiveMaigcLen = 8;
+
----------------
DiggerLin wrote:
> jhenderson wrote:
> > I'd like to get rid of this constant, since it's not guaranteed for all conceptual archive types. Also there's a typo in it: `ArchiveMagicLen`.
> > 
> > See my below comments for how I'd get rid of it.
> I think define a "ArchiveMaigcLen = 8" is mandatory here.
>  if you look at the 
> 
> https://llvm.org/doxygen/Archive_8cpp_source.html 
> line 1000
> 
> 
> >  // Returns true if archive file contains no member file.
> >  bool Archive::isEmpty() const { return Data.getBufferSize() == 8; }
> 
> it use 8 to hardcode the magic length. I think define a const expr is much better than hardcode.
> 
> 
Whilst I agree that the `isEmpty` function is not written well, I think there's a wider problem within the archive code where it assumes that the magic will always be the same size. It currently is, for all supported archive types, but that doesn't mean it'll always be the case. Better would be to have the magic stored within the archive, or at least the magic length (or have a function that switches based on archive kind and returns the [length of the] magic). This would be an unrelated refactor.


================
Comment at: llvm/include/llvm/Object/Archive.h:151
 
 class Archive : public Binary {
   virtual void anchor();
----------------
DiggerLin wrote:
> jhenderson wrote:
> > Rather than be a potential concrete type, I think it would be cleaner if `Archive` were an abstract type, with concrete implementations for Big and traditional (better name required) archives. With Archive being a concrete type, there's a risk of slicing when someone thinks they've got a traditional archive, but actually have a BigArchive.
> > 
> > This then allows some somewhat better (in my opinion) locations for certain things.
> if order to not modify a lot of tools code, we make Archive an abstract type. This patch is big enough. we can implement a derived class GNUArchive class derived from archive(as your suggestion in EGuesnet's patch)  too in another separate patch.
> 
> and move the code 
> 
> 
> 
> 
> >  mutable std::vector<std::unique_ptr<MemoryBuffer>> ThinBuffers;
> 
> >   std::vector<std::unique_ptr<MemoryBuffer>> takeThinBuffers() {
> >   return std::move(ThinBuffers);
> > }
> > 
> > 
> 
> 
> 
> from class  Archive to class GNUArchive the other separate patch.
If you don't want to do that change in this patch (which is fair enough), could you please do it in a separate precursor patch to this, please?

I'd probably not call the concrete class GNUArchive, since it will, at least for now, act as the concrete class used for a number of closely-related archive formats, not all of which are GNU-related. UNIXArchive might be more accurate. In the future, we might end up splitting that further into additional types, e.g. COFFArchive, GNUArchive, BSDArchive etc).


================
Comment at: llvm/lib/BinaryFormat/Magic.cpp:95
+    break;
+
   case '\177':
----------------
DiggerLin wrote:
> DiggerLin wrote:
> > Esme wrote:
> > > With this recognition added, other tools' supports for the big archive are enabled, e.g. `llvm-objdump -a`, do we need to add tests for this?
> > I  tested  llvm-objdump for  archive file in test case of  the patch 
> > https://reviews.llvm.org/D112097 
> > 
> > llvm/test/tools/llvm-objdump/XCOFF/export_sym_list_ar.test
> I added a new test  for the llvm-obj -a for big archive
I think it would be good to have at least one test for each of the binutils with a BigArchive input. It's probably sufficient to add that in a separate patch - I'd prefer we be able to generate these input files from a combination of yaml2obj and llvm-ar, so we will need the writing patch to do so.


================
Comment at: llvm/lib/Object/Archive.cpp:20
 #include "llvm/Object/Error.h"
+#include "llvm/Support/Casting.h"
 #include "llvm/Support/Chrono.h"
----------------
Does the code work without this? We tend not to add includes unless they're actually needed to make the code work.


================
Comment at: llvm/lib/Object/Archive.cpp:35
 #include <memory>
+#include <stddef.h>
 #include <string>
----------------
I'd be surprised if this header is actually needed, especially the C-style one.


================
Comment at: llvm/lib/Object/Archive.cpp:111
+
+  if (Size < sizeof(ArMemHdrType)) {
+    if (Err) {
----------------
If I'm reading this rightly, this entire block could be moved into common code with the regular archive vareity. If it's not possible to put it into the base class constructor, put it in a helper function or method instead. 


================
Comment at: llvm/lib/Object/Archive.cpp:125
+  }
+
+}
----------------
Don't have trailing blank lines at ends of functions.


================
Comment at: llvm/lib/Object/Archive.cpp:161
+Expected<StringRef> BigArchiveMemberHeader::getRawName() const {
+  StringRef::size_type NameSize = strtol(ArMemHdr->NameLen, NULL, 10);
+  return StringRef(ArMemHdr->Name, NameSize);
----------------
Also consider adding comments to name the nullptr and 10 values, e.g. `strtol(ArMemHdr->NameLen, /*endptr=*/nullptr, /*base=*/10);`

The regular archive kind has various safety checks to make sure the number read makes sense. For example, it checks to make sure there's actually a number in the field. We also need to show that the name's start and end are within the archive buffer, otherwise we'll get crashes/reading past the end of the file etc.


================
Comment at: llvm/lib/Object/Archive.cpp:268-271
+  // The raw name itself can be invalid.
+  Expected<StringRef> NameOrErr = getRawName();
+  if (!NameOrErr)
+    return NameOrErr.takeError();
----------------
At the moment, this should be a `cantFail`, since no error can actually get here, but see also my above comment re. validation of the raw name.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111889/new/

https://reviews.llvm.org/D111889



More information about the llvm-commits mailing list