[PATCH] D142660: [AIX] supporting -X options for llvm-ranlib in AIX OS

Fri Jul 28 00:34:43 PDT 2023

jhenderson added a comment.

In D142660#4538936 <https://reviews.llvm.org/D142660#4538936>, @DiggerLin wrote:

>> As an alternative (but I think adds unnecessary complexity, due to needing an extra variable), you could have both tools read the environment variable into a string variable, then, if the -X option is present, overwrite that variable, and finally feed that string into the parsing code that converts into a BitMode value. If the string is invalid, the parsing code could report an error along the lines of "invalid OBJECT_MODE or -X option value".
>
> if I do not think it is better than my current implement, If I implement as your suggestion, I need another variable to recoded the where the string come from(from OBJECT_MODE  or -X option value). otherwise when the invalid value of string , how can I report it is an invalid OBJECT_MODE"  or "invalid -X option value"

I'm not convinced you need to distinguish the two. If a user hasn't specified the -X option, then they'll know it's the OBJECT_MODE environment variable. Even if they have both, it should be obvious to a quick glance of the full error message what the wrong value is (because you'd include it in the message) and you could then it won't take long to find which of the two is wrong (even if they don't know that -X trumps OBJET_MODE). That being said, I still prefer the parse OBJECT_MODE first, report an error, then later read the -X option and check for an error there.

================
Comment at: clang/test/lit.cfg.py:391
+if 'system-aix' in config.available_features:
+   config.environment['OBJECT_MODE'] = '32_64'

----------------
MaskRay wrote:
> Recent Python style prefers double quotes
This also applies to the line above. See discussions on Discourse about using "black" to format python code (it's similar to clang-format, but for python).

================
Comment at: llvm/test/tools/llvm-ranlib/aix-X-option.test:17-18
+## Test the OBJECT_MODE environment variable when adding symbol table.
+# RUN: unset OBJECT_MODE
+# RUN: llvm-ranlib t_X32.a
+# RUN: llvm-nm --print-armap t_X32.a 2>&1 | FileCheck --check-prefixes=GLOB32 --implicit-check-not="in t64" %s
----------------
DiggerLin wrote:
> jhenderson wrote:
> > DiggerLin wrote:
> > > jhenderson wrote:
> > > > Assuming the unsetting is intended to be just for the llvm-ranlib line, I believe the preferred approach is `env -u OBJECT_MODE llvm-ranlib ...`. That way, you don't impact the behaviour in subsequent lines.
> > > the command `env` of AIX OS do not support -u.  and there is no mapping  option of `env -u`
> > > 
> > > https://www.ibm.com/docs/en/aix/7.2?topic=e-env-command
> > I'm 90% certain that `env` here is the built-in lit `env`, not a system `env` executable. `env -u` seems to be only rarely used in the test suite, but see llvm/utils/lit/tests/Inputs/shtest-env/env-u.txt for an example. I assume you can run this test locally?
> it run fail in AIX OS locally.
My apologies, it appears that you must be using the external shell or something (see lit differences internal and external shell). That might explain why `env -u` isn't used that much.

================
Comment at: llvm/tools/llvm-ar/llvm-ar.cpp:237
 static bool Verbose = false;              ///< 'v' modifier
-static bool Symtab = true;                ///< 's' modifier
+static WriteSymTabType Symtab = true;     ///< 's' modifier
 static bool Deterministic = true;         ///< 'D' and 'U' modifiers
----------------
DiggerLin wrote:
> jhenderson wrote:
> > DiggerLin wrote:
> > > jhenderson wrote:
> > > > DiggerLin wrote:
> > > > > jhenderson wrote:
> > > > > > DiggerLin wrote:
> > > > > > > jhenderson wrote:
> > > > > > > > Maybe I'm missing something, but I don't see why you need to make this a custom type. You already have the BitMode value that you read in from the environment/command-line, and so you can just use that in conjunction with the `Symtab` boolean to determine what symbol table to write, surely?
> > > > > > > the Symtab are use to specify whether the symbol table need to write(for non-AIX). and what the symbol table need to write for AIX OS.
> > > > > > > 
> > > > > > > there are function like
> > > > > > > 
> > > > > > > ```
> > > > > > >   writeArchiveToBuffer(ArrayRef<NewArchiveMember> NewMembers,
> > > > > > >                      WriteSymTabType WriteSymtab, object::Archive::Kind Kind,
> > > > > > >                      bool Deterministic, bool Thin)
> > > > > > > ```
> > > > > > > and 
> > > > > > > 
> > > > > > > ```
> > > > > > > Error writeArchive(StringRef ArcName, ArrayRef<NewArchiveMember> NewMembers,
> > > > > > >                    WriteSymTabType WriteSymtab, object::Archive::Kind Kind,
> > > > > > >                    bool Deterministic, bool Thin,
> > > > > > >                    std::unique_ptr<MemoryBuffer> OldArchiveBuf = nullptr);
> > > > > > > ```
> > > > > > > 
> > > > > > >  call the function as 
> > > > > > > 
> > > > > > > 
> > > > > > > ```
> > > > > > >         writeArchiveToBuffer(M.second.getMembers(),
> > > > > > >                              /*WriteSymtab=*/true,
> > > > > > >                              /*Kind=*/object::Archive::K_DARWIN,
> > > > > > >                              C.Deterministic,
> > > > > > >                              /*Thin=*/false); 
> > > > > > > ```
> > > > > > > 
> > > > > > > I do not want to change the calling parameter of the /*WriteSymtab=*/true, 
> > > > > > > 
> > > > > > > I introduced a new class WriteSymTabType which has following API
> > > > > > >     
> > > > > > >  
> > > > > > > ```
> > > > > > >  WriteSymTabType(bool PrintSym) { Value = PrintSym ? Both : None; }
> > > > > > >   void operator=(bool PrintSym) { Value = PrintSym ? Both : None; }
> > > > > > >   operator bool() { return Value != None; }
> > > > > > > ```
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > Having looked at this again, I really don't like the implicit conversion semantics of the new class, and I don't see why you can't just extend the `writeArchiveToBuffer` etc parameters to pass in a `BitMode` parameter.
> > > > > without implicit conversion, if I add a new parameter BitMode to writeArchiveToBuffer,  I also need to add the new parameter BitMode to function writeArchive() and writeArchiveToStream() and there is a lot of place(files) call the function writeArchive, I need to modify all the caller too. 
> > > > > I also need to add the new parameter BitMode to function writeArchive() and writeArchiveToStream() and there is a lot of place(files) call the function writeArchive, I need to modify all the caller too.
> > > > 
> > > > I don't think this is a good reason not to do this, given that it results in an (in my opinion) suboptimal interface. That being said, you could move the new parameter I'm suggesting to the end of the argument list and use a default value, if it helps reduce the impact.
> > > if I add a need BitMode enum
> > > {None,
> > >  Bit32,
> > >  Bit64,
> > >  Bit32_64
> > > };
> > > 
> > > Do you think we still need the parameter "bool WriteSymtab" , I think we can use the BitMode=None to imply WriteSymtab=false, using  BitMode!=None imply WriteSymtab=true. @jhenderson 
> > Well the BitMode is Big Archive specific, and so isn't actually relevant for any other archive format. I don't think it's clear to a maintainer who is unfamiliar with Big Archive that "BitMode = None" implies 2don't write a symbol table". Really, I think the whole llvm-ar.cpp code could do with some major refactoring to introduce more object orientation, so that the details of different formats can be hidden from each other, but that's outside the scope of this patch (in this specific case, I'd have a `SymtabWriter` class with `NoSymtabWriter`, `BigArchiveSymtabWriter32`, and so on subclasses - these would then have methods that did the appropriate writing of symbol tables).
> > 
> > I do see the appeal of a single enum parameter in the meantime. I would suggest it should be called `SymtabWritingMode` and have values `NoSymtab`, `NormalSymtab`, `BigArchive64Bit`, `BigArchive32Bit`, and `BigArchiveBoth`. You could optionally drop one of the `BigArchive*` values, and treat `NormalSymtab` as that value for BigArchive (as long as "Normal" is clear as to its meaning to someone familiar with BigArchive). All existing `WriteSymtab` `false` values would be replaced with `NoSymtab` and `true` with `NormalSymtab` (except for BigArchive cases).
> > Having looked at this again, I really don't like the implicit conversion semantics of the new class, and I don't see why you can't just extend the writeArchiveToBuffer etc parameters to pass in a BitMode parameter.
> 
>  I am not sure why you do not like the implicit of the conversion semantics of the new class, can you elaborate the disadvantage in the case ?
>   
Implicit conversions from primitive types to more complex types are generally, in my experience, a maintenance nightmare. Here are some specific disadvantages to this case:
  - From the caller's perspective, it isn't obvious that a complex type of `WriteSymtabType`, as declared in the function's signature that they are probably looking at, can be constructed from a boolean.
  - It's not even particularly obvious that implicit conversion is possible by a casual glance at the class definition (since implicit construction is defined by the lack of `explicit` and it's not always obvious if you are only briefly looking that `explicit` is missing).
  - `WriteSymtabType` is essentially an enum. Without looking up the details of how the class works, it's not obvious to the caller what a `true` or `false` value maps to. You wouldn't typically construct an enum from integer literal values for this sort of reason - you'd spell out the specific enum value that you intend.
  - You are trying to add support for a new file format to existing code. This is tainting how you do things. If you wrote this code from scratch, you wouldn't use the implicit conversion stuff - you'd use an enum or even a proper class hierarchy. It seems to me like one of the main reasons you're using the complex type is because you don't want to update the various call sites, which, put crudely, is a bit lazy. You should always aim to leave things in at least a no-worse state than when you found it, preferably better.

I could probably come up with other issues if I thought about it more, but really I keep coming back to "if I completely rewrote this code (without object orientation), what would I do?" and the enum wins every time.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142660/new/

https://reviews.llvm.org/D142660