[PATCH] D142660: [AIX] supporting -X options for llvm-ranlib in AIX OS

James Henderson via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jul 26 02:09:28 PDT 2023


jhenderson added inline comments.


================
Comment at: llvm/test/tools/llvm-ranlib/aix-X-option.test:68
+
+## Test the invalid -X option and OBJECT_MODE enviornment var.
+# RUN: not env OBJECT_MODE= llvm-ranlib t_X32.a 2>&1 | FileCheck --implicit-check-not="error:"  --check-prefixes=INVALID-OBJECT-MODE %s
----------------
Actually, you shouldn't have added "the" here. The grammatical rules are a little tricky to explain, so I won't bother, but you should actually say "Test invalid -X options and OBJECT_MODE environment variables."


================
Comment at: llvm/test/tools/llvm-ranlib/aix-X-option.test:17-18
+## Test the OBJECT_MODE environment variable when adding symbol table.
+# RUN: unset OBJECT_MODE
+# RUN: llvm-ranlib t_X32.a
+# RUN: llvm-nm --print-armap t_X32.a 2>&1 | FileCheck --check-prefixes=GLOB32 --implicit-check-not="in t64" %s
----------------
DiggerLin wrote:
> jhenderson wrote:
> > Assuming the unsetting is intended to be just for the llvm-ranlib line, I believe the preferred approach is `env -u OBJECT_MODE llvm-ranlib ...`. That way, you don't impact the behaviour in subsequent lines.
> the command `env` of AIX OS do not support -u.  and there is no mapping  option of `env -u`
> 
> https://www.ibm.com/docs/en/aix/7.2?topic=e-env-command
I'm 90% certain that `env` here is the built-in lit `env`, not a system `env` executable. `env -u` seems to be only rarely used in the test suite, but see llvm/utils/lit/tests/Inputs/shtest-env/env-u.txt for an example. I assume you can run this test locally?


================
Comment at: llvm/tools/llvm-ar/llvm-ar.cpp:1442
+
+            // -X option in ranlib do not accept "any"
+            if (BitMode == BitModeTy::Unknown || BitMode == BitModeTy::Any)
----------------
DiggerLin wrote:
> jhenderson wrote:
> > DiggerLin wrote:
> > > jhenderson wrote:
> > > > I know that AIX ranlib doesn't accept "any" whereas I believe AIX ar does. I wonder though what the benefit is of preventing llvm-ranlib from accepting "any"? Dropping that special case would simplify the code.
> > > agree with you. but we discussed about whether to accept `any` in our internal , we decide to keep all the behavior  as AIX `ranlib`
> > > we decide to keep all the behavior as AIX ranlib
> > 
> > We aren't in your internal company here. This is the open source community, therefore you need to be able to justify your decisions in the open source conversation. What reason is there to keep rejecting this in llvm-ranlib? Perhaps worth asking yourself is "if we could control it, why would we keep that behaviour in AIX ranlib?".
> according to 
> https://www.ibm.com/docs/en/aix/7.1?topic=ar-command
> 
> -X mode , there is `32`, `64`, `32_64`, `d64`, any mode
> 
> d64
> Examines discontinued 64-bit XCOFF files (magic number == U803XTOCMAGIC).
> 
> we do not support `d64`in llvm(since it is discontinued), but we keep `any`  option in llvm-ar in case of we want to use llvm-ar to replace AIX `ar`  in some AIX shell script which has option `any` for ar (`any = 32_64 + d64`),  we do no want to modify the option from `any` to `32_64` for AIX shell script, so we keep the `any` option for llvm-ar.
> 
> for AIX `ranlib`, https://www.ibm.com/docs/en/aix/7.2?topic=r-ranlib-command . it only support 32,64,32_64, It do not support `d64`, so there is no `any` option for AIX `ranlib`, we do not need to add a additional `any` for llvm-ranlib 
> we do not need to add a additional any for llvm-ranlib

As noted earlier, adding an `any` value would align llvm-ranlib and llvm-ar's command-line options, allowing the code to be simpler. It doesn't break existing users by permitting it either. You have explained why you aren't accepting it, but not actually what the benefit is of that approach. What is the benefit of NOT accepting `any` in llvm-ranlib?


================
Comment at: llvm/tools/llvm-ar/llvm-ar.cpp:1463
 
+  if (object::Archive::getDefaultKindForHost() == object::Archive::K_AIXBIG) {
+    // If not specify -X option, get BitMode from enviorment variable
----------------
DiggerLin wrote:
> jhenderson wrote:
> > DiggerLin wrote:
> > > jhenderson wrote:
> > > > Is there a particular reason that this is after the command-line option parsing for llvm-ranlib, but before it in llvm-ar? If this were before the option parsing, you wouldn't need the `HasAixXOption` variable.
> > > in AIX OS  `ranlib` has behavior as
> > > 
> > > 
> > > ```
> > > -bash-5.0$ env OBJECT_MODE=31 ranlib tmpk.a
> > > 0654-603 The OBJECT_MODE environment variable has an invalid setting.
> > >         OBJECT_MODE must be 32, 64, or 32_64.
> > > -bash-5.0$ env OBJECT_MODE=31 ranlib -X32  tmpk.a
> > > -bash-5.0$
> > > ```  
> > > 
> > > Given invalid env OBJECT_MODE , if there is no -X option in the ranlib command, it will output as 
> > > 
> > > ```
> > > 0654-603 The OBJECT_MODE environment variable has an invalid setting.
> > >         OBJECT_MODE must be 32, 64, or 32_64.
> > > ```
> > > 
> > > Given invalid env OBJECT_MODE , and there is -X option in the ranlib command,  it do not care about the invalid env OBJECT_MODE,  So I has to parse the -X option before the getBitMode(getenv("OBJECT_MODE"))
> > > 
> > So with AIX ar, does an invalid OBJECT_MODE environment variable get reported if the -X option is specified?
> > 
> > In my opinion, I think it is very unlikely there will be any real users out there with an invalid OBJECT_MODE environment variable, because other tools will reject it, even if ranlib doesn't. Even if there are, they should be able to easily fix their variable, if they start getting an error message after switching to llvm-ranlib. I'm therefore thinking that there isn't a need to mirror this logic in llvm-ranlib, if it would make the code simpler (which it would).
> > So with AIX ar, does an invalid OBJECT_MODE environment variable get reported if the -X option is specified?
> 
> it do no report invalid OBJECT_MODE environment if the -X option is specified.
> 
> 
> ```
> bash-5.0$ export OBJECT_MODE=31
> bash-5.0$ ar  q tmpp.acheck.o
> ar: 0707-133 The OBJECT_MODE environment variable has an invalid setting.
>         OBJECT_MODE must be 32, 64, 32_64, d64, or any.
> bash-5.0$ ar -X32 q tmpp.acheck.o
> bash-5.0$
> ```
> 
> 
> 
> > In my opinion, I think it is very unlikely there will be any real users out there with an invalid OBJECT_MODE environment variable, because other tools will reject it, even if ranlib doesn't. Even if there are, they should be able to easily fix their variable, if they start getting an error message after switching to llvm-ranlib. I'm therefore thinking that there isn't a need to mirror this logic in llvm-ranlib, if it would make the code simpler (which it would).
> 
> since -X option will overwrite the env variable OBJECT_MODE, 
> in the source code , we should check the -X option of the command line first, if there is no  -X option ,  we will check the env variable OBJECT_MODE. the logic of code is correct in the patch.  we should not always  getenv("OBJECT_MODE")  before checking the -X option.
> 
> since -X option will overwrite the env variable OBJECT_MODE, in the source code , we should check the -X option of the command line first, if there is no -X option , we will check the env variable OBJECT_MODE. the logic of code is correct in the patch. we should not always getenv("OBJECT_MODE") before checking the -X option.

I don't agree. It is normal to validate all of a a user's input options (including environment variables that are a proxy for command-line options) even if they don't end up getting used. Consider a similar, admittedly somewhat arbitrary case. If you had one command called e.g. `--doTheThing` whose sole purpose was to ensure that some `if` block was run, and then another option `--doTheThingThisWay={methodA|methodB}` which controlled what happens within that `if` block, it would be perfectly normal and acceptable to check the value passed to `--doTheThingThisWay` even without `--doTheThing` being specified. The code would typically look like this pseudo code:
```
Config processCommandLine(Args args) {
  Config cfg;
  if ("doTheThing" in args) {
    cfg.doTheThing = true;
  }
  if ("doTheThingThisWay" in args) {
    switch(args["doTheThingThisWay"].value) {
       case "methodA":
        cfg.doTheThingThisWay = DoTheThingThisWayKind::MethodA;
        break;
       case "methodB":
        cfg.doTheThingThisWay = DoTheThingThisWayKind::MethodB;
        break;
      default:
        reportError("invalid --doTheThingThisWay value");
    }
  }
}
```
The OBJECT_MODE environment variable is a proxy for the -X option. Therefore, it should be parsed and checked in the same manner as the -X option even if it isn't used. This brings consistency between the two tools, which is good. Incosistency is bad: it increases code complexity and potentially can confuse users. What is the benefit to making the two tools inconsistent?


================
Comment at: llvm/tools/llvm-ar/llvm-ar.cpp:237
 static bool Verbose = false;              ///< 'v' modifier
-static bool Symtab = true;                ///< 's' modifier
+static WriteSymTabType Symtab = true;     ///< 's' modifier
 static bool Deterministic = true;         ///< 'D' and 'U' modifiers
----------------
DiggerLin wrote:
> jhenderson wrote:
> > DiggerLin wrote:
> > > jhenderson wrote:
> > > > DiggerLin wrote:
> > > > > jhenderson wrote:
> > > > > > Maybe I'm missing something, but I don't see why you need to make this a custom type. You already have the BitMode value that you read in from the environment/command-line, and so you can just use that in conjunction with the `Symtab` boolean to determine what symbol table to write, surely?
> > > > > the Symtab are use to specify whether the symbol table need to write(for non-AIX). and what the symbol table need to write for AIX OS.
> > > > > 
> > > > > there are function like
> > > > > 
> > > > > ```
> > > > >   writeArchiveToBuffer(ArrayRef<NewArchiveMember> NewMembers,
> > > > >                      WriteSymTabType WriteSymtab, object::Archive::Kind Kind,
> > > > >                      bool Deterministic, bool Thin)
> > > > > ```
> > > > > and 
> > > > > 
> > > > > ```
> > > > > Error writeArchive(StringRef ArcName, ArrayRef<NewArchiveMember> NewMembers,
> > > > >                    WriteSymTabType WriteSymtab, object::Archive::Kind Kind,
> > > > >                    bool Deterministic, bool Thin,
> > > > >                    std::unique_ptr<MemoryBuffer> OldArchiveBuf = nullptr);
> > > > > ```
> > > > > 
> > > > >  call the function as 
> > > > > 
> > > > > 
> > > > > ```
> > > > >         writeArchiveToBuffer(M.second.getMembers(),
> > > > >                              /*WriteSymtab=*/true,
> > > > >                              /*Kind=*/object::Archive::K_DARWIN,
> > > > >                              C.Deterministic,
> > > > >                              /*Thin=*/false); 
> > > > > ```
> > > > > 
> > > > > I do not want to change the calling parameter of the /*WriteSymtab=*/true, 
> > > > > 
> > > > > I introduced a new class WriteSymTabType which has following API
> > > > >     
> > > > >  
> > > > > ```
> > > > >  WriteSymTabType(bool PrintSym) { Value = PrintSym ? Both : None; }
> > > > >   void operator=(bool PrintSym) { Value = PrintSym ? Both : None; }
> > > > >   operator bool() { return Value != None; }
> > > > > ```
> > > > > 
> > > > > 
> > > > > 
> > > > Having looked at this again, I really don't like the implicit conversion semantics of the new class, and I don't see why you can't just extend the `writeArchiveToBuffer` etc parameters to pass in a `BitMode` parameter.
> > > without implicit conversion, if I add a new parameter BitMode to writeArchiveToBuffer,  I also need to add the new parameter BitMode to function writeArchive() and writeArchiveToStream() and there is a lot of place(files) call the function writeArchive, I need to modify all the caller too. 
> > > I also need to add the new parameter BitMode to function writeArchive() and writeArchiveToStream() and there is a lot of place(files) call the function writeArchive, I need to modify all the caller too.
> > 
> > I don't think this is a good reason not to do this, given that it results in an (in my opinion) suboptimal interface. That being said, you could move the new parameter I'm suggesting to the end of the argument list and use a default value, if it helps reduce the impact.
> if I add a need BitMode enum
> {None,
>  Bit32,
>  Bit64,
>  Bit32_64
> };
> 
> Do you think we still need the parameter "bool WriteSymtab" , I think we can use the BitMode=None to imply WriteSymtab=false, using  BitMode!=None imply WriteSymtab=true. @jhenderson 
Well the BitMode is Big Archive specific, and so isn't actually relevant for any other archive format. I don't think it's clear to a maintainer who is unfamiliar with Big Archive that "BitMode = None" implies 2don't write a symbol table". Really, I think the whole llvm-ar.cpp code could do with some major refactoring to introduce more object orientation, so that the details of different formats can be hidden from each other, but that's outside the scope of this patch (in this specific case, I'd have a `SymtabWriter` class with `NoSymtabWriter`, `BigArchiveSymtabWriter32`, and so on subclasses - these would then have methods that did the appropriate writing of symbol tables).

I do see the appeal of a single enum parameter in the meantime. I would suggest it should be called `SymtabWritingMode` and have values `NoSymtab`, `NormalSymtab`, `BigArchive64Bit`, `BigArchive32Bit`, and `BigArchiveBoth`. You could optionally drop one of the `BigArchive*` values, and treat `NormalSymtab` as that value for BigArchive (as long as "Normal" is clear as to its meaning to someone familiar with BigArchive). All existing `WriteSymtab` `false` values would be replaced with `NoSymtab` and `true` with `NormalSymtab` (except for BigArchive cases).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142660/new/

https://reviews.llvm.org/D142660



More information about the cfe-commits mailing list