[llvm-dev] Binary utilities: switch command line parsing from llvm::cl to OptTable (byproduct: drop -long-option?)

Fri Jul 2 16:49:35 PDT 2021

[This topic of this subthread has shifted from binary utilities to generic library options]

> Sure, but that isn't the problem you were raising that I was answering
> to, let's not move the goal post here.
> 
> How do you solve this "namespacing" issue though? This was the sense of my
> question earlier in this thread: is OptTable providing me a solution to
> build library options that can be re-exported through tools command line
> interface? (bonus point if it manages some sort of namespacing).

No, OptTable options are not composable.

> The intent of cl::opt in libraries is that they can be exposed on the
> command line. It seems to me that this namespacing issue is quite intrinsic
> to the flat command line interface itself.

Yes.

> The way we work around this to avoid collision in mlir is through
> convention, making sure passes are prefixed according to their library (or
> dialects in MLIR).

Yes. llvm-readobj has a similar pattern: calling a function which
defines cl::opt local variables.

OptTable is for tools like clang, flang, lld where

* they want high configurability, e.g. -long-option for some options while --long-option for others.
* they don't want library cl::opt options.[1]
* they want to avoid cl::opt's loosing behavior (-t=d and -demangle=0 are two examples I raised; hey, -enable-new-pm=0:) it is convenient internally but externally we'd better use -no- instead)

[1]: As you may have implied, this
is alternatively solvable by reorganizing llvm/lib/* options. This is
a huge task and the solution isn't particular clear yet.
--riscv-no-aliases is an example that cl::opt in library code can affect
multiple tools: llc/llvm-mc/llvm-objdump (llvm-mc/llvm-objdump usage was
unintentional). Someone signing up for the work needs to be careful on
letting utilities call the relevant register functions.

My proposal is like: binary utilities should move in the direction of
clang/lld as well.

> Another thing we have in MLIR is cl::opt wrapped into external storage and
> custom parser, they aren't exposed in the global namespace.
> For example, a pass class will define a member:
> 
>   ::mlir::Pass::Option<uint64_t> fastMemoryCapacity{*this,
> "fast-mem-capacity", ::llvm::cl::desc("Set fast memory space capacity in
> KiB (default: unlimited)"),
> ::llvm::cl::init(std::numeric_limits<uint64_t>::max())};
> 
> Where mlir::Pass::Option is a inheriting from cl::opt here:
> https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Pass/PassOptions.h#L111
> 
> In `mlir-opt --help` it shows up as:
> 
>   Compiler passes to run
>     --pass-pipeline                                     -   A textual
> description of a pass pipeline to run
>     Passes:
>       --affine-data-copy-generate                       -   Generate
> explicit copying for affine memory operations
>         --fast-mem-capacity=<ulong>                     - Set fast memory
> space capacity in KiB (default: unlimited)
> 
> Note how the  "fast-mem-capacity" is nested under the
> "affine-data-copy-generate" (which is the pass name).
> On the command line it won't be present at the top-level and you end up
> invoking it this way:
> 
> // RUN: mlir-opt %s -split-input-file
> -affine-data-copy-generate="generate-dma=false fast-mem-space=0
> skip-non-unit-stride-loops" | FileCheck %s
> 
> It also has the advantage that you can invoke the same pass twice in the
> pipeline with a different value for the same cl::opt since the storage is
> private to a pass instance.
> 
> 
> 
> 
> >
> > This can be worked around with linker garbage collection by discarding
> > unreferenced cl::opt.
> >
> 
> I don't understand how this works actually: cl::opt that rely on global
> constructors aren't referenced ever, as long as the file is linked in they
> will be involved. This is an incredibly clunky situation to play with
> linker semantics here, we end relying on the way files are organized in
> static archives (and it breaks when you build libLLVM.so as you mentioned
> before).

Yep, the reliance on linker garbage collection makes hiding unrelated options.

On 2021-07-02, Mehdi AMINI wrote:
>Looks like our email crossed :)
>
>On Fri, Jul 2, 2021 at 4:07 PM Fāng-ruì Sòng <maskray at google.com> wrote:
>
>> On Fri, Jul 2, 2021 at 3:26 PM Fāng-ruì Sòng <maskray at google.com> wrote:
>>
>>> On Fri, Jul 2, 2021 at 3:21 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Jul 2, 2021 at 2:09 PM Fāng-ruì Sòng <maskray at google.com> wrote:
>>>>
>>>>> On Fri, Jul 2, 2021 at 2:03 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 2, 2021 at 1:27 PM Fāng-ruì Sòng <maskray at google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> On Fri, Jul 2, 2021 at 1:10 PM Mehdi AMINI <joker.eph at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 2, 2021 at 12:49 PM Alexandre Ganea <
>>>>>>>> alexandre.ganea at ubisoft.com> wrote:
>>>>>>>>
>>>>>>>>> The API benefits sound nice, though presumably some could be
>>>>>>>>> retrofitted to cl::opt if that was the only goal. Side benefits in addition
>>>>>>>>> to removing global ctors are nice to have.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The drawback is some initial boilerplate (e.g. llvm-tblgen
>>>>>>>>> -gen-opt-parser-defs in CMakeLists.txt, class NmOptTable in code).
>>>>>>>>>
>>>>>>>>> The handling of comma separated options -arch=x86_64,arm64 doesn't
>>>>>>>>> have direct OptTable support. llvm::SplitString is needed (just search for
>>>>>>>>> SplitString in https://reviews.llvm.org/D105330)
>>>>>>>>>
>>>>>>>>> But this doesn't tend to increase complexity because the
>>>>>>>>> cl::list<std::string> will need per-value verification anyway.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> One potential one (though I don't recall it being discussed
>>>>>>>>> recently) would be that maybe this addresses the issue of global ctors in
>>>>>>>>> cl::opt? Does OptTable avoid/not use global constructors? That would be
>>>>>>>>> nice - it's an ongoing issue that LLVM library users pay for command line
>>>>>>>>> argument support they have no need for in the form of global ctor execution
>>>>>>>>> time.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> OptTable is used as a local variable. So yes, it avoids global
>>>>>>>>> constructors,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Nice :)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Note that MLIR is using cl::opt without global ctor (we build with
>>>>>>>>> `-Werror=global-constructors`).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The pattern we use to write a tool with cl::opt and avoid global
>>>>>>>>> ctor (and can be used to avoid collision) looks like:
>>>>>>>>> https://github.com/llvm/llvm-project/blob/main/mlir/lib/IR/MLIRContext.cpp#L57-L83
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The tool that wants to expose the MLIRContext options to the
>>>>>>>>> command line calls registerMLIRContextCLOptions() before parsing the
>>>>>>>>> command line.
>>>>>>>>>
>>>>>>>>> Wouldn't this translate directly to LLVM tools as well with some
>>>>>>>>> minor refactoring?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The same applies to all of the infrastructure in MLIR, passes are
>>>>>>>>> registered explicitly, etc. This decouples the "is this code linked in"
>>>>>>>>> from "options are loaded" annoying part of the global constructors.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Mehdi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> *[Alexandre Ganea] *I think one other issue with cl::opt is that
>>>>>>>>> it aggregates the “command-line argument definition” and the “runtime
>>>>>>>>> parameter” *de facto* in a single object (unless cl::location is
>>>>>>>>> manually specified to every cl::opt). What MLIR does solves the issue
>>>>>>>>> mentioned by David, the fact that every tool pulls/initializes every
>>>>>>>>> cl::opt out there. However OptTable solves both problems, and makes the
>>>>>>>>> entry point thread-safe.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> I agree that removing the global state would be great!
>>>>>>>> Right now what I see proposed with OptTable (like
>>>>>>>> https://reviews.llvm.org/D104889) seems to just address the
>>>>>>>> tools-specific options, and the value isn't clear to me for these cases,
>>>>>>>> since these options aren't exposed through library entry points.
>>>>>>>> I don't quite get right now how OptTable would compose at the LLVM
>>>>>>>> scale? Are there examples of libraries exposing pluggable hooks for a tool
>>>>>>>> to aggregate multiple libraries' options and expose them on the command
>>>>>>>> line?
>>>>>>>>
>>>>>>>>
>>>>>>> The first message listed:
>>>>>>>
>>>>>>> * -t=d is removed (equal sign after a short option). Use -t d instead.
>>>>>>> * --demangle=0 (=0 to disable a boolean option) is removed. Omit the
>>>>>>> option or use --no-demangle instead.
>>>>>>> * To support boolean options (e.g. --demangle --no-demangle), we
>>>>>>> don't need to compare their positions (if (NoDemangle.getPosition() >
>>>>>>> Demangle.getPosition()) , see llvm-nm.cpp)
>>>>>>> * grouped short options can be specified with one line
>>>>>>> `setGroupedShortOptions`, instead of adding cl::Grouping to every short
>>>>>>> options.
>>>>>>> * We don't need to add cl::cat to every option and call
>>>>>>> `HideUnrelatedOptions` to hide unrelated options from --help. The issue
>>>>>>> would happen with cl::opt tools if linker garbage collection is disabled or
>>>>>>> libLLVM-13git.so is used. (See https://reviews.llvm.org/D104363)
>>>>>>>
>>>>>>> To me *these points are all usability issues of cl::opt*. I care
>>>>>>> about not exposing unnecessary interfaces so cl::opt accepting the weird
>>>>>>> -t=d looks a downside to me.
>>>>>>>
>>>>>>> --demangle=0 is weird and some llvm/llvm/test tests do use cl::opt
>>>>>>> options this way, so we cannot just remove this usage. As a workaround, we
>>>>>>> could add a cl::foobar_toggle to a cl::opt to disallow =0.
>>>>>>> We would end with more customization for one option, cl::cat (for
>>>>>>> hiding unrelated options), cl::foobar_toggle (for disallowing =0), and
>>>>>>> potentially others for other ad-hoc tasks.
>>>>>>>
>>>>>>> I can highlight another thing about the global state of cl::opt => *library
>>>>>>> cl::opt and binary utility cl::opt share the same namespace*.
>>>>>>> So cl::opt options (usually for debugging or testing) in library code
>>>>>>> can end up in a tool's list of command line options.
>>>>>>> This is usually undesired (e.g. llvm-objdump --x86-asm-syntax in
>>>>>>> https://reviews.llvm.org/D100433).
>>>>>>> People may not notice this if they always
>>>>>>> use -DLLVM_LINK_LLVM_DYLIB=off and don't use linker garbage collection.
>>>>>>>
>>>>>>
>>>>>> You're not answering my question here, are you? Are you answering to
>>>>>> what I mentioned 3 emails before in an answer to David when I wrote
>>>>>> "Indeed: it isn't clear to me that these are outright "benefits"."?
>>>>>>
>>>>>
>>>>> OptTable doesn't have the listed usability issues. Not having the
>>>>> issues is a large benefit to me.
>>>>>
>>>>
>>>> Maybe, these are subjective though.
>>>> (And the confusion above came from the fact that  you just answered the
>>>> wrong email)
>>>>
>>>>
>>>>>
>>>>>
>>>>>> Because I still don't see clearly how to build something like `opt`
>>>>>> with all the pass and the options with OptTable, how does it all compose?
>>>>>>
>>>>>
>>>>> The proposed changes are specific to binary utilities:
>>>>> llvm-{ar,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}.
>>>>> 'opt', 'llc', etc are not in the scope.
>>>>> (I guess I should have named the utilities more specifically to not
>>>>> cause confusion.)
>>>>>
>>>>> For
>>>>> llvm-{ar,cxxfilt,nm,objcopy,objdump,readobj,size,strings,symbolizer}, we
>>>>> don't want a random cl::opt in lib/Transforms, lib/MC, orlib/LTO to be
>>>>> accidentally specifiable on the command line.
>>>>> (In the rare debugging case where such functionality is needed, it
>>>>> needs a -mllvm prefix like what ld.lld does.)
>>>>>
>>>>
>>>> Have you looked at what I mentioned with MLIR on how to use cl::opt
>>>> without global constructor? This has exactly the behavior you're looking
>>>> for.
>>>>
>>>>
>>> I have read
>>> https://github.com/llvm/llvm-project/blob/main/mlir/lib/IR/MLIRContext.cpp#L57-L83
>>>
>>> Unless it does something more specially, I don't think it can avoid the
>>> issue I mentioned in a previous message:
>>>
>>> > I can highlight another thing about the global state of cl::opt => *library
>>> cl::opt and binary utility cl::opt share the same namespace*.
>>>
>>> This can be worked around with linker garbage collection by discarding
>>> unreferenced cl::opt.
>>>
>>
>> I re-read these messages. I think you probably meant something more
>> generic - how to design decentralized command line option registry where
>> every library can register some options.
>>
>
>Yes indeed!
>
>I'm not against using OptTable for the sole purpose of binary tools, but
>that does not seem to provide us with a path forward. So I am afraid that
>this is a local optima that is short sighted.

Mentioned previously, I don't try to alter current cl::opt usage in
library code and non-binary-utility tools (e.g. opt,llc).

Though I am not sure I agree with clang/lld/llvm-symbolizer/llvm-objdump
are using a local optima. I think cl::opt just don't fit for their use cases.
OptTable is just the appropriate solution for them.

Bogus: since options are data with the OptTable approach, we can re-use
something like clang-tblgen -gen-opt-docs to help ensure the documentation
doesn't diverge from the reality.

>> The binary utilities command line parsing issue I intended to address is
>> fairly different:
>> We want a single registry of all options and don't want to inherit options
>> from llvm/lib/A just because the tool happens to depend on llvm/lib/A
>> directly or indirectly.
>>
>
>I agree, what I pointed at in MLIR was an attempt to achieve this, the tool
>has to explicitly call a function at runtime to make the options available.
>
>
>> E.g. llvm-nm needs to depend on bitcode reader because it needs to handle
>> LLVM bitcode files, however, I don't want a random cl::opt in bitcode
>> reader to appear in llvm-nm's command line option list.
>>
>> So I just built mlir-opt and inspected its --help output. It has exactly
>> the problem I called out in my first message:
>>
>
>> * We don't need to add cl::cat to every option and call
>>
>> `HideUnrelatedOptions` to hide unrelated options from --help. The issue
>> would happen with cl::opt tools if linker garbage collection is disabled or
>> libLLVM-13git.so is used. (See https://reviews.llvm.org/D104363)
>>
>> There are many unrelated options like --amdgpu-bypass-slow-div which is from llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
>>
>>
>I think you're missing the point here: mlir-opt does not have this problem *for
>mlir components.* It pays the price of the LLVM components using global
>constructors, but I pointed to the solution we use for mlir components that
>aren't using global constructors and avoid this problem.

See [1] above