[clang] 8ddfcec - [clang][cli] Documentation of CompilerInvocation parsing/generation
Jan Svoboda via cfe-commits
cfe-commits at lists.llvm.org
Wed Feb 17 05:31:07 PST 2021
Author: Jan Svoboda
Date: 2021-02-17T14:29:49+01:00
New Revision: 8ddfcec91b7801af058f43906073822d7f146bf5
URL: https://github.com/llvm/llvm-project/commit/8ddfcec91b7801af058f43906073822d7f146bf5
DIFF: https://github.com/llvm/llvm-project/commit/8ddfcec91b7801af058f43906073822d7f146bf5.diff
LOG: [clang][cli] Documentation of CompilerInvocation parsing/generation
This patch documents command line parsing in `-cc1`, `CompilerInvocation` and the marshalling infrastructure for command line options.
Reviewed By: Bigcheese
Differential Revision: https://reviews.llvm.org/D95790
Added:
Modified:
clang/docs/InternalsManual.rst
Removed:
################################################################################
diff --git a/clang/docs/InternalsManual.rst b/clang/docs/InternalsManual.rst
index 5732e9b9af03..906b6d153f2b 100644
--- a/clang/docs/InternalsManual.rst
+++ b/clang/docs/InternalsManual.rst
@@ -572,6 +572,422 @@ The Frontend Library
The Frontend library contains functionality useful for building tools on top of
the Clang libraries, for example several methods for outputting diagnostics.
+Compiler Invocation
+-------------------
+
+One of the classes provided by the Frontend library is ``CompilerInvocation``,
+which holds information that describe current invocation of the Clang frontend.
+The information typically comes from the command line constructed by the Clang
+driver or from clients performing custom initialization. The data structure is
+split into logical units used by
diff erent parts of the compiler, for example
+``PreprocessorOptions``, ``LanguageOptions`` or ``CodeGenOptions``.
+
+Command Line Interface
+----------------------
+
+The command line interface of the Clang ``-cc1`` frontend is defined alongside
+the driver options in ``clang/Driver/Options.td``. The information making up an
+option definition includes its prefix and name (for example ``-std=``), form and
+position of the option value, help text, aliases and more. Each option may
+belong to a certain group and can be marked with zero or more flags. Options
+accepted by the ``-cc1`` frontend are marked with the ``CC1Option`` flag.
+
+Command Line Parsing
+--------------------
+
+Option definitions are processed by the ``-gen-opt-parser-defs`` tablegen
+backend during early stages of the build. Options are then used for querying an
+instance ``llvm::opt::ArgList``, a wrapper around the command line arguments.
+This is done in the Clang driver to construct individual jobs based on the
+driver arguments and also in the ``CompilerInvocation::CreateFromArgs`` function
+that parses the ``-cc1`` frontend arguments.
+
+Command Line Generation
+-----------------------
+
+Any valid ``CompilerInvocation`` created from a ``-cc1`` command line can be
+also serialized back into semantically equivalent command line in a
+deterministic manner. This enables features such as implicitly discovered,
+explicitly built modules.
+
+..
+ TODO: Create and link corresponding section in Modules.rst.
+
+Adding new Command Line Option
+------------------------------
+
+When adding a new command line option, the first place of interest is the header
+file declaring the corresponding options class (e.g. ``CodeGenOptions.h`` for
+command line option that affects the code generation). Create new member
+variable for the option value:
+
+.. code-block::
diff
+
+ class CodeGenOptions : public CodeGenOptionsBase {
+
+ + /// List of dynamic shared object files to be loaded as pass plugins.
+ + std::vector<std::string> PassPlugins;
+
+ }
+
+Next, declare the command line interface of the option in the tablegen file
+``clang/include/clang/Driver/Options.td``. This is done by instantiating the
+``Option`` class (defined in ``llvm/include/llvm/Option/OptParser.td``). The
+instance is typically created through one of the helper classes that encode the
+acceptable ways to specify the option value on the command line:
+
+* ``Flag`` - the option does not accept any value,
+* ``Joined`` - the value must immediately follow the option name within the same
+ argument,
+* ``Separate`` - the value must follow the option name in the next command line
+ argument,
+* ``JoinedOrSeparate`` - the value can be specified either as ``Joined`` or
+ ``Separate``,
+* ``CommaJoined`` - the values are comma-separated and must immediately follow
+ the option name within the same argument (see ``Wl,`` for an example).
+
+The helper classes take a list of acceptable prefixes of the option (e.g.
+``"-"``, ``"--"`` or ``"/"``) and the option name:
+
+.. code-block::
diff
+
+ // Options.td
+
+ + def fpass_plugin_EQ : Joined<["-"], "fpass-plugin=">;
+
+Then, specify additional attributes via mix-ins:
+
+* ``HelpText`` holds the text that will be printed besides the option name when
+ the user requests help (e.g. via ``clang --help``).
+* ``Group`` specifies the "category" of options this option belongs to. This is
+ used by various tools to filter certain options of interest.
+* ``Flags`` may contain a number of "tags" associated with the option. This
+ enables more granular filtering than the ``Group`` attribute.
+* ``Alias`` denotes that the option is an alias of another option. This may be
+ combined with ``AliasArgs`` that holds the implied value.
+
+.. code-block::
diff
+
+ // Options.td
+
+ def fpass_plugin_EQ : Joined<["-"], "fpass-plugin=">,
+ + Group<f_Group>, Flags<[CC1Option]>,
+ + HelpText<"Load pass plugin from a dynamic shared object file.">;
+
+New options are recognized by the Clang driver unless marked with the
+``NoDriverOption`` flag. On the other hand, options intended for the ``-cc1``
+frontend must be explicitly marked with the ``CC1Option`` flag.
+
+Next, parse (or manufacture) the command line arguments in the Clang driver and
+use them to construct the ``-cc1`` job:
+
+.. code-block::
diff
+
+ void Clang::ConstructJob(const ArgList &Args /*...*/) const {
+ ArgStringList CmdArgs;
+ // ...
+
+ + for (const Arg *A : Args.filtered(OPT_fpass_plugin_EQ)) {
+ + CmdArgs.push_back(Args.MakeArgString(Twine("-fpass-plugin=") + A->getValue()));
+ + A->claim();
+ + }
+ }
+
+The last step is implementing the ``-cc1`` command line argument
+parsing/generation that initializes/serializes the option class (in our case
+``CodeGenOptions``) stored within ``CompilerInvocation``. This can be done
+automatically by using the marshalling annotations on the option definition:
+
+.. code-block::
diff
+
+ // Options.td
+
+ def fpass_plugin_EQ : Joined<["-"], "fpass-plugin=">,
+ Group<f_Group>, Flags<[CC1Option]>,
+ HelpText<"Load pass plugin from a dynamic shared object file.">,
+ + MarshallingInfoStringVector<CodeGenOpts<"PassPlugins">>;
+
+Inner workings of the system are introduced in the :ref:`marshalling
+infrastructure <OptionMarshalling>` section and the available annotations are
+listed :ref:`here <OptionMarshallingAnnotations>`.
+
+In case the marshalling infrastructure does not support the desired semantics,
+consider simplifying it to fit the existing model. This makes the command line
+more uniform and reduces the amount of custom, manually written code. Remember
+that the ``-cc1`` command line interface is intended only for Clang developers,
+meaning it does not need to mirror the driver interface, maintain backward
+compatibility or be compatible with GCC.
+
+If the option semantics cannot be encoded via marshalling annotations, you can
+resort to parsing/serializing the command line arguments manually:
+
+.. code-block::
diff
+
+ // CompilerInvocation.cpp
+
+ static bool ParseCodeGenArgs(CodeGenOptions &Opts, ArgList &Args /*...*/) {
+ // ...
+
+ + Opts.PassPlugins = Args.getAllArgValues(OPT_fpass_plugin_EQ);
+ }
+
+ static void GenerateCodeGenArgs(const CodeGenOptions &Opts,
+ SmallVectorImpl<const char *> &Args,
+ CompilerInvocation::StringAllocator SA /*...*/) {
+ // ...
+
+ + for (const std::string &PassPlugin : Opts.PassPlugins)
+ + GenerateArg(Args, OPT_fpass_plugin_EQ, PassPlugin, SA);
+ }
+
+Finally, you can specify the argument on the command line:
+``clang -fpass-plugin=a -fpass-plugin=b`` and use the new member variable as
+desired.
+
+.. code-block::
diff
+
+ void EmitAssemblyHelper::EmitAssemblyWithNewPassManager(/*...*/) {
+ // ...
+ + for (auto &PluginFN : CodeGenOpts.PassPlugins)
+ + if (auto PassPlugin = PassPlugin::Load(PluginFN))
+ + PassPlugin->registerPassBuilderCallbacks(PB);
+ }
+
+.. _OptionMarshalling:
+
+Option Marshalling Infrastructure
+---------------------------------
+
+The option marshalling infrastructure automates the parsing of command line
+arguments into ``CompilerInvocation`` and their generation from
+``CompilerInvocation``. The system replaces lots of repetitive C++ code with
+simple, declarative tablegen annotations and it's being used for the majority of
+the ``-cc1`` command line interface. This section provides an overview of the
+system.
+
+To read and modify contents of ``CompilerInvocation``, the marshalling system
+uses key paths, which are declared in two steps. First, a tablegen definition
+for the ``CompilerInvocation`` member is created by inheriting from
+``KeyPathAndMacro``:
+
+.. code-block::
+
+ // Options.td
+
+ class LangOpts<string field> : KeyPathAndMacro<"LangOpts->", field, "LANG_"> {}
+ // CompilerInvocation member ^^^^^^^^^^
+ // OPTION_WITH_MARSHALLING prefix ^^^^^
+
+The first argument to the parent class is the beginning of the key path that
+references the ``CompilerInvocation`` member. This argument ends with ``->`` if
+the member is a pointer type or with ``.`` if it's a value type. The child class
+takes a single parameter ``field`` that is forwarded as the second argument to
+the base class. The child class can then be used like so:
+``LangOpts<"IgnoreExceptions">``, constructing a key path to the field
+``LangOpts->IgnoreExceptions``. The third argument passed to the parent class is
+a string that the tablegen backend uses as a prefix to the
+``OPTION_WITH_MARSHALLING`` macro. Using the key path as a mix-in on an
+``Option`` instance instructs the backend to generate the following code:
+
+.. code-block:: c++
+
+ // Options.inc
+
+ #ifdef LANG_OPTION_WITH_MARSHALLING
+ LANG_OPTION_WITH_MARSHALLING([...], LangOpts->IgnoreExceptions, [...])
+ #endif // LANG_OPTION_WITH_MARSHALLING
+
+Such definition can be used used in the function for parsing and generating
+command line:
+
+.. code-block:: c++
+
+ // clang/lib/Frontend/CompilerInvoation.cpp
+
+ bool CompilerInvocation::ParseLangArgs(LangOptions *LangOpts, ArgList &Args,
+ DiagnosticsEngine &Diags) {
+ bool Success = true;
+
+ #define LANG_OPTION_WITH_MARSHALLING( \
+ PREFIX_TYPE, NAME, ID, KIND, GROUP, ALIAS, ALIASARGS, FLAGS, PARAM, \
+ HELPTEXT, METAVAR, VALUES, SPELLING, SHOULD_PARSE, ALWAYS_EMIT, KEYPATH, \
+ DEFAULT_VALUE, IMPLIED_CHECK, IMPLIED_VALUE, NORMALIZER, DENORMALIZER, \
+ MERGER, EXTRACTOR, TABLE_INDEX) \
+ PARSE_OPTION_WITH_MARSHALLING(Args, Diags, Success, ID, FLAGS, PARAM, \
+ SHOULD_PARSE, KEYPATH, DEFAULT_VALUE, \
+ IMPLIED_CHECK, IMPLIED_VALUE, NORMALIZER, \
+ MERGER, TABLE_INDEX)
+ #include "clang/Driver/Options.inc"
+ #undef LANG_OPTION_WITH_MARSHALLING
+
+ // ...
+
+ return Success;
+ }
+
+ void CompilerInvocation::GenerateLangArgs(LangOptions *LangOpts,
+ SmallVectorImpl<const char *> &Args,
+ StringAllocator SA) {
+ #define LANG_OPTION_WITH_MARSHALLING( \
+ PREFIX_TYPE, NAME, ID, KIND, GROUP, ALIAS, ALIASARGS, FLAGS, PARAM, \
+ HELPTEXT, METAVAR, VALUES, SPELLING, SHOULD_PARSE, ALWAYS_EMIT, KEYPATH, \
+ DEFAULT_VALUE, IMPLIED_CHECK, IMPLIED_VALUE, NORMALIZER, DENORMALIZER, \
+ MERGER, EXTRACTOR, TABLE_INDEX) \
+ GENERATE_OPTION_WITH_MARSHALLING( \
+ Args, SA, KIND, FLAGS, SPELLING, ALWAYS_EMIT, KEYPATH, DEFAULT_VALUE, \
+ IMPLIED_CHECK, IMPLIED_VALUE, DENORMALIZER, EXTRACTOR, TABLE_INDEX)
+ #include "clang/Driver/Options.inc"
+ #undef LANG_OPTION_WITH_MARSHALLING
+
+ // ...
+ }
+
+The ``PARSE_OPTION_WITH_MARSHALLING`` and ``GENERATE_OPTION_WITH_MARSHALLING``
+macros are defined in ``CompilerInvocation.cpp`` and they implement the generic
+algorithm for parsing and generating command line arguments.
+
+.. _OptionMarshallingAnnotations:
+
+Option Marshalling Annotations
+------------------------------
+
+How does the tablegen backend know what to put in place of ``[...]`` in the
+generated ``Options.inc``? This is specified by the ``Marshalling`` utilities
+described below. All of them take a key path argument and possibly other
+information required for parsing or generating the command line argument.
+
+**Positive Flag**
+
+The key path defaults to ``false`` and is set to ``true`` when the flag is
+present on command line.
+
+.. code-block::
+
+ def fignore_exceptions : Flag<["-"], "fignore-exceptions">, Flags<[CC1Option]>,
+ MarshallingInfoFlag<LangOpts<"IgnoreExceptions">>;
+
+**Negative Flag**
+
+The key path defaults to ``true`` and is set to ``false`` when the flag is
+present on command line.
+
+.. code-block::
+
+ def fno_verbose_asm : Flag<["-"], "fno-verbose-asm">, Flags<[CC1Option]>,
+ MarshallingInfoNegativeFlag<CodeGenOpts<"AsmVerbose">>;
+
+**Negative and Positive Flag**
+
+The key path defaults to the specified value (``false``, ``true`` or some
+boolean value that's statically unknown in the tablegen file). Then, the key
+path is set to the value associated with the flag that appears last on command
+line.
+
+.. code-block::
+
+ defm legacy_pass_manager : BoolOption<"f", "legacy-pass-manager",
+ CodeGenOpts<"LegacyPassManager">, DefaultFalse,
+ PosFlag<SetTrue, [], "Use the legacy pass manager in LLVM">,
+ NegFlag<SetFalse, [], "Use the new pass manager in LLVM">,
+ BothFlags<[CC1Option]>>;
+
+With most such pair of flags, the ``-cc1`` frontend accepts only the flag that
+changes the default key path value. The Clang driver is responsible for
+accepting both and either forwarding the changing flag or discarding the flag
+that would just set the key path to its default.
+
+The first argument to ``BoolOption`` is a prefix that is used to construct the
+full names of both flags. The positive flag would then be named
+``flegacy-pass-manager`` and the negative ``fno-legacy-pass-manager``.
+``BoolOption`` also implies the ``-`` prefix for both flags. It's also possible
+to use ``BoolFOption`` that implies the ``"f"`` prefix and ``Group<f_Group>``.
+The ``PosFlag`` and ``NegFlag`` classes hold the associated boolean value, an
+array of elements passed to the ``Flag`` class and the help text. The optional
+``BothFlags`` class holds an array of ``Flag`` elements that are common for both
+the positive and negative flag and their common help text suffix.
+
+**String**
+
+The key path defaults to the specified string, or an empty one, if omitted. When
+the option appears on the command line, the argument value is simply copied.
+
+.. code-block::
+
+ def isysroot : JoinedOrSeparate<["-"], "isysroot">, Flags<[CC1Option]>,
+ MarshallingInfoString<HeaderSearchOpts<"Sysroot">, [{"/"}]>;
+
+**List of Strings**
+
+The key path defaults to an empty ``std::vector<std::string>``. Values specified
+with each appearance of the option on the command line are appended to the
+vector.
+
+.. code-block::
+
+ def frewrite_map_file : Separate<["-"], "frewrite-map-file">, Flags<[CC1Option]>,
+ MarshallingInfoStringVector<CodeGenOpts<"RewriteMapFiles">>;
+
+**Integer**
+
+The key path defaults to the specified integer value, or ``0`` if omitted. When
+the option appears on the command line, its value gets parsed by ``llvm::APInt``
+and the result is assigned to the key path on success.
+
+.. code-block::
+
+ def mstack_probe_size : Joined<["-"], "mstack-probe-size=">, Flags<[CC1Option]>,
+ MarshallingInfoStringInt<CodeGenOpts<"StackProbeSize">, "4096">;
+
+**Enumeration**
+
+The key path defaults to the value specified in ``MarshallingInfoString``
+prefixed by the contents of ``NormalizedValuesScope`` and ``::``. This ensures
+correct reference to an enum case is formed even if the enum resides in
diff erent
+namespace or is an enum class. If the value present on command line does not
+match any of the comma-separated values from ``Values``, an error diagnostics is
+issued. Otherwise, the corresponding element from ``NormalizedValues`` at the
+same index is assigned to the key path (also correctly scoped). The number of
+comma-separated string values and elements of the array within
+``NormalizedValues`` must match. The ``AutoNormalizeEnum`` mix-in denotes the
+key path option should be treated as an enum and not as a string.
+
+.. code-block::
+
+ def mthread_model : Separate<["-"], "mthread-model">, Flags<[CC1Option]>,
+ Values<"posix,single">, NormalizedValues<["POSIX", "Single"]>,
+ NormalizedValuesScope<"LangOptions::ThreadModelKind">,
+ MarshallingInfoString<LangOpts<"ThreadModel">, "POSIX">, AutoNormalizeEnum;
+
+..
+ Intentionally omitting MarshallingInfoBitfieldFlag. It's adding some
+ complexity to the marshalling infrastructure and might be removed.
+
+It is also possible to define relationships between options.
+
+**Implication**
+
+The key path defaults to the default value from the primary ``Marshalling``
+annotation. Then, if any of the elements of ``ImpliedByAnyOf`` evaluate to true,
+the key path value is changed to the specified value or ``true`` if missing.
+Finally, the command line is parsed according to the primary annotation.
+
+.. code-block::
+
+ def fms_extensions : Flag<["-"], "fms-extensions">, Flags<[CC1Option]>,
+ MarshallingInfoFlag<LangOpts<"MicrosoftExt">>,
+ ImpliedByAnyOf<[fms_compatibility.KeyPath], "true">;
+
+**Condition**
+
+The option is parsed only if the expression in ``ShouldParseIf`` evaluates to
+true.
+
+.. code-block::
+
+ def fopenmp_enable_irbuilder : Flag<["-"], "fopenmp-enable-irbuilder">, Flags<[CC1Option]>,
+ MarshallingInfoFlag<LangOpts<"OpenMPIRBuilder">>,
+ ShouldParseIf<fopenmp.KeyPath>;
+
The Lexer and Preprocessor Library
==================================
More information about the cfe-commits
mailing list