[LLVMdev] [RFC] New command line parsing/generating framework for clang and lld.

Michael Spencer bigcheesegs at gmail.com
Thu Aug 9 13:26:02 PDT 2012


On Wed, Aug 1, 2012 at 2:23 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
> LLVM Command Line Library
>
> I'm proposing a heavyweight command line parsing and generating library for
> LLVM to replace Clang's parser and provide one for lld and any future tools
> that may need it.
>
> The scope of this library is slightly larger than what Clang has now, but not
> much.
>
> It is centered around the concept of a Tool. A Tool has a set of Options which
> can be parsed to Arguments or rendered from Arguments. It also has a set of
> Transformations that convert Arguments from another Tool to Arguments for
> itself.
>
> An Argument is an Option with bound values.
>
> Scope:
>
> * Parse argv/argc into an ArgumentList according to a TableGen file describing
>   the Options for a given Tool.
>
> * Provide typo correction for Options.
>
> * Provide a way to print help text.
>
> * Render an ArgumentList to a string suitable for invoking another Tool.
>
> * Transform an ArgumentList from one Tool to another.
>
> The major addition this has over Clang is Transformations. A Transformation is a
> mapping from one pattern in an ArgumentList to another. These replace the hand
> written code in a driver that reads arguments and generates a command line to
> call another tool.
>
> An example of this from Clang would be going from Clang to Clang -cc1 options.
> Quite a few of these are trivial forwards, while others are more complicated
> and may depend on the values of other arguments.
>
> Transformations not only make this simpler, they also allow other drivers to
> more easily target Clang -cc1. A cl.exe style driver would get its own Tool,
> Options, and Transformation set.
>
> This also makes calling out to a single type of tool, such as the linker, with
> various tools that implement it (gnu-ld, ld64, link.exe) easier. You simply
> select which Tool to use for transformation, and render the resulting
> ArgumentList to a string to pass to the program.
>
> The TableGen Option definitions provide enough information to both parse and
> render command lines. This allows us to have a single definition of as and ld
> options and be able to reuse them in both Clang to call the tool, and in the
> llvm implementation of the tool itself to parse the command line.
>
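A minimal sketch of the rendering half, assuming a Render string uses "$v<N>" placeholders as in the mockups in this proposal. The `Argument` struct and `renderArgument` function are hypothetical names chosen for illustration; they are not taken from the attached patch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an Argument: the Option's Render template plus
// the values bound to it.
struct Argument {
  std::string Render;              // e.g. "-ftemplate-depth=$v0"
  std::vector<std::string> Values; // Values[N] replaces "$v<N>"
};

// Copy the template to the output, replacing each "$v<N>" with Values[N].
// A placeholder whose slot has no bound value renders as nothing (an
// assumption of this sketch; a real implementation might report an error).
std::string renderArgument(const Argument &A) {
  const std::string &R = A.Render;
  std::string Out;
  std::size_t I = 0;
  while (I < R.size()) {
    const bool IsPlaceholder =
        R.compare(I, 2, "$v") == 0 && I + 2 < R.size() &&
        std::isdigit(static_cast<unsigned char>(R[I + 2]));
    if (!IsPlaceholder) {
      Out += R[I++];
      continue;
    }
    // Read the decimal slot number that follows "$v".
    std::size_t J = I + 2;
    std::size_t Slot = 0;
    while (J < R.size() && std::isdigit(static_cast<unsigned char>(R[J])))
      Slot = Slot * 10 + static_cast<std::size_t>(R[J++] - '0');
    if (Slot < A.Values.size())
      Out += A.Values[Slot];
    I = J;
  }
  return Out;
}
```

With this, an Argument holding the template "-ftemplate-depth=$v0" and the single value "1024" renders as "-ftemplate-depth=1024", and a flag with no placeholders renders unchanged.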
> Here's a mockup of a TableGen file for part of Clang:
>
> Option.td:
>   class Tool {
>     // The list of all possible prefixes. Not every option in the tool has all
>     // prefixes. Any string that does not begin with one of these prefixes and
>     // is not an argument to a previous option is considered an input Argument.
>     // A string that does begin with a prefix but is not a known option is
>     // eligible for typo-correction.
>   }
>
>   def joined;
>   def separate;
>   def or;
>   def str;
>
>   class Option<list<string> prefixes, string name, Tool tool, dag strparse,
>                string render, dag rendermatch> {
>     // The tool this Option belongs to.
>     Tool Tool_ = tool;
>
>     // How to parse the Option from argc+argv.
>     dag StringParse = strparse;
>
>     // How to render the Option to a string. RenderMatch is used to capture
>     // values and assign them identifiers. When Render is printed, these values
>     // are inserted into it in the marked locations.
>     string Render = render;
>     dag RenderMatch = rendermatch;
>
>     // The meta-variable name of each value.
>     list<string> ValueMetavars;
>
>     // The list of valid prefixes for this Option. The parser will check if
>     // Prefixes[i] + Name is a prefix of a potential Option for each prefix in
>     // Prefixes.
>     list<string> Prefixes = prefixes;
>
>     // The name of this Option without any prefixes or postfixes. This is what
>     // typo correction is checked against.
>     string Name = name;
>
>     // Is Name case insensitive.
>     bit IsCaseInsensitive = 0;
>
>     // Should this Option be hidden from the default help.
>     bit IsHidden = 0;
>
>     // Used as a tiebreaker when multiple Options share the same prefix. Higher
>     // values are picked first.
>     int Priority = 0;
>
>     // The single Option that this Option is an alias of.
>     Option Alias = ?;
>
>     // The help text for this Option.
>     string HelpText = ?;
>   }
>
>   class Alias<Option alias> {
>     Option Alias = alias;
>   }
>
>   class MetaVars<list<string> mv> {
>     list<string> ValueMetavars = mv;
>   }
>
>   class CaseInsensitive {
>     bit IsCaseInsensitive = 1;
>   }
>
> Clang.td:
>   include "Option.td"
>
>   def clang : Tool;
>
>   class ClangOption< list<string> prefixes
>                    , string name
>                    , dag strparse
>                    , string render
>                    , dag rendermatch>
>     : Option<prefixes, name, clang, strparse, render, rendermatch>;
>
>   class ClangFlag<string name>
>     : ClangOption<["-"], name, ?, "-"#name, ?>;
>
>   class ClangSingleLetterOption<string name>
>     : ClangOption< ["-"], name, (or (joined (str ""), (str:$v0)),
>                                     (separate (str:$v0)))
>                  , "-"#name#"$v0", (str:$v0)> {
>     int Priority = 1;
>   }
>
>   def clang_f_strict_enums : ClangFlag<"fstrict-enums">;
>   def clang_f_no_strict_enums : ClangFlag<"fno-strict-enums">;
>   def clang_f_fast_math : ClangFlag<"ffast-math">;
>   def clang_o : ClangSingleLetterOption<"o">, MetaVars<["<file>"]>;
>
>   // And now for a semi-strange one. -ftemplate-depth.
>   def clang_f_template_depth
>     : ClangOption< ["-"], "ftemplate-depth"
>                  , (or (joined (str "="), (str:$v0)),
>                        (joined (str "-"), (str:$v0)))
>                  , "-ftemplate-depth=$v0", (str:$v0)>;
>   // Note that we don't need to also have a clang_f_template_depth_EQ.
>
>   // One with a limited set of values.
>   class ClangSeparateValues<string name, list<string> values>
>     : ClangOption< ["-"], name
>                  , (joined (str "="), (str:$v0 values))
>                  , "-"#name#"=$v0", (str:$v0)>;
>
>   // This won't match unless the value is one of the ones in the list. We can
>   // generate a very good error message with the information we have that
>   // includes the list of valid values.
>   def clang_f_fp_contract
>     : ClangSeparateValues<"ffp-contract", ["fast", "on", "off"]>;
>
> ClangCC1.td:
>   include "Option.td"
>
>   def clang_cc1 : Tool;
>
>   class ClangCC1Option< list<string> prefixes
>                       , string name
>                       , dag strparse
>                       , string render
>                       , dag rendermatch>
>     : Option<prefixes, name, clang_cc1, strparse, render, rendermatch>;
>
>   class ClangCC1Flag<string name>
>     : ClangCC1Option<["-"], name, ?, "-"#name, ?>;
>   class ClangCC1Separate<string name>
>     : ClangCC1Option<["-"], name, (separate (str:$v0)), "-"#name#" $v0",
>                      (str:$v0)>;
>   class ClangCC1SeparateValues<string name, list<string> values>
>     : ClangCC1Option< ["-"], name
>                     , (joined (str "="), (str:$v0 values))
>                     , "-"#name#"=$v0", (str:$v0)>;
>
>   def clang_cc1_f_strict_enums : ClangCC1Flag<"fstrict-enums">;
>   def clang_cc1_f_template_depth : ClangCC1Separate<"ftemplate-depth">;
>   def clang_cc1_f_fp_contract
>     : ClangCC1SeparateValues<"ffp-contract", ["fast", "on", "off"]>;
>
> You may wonder why the parsing info is a dag instead of just being essentially
> an enum value as it is in Clang's current implementation. The main reason for
> this is that there exist tools with option formats that do not fit nicely into
> that model, and that in fact have many different ways of representing arguments.
>
> These are actually very simple to convert to C++ code from TableGen. It is also
> trivial to merge identical parsers before generating them, which means there's
> no code size explosion. Here's an example of what this would generate.
>
>   ArgParseResult parseJoinedOrSeparate(const ArgParseState APS) {
>     return parseOr(parseJoined("", parseStr(0)),
>                    parseSeparate(parseStr(0)))(APS);
>   }
>
> Each parse* function is a template function which creates a function object that
> implements that parser with the given arguments. The integer argument for
> parseStr tells it which Argument value slot to put the value in. This is based on v0
> from above.
>
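A minimal sketch of how such parser combinators could be written, to make the generated function above concrete. It uses `std::function` rather than templates to keep it short. The layout of `ArgParseState` and `ArgParseResult`, the `stateAfterName` helper, and the rule that `parseStr` rejects an empty value (so that "-o file" falls through to the separate form) are all assumptions of this sketch, not details from the attached patch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct ArgParseState {
  std::vector<std::string> Args;   // command line, one entry per argv slot
  std::size_t Index = 0;           // entry currently being parsed
  std::size_t Offset = 0;          // characters of that entry already consumed
  std::vector<std::string> Values; // captured values, indexed by slot
};

struct ArgParseResult {
  bool Matched;
  ArgParseState State;
};

using Parser = std::function<ArgParseResult(const ArgParseState &)>;

// Capture the unconsumed rest of the current entry into Values[Slot].
// Fails when nothing is left to capture.
Parser parseStr(std::size_t Slot) {
  return [Slot](const ArgParseState &In) -> ArgParseResult {
    if (In.Index >= In.Args.size() || In.Offset >= In.Args[In.Index].size())
      return {false, In};
    ArgParseState Out = In;
    if (Out.Values.size() <= Slot)
      Out.Values.resize(Slot + 1);
    Out.Values[Slot] = In.Args[In.Index].substr(In.Offset);
    Out.Offset = In.Args[In.Index].size();
    return {true, Out};
  };
}

// Require Sep at the current offset of the same entry, then run Inner.
Parser parseJoined(std::string Sep, Parser Inner) {
  return [Sep, Inner](const ArgParseState &In) -> ArgParseResult {
    if (In.Index >= In.Args.size() ||
        In.Offset > In.Args[In.Index].size() ||
        In.Args[In.Index].compare(In.Offset, Sep.size(), Sep) != 0)
      return {false, In};
    ArgParseState Next = In;
    Next.Offset += Sep.size();
    return Inner(Next);
  };
}

// Require the current entry to be fully consumed, then run Inner on the
// next entry.
Parser parseSeparate(Parser Inner) {
  return [Inner](const ArgParseState &In) -> ArgParseResult {
    if (In.Index + 1 >= In.Args.size() ||
        In.Offset != In.Args[In.Index].size())
      return {false, In};
    ArgParseState Next = In;
    ++Next.Index;
    Next.Offset = 0;
    return Inner(Next);
  };
}

// Try A; if it does not match, try B from the same starting state.
Parser parseOr(Parser A, Parser B) {
  return [A, B](const ArgParseState &In) -> ArgParseResult {
    ArgParseResult R = A(In);
    return R.Matched ? R : B(In);
  };
}

ArgParseResult parseJoinedOrSeparate(const ArgParseState &APS) {
  return parseOr(parseJoined("", parseStr(0)),
                 parseSeparate(parseStr(0)))(APS);
}

// Hypothetical helper: the state after a prefix + name of NameLen characters
// has been consumed from the first entry.
ArgParseState stateAfterName(std::vector<std::string> Args,
                             std::size_t NameLen) {
  ArgParseState S;
  S.Args = std::move(Args);
  S.Offset = NameLen;
  return S;
}
```

Starting from `stateAfterName({"-ofoo"}, 2)` the joined branch captures "foo" into slot 0. Starting from `stateAfterName({"-o", "bar"}, 2)` the joined branch fails on the empty remainder and the separate branch captures "bar" from the next entry.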
> This is an idea of what transforms would look like:
>
>   def not;
>
>   class Transform<list<dag> match, list<dag> produce> {
>     list<dag> M = match;
>     list<dag> P = produce;
>   }
>
>   include "Clang.td"
>   include "ClangCC1.td"
>
>   def : Transform< [(clang_f_strict_enums), (not clang_f_no_strict_enums)]
>                  , [(clang_cc1_f_strict_enums)]>;
>
>   def : Transform< [(clang_f_template_depth (str:$v0))]
>                  , [(clang_cc1_f_template_depth (str:$v0))]>;
>   // Since this case is common, there would probably be a:
>   def : Forward<clang_f_template_depth, clang_cc1_f_template_depth>;
>   // This would simply copy the Argument values.
>
>   def : Forward<clang_f_fp_contract, clang_cc1_f_fp_contract>;
>
>   def : Transform< [(clang_f_fast_math), (not clang_f_fp_contract)]
>                  , [(clang_cc1_f_fp_contract (str "fast"))]>;
>
> For each Transform, each dag in M is matched against the ArgumentList in order.
> Once a dag matches an Argument the process continues with the next Argument in
> the list. Values are extracted using :$<name>. If all dags in M are satisfied,
> each dag in P has its :$<name> values substituted, is converted to an Argument,
> and is then added to the output ArgumentList.
>
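A minimal sketch of the matching step described above, under simplifying assumptions: options are identified by their def names, a pattern only checks whether an option is present or absent (ordering within the ArgumentList is ignored), and a produced value written as "$v<N>" refers to the N-th captured value. The types and function names here are hypothetical and do not come from the attached patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Argument {
  std::string Option;              // def name, e.g. "clang_f_fast_math"
  std::vector<std::string> Values; // bound values
};
using ArgumentList = std::vector<Argument>;

struct Pattern {
  std::string Option; // option that must be present, or absent if Negated
  bool Negated;       // true for (not <option>)
};

struct Transform {
  std::vector<Pattern> Match; // M
  Argument Produce;           // a single P entry; "$v<N>" marks a capture
};

// Returns the produced Argument in a one-element list, or an empty list when
// the Transform does not apply to In.
ArgumentList applyTransform(const Transform &T, const ArgumentList &In) {
  std::vector<std::string> Captured;
  for (const Pattern &P : T.Match) {
    auto It = std::find_if(In.begin(), In.end(), [&P](const Argument &A) {
      return A.Option == P.Option;
    });
    const bool Present = It != In.end();
    if (Present == P.Negated)
      return {}; // required option missing, or forbidden option present
    if (Present)
      Captured.insert(Captured.end(), It->Values.begin(), It->Values.end());
  }

  Argument Produced{T.Produce.Option, {}};
  for (const std::string &V : T.Produce.Values) {
    const bool IsCapture = V.size() > 2 && V.compare(0, 2, "$v") == 0 &&
                           std::isdigit(static_cast<unsigned char>(V[2]));
    if (!IsCapture) {
      Produced.Values.push_back(V); // literal such as (str "fast")
      continue;
    }
    const std::size_t Slot = std::stoul(V.substr(2));
    if (Slot >= Captured.size())
      return {}; // nothing was captured for this slot
    Produced.Values.push_back(Captured[Slot]);
  }
  return {Produced};
}

// (clang_f_fast_math), (not clang_f_fp_contract)
//   => (clang_cc1_f_fp_contract (str "fast"))
Transform fastMathToFpContract() {
  Transform T;
  T.Match = {Pattern{"clang_f_fast_math", false},
             Pattern{"clang_f_fp_contract", true}};
  T.Produce = Argument{"clang_cc1_f_fp_contract", {"fast"}};
  return T;
}

// Forward<clang_f_template_depth, clang_cc1_f_template_depth>
Transform forwardTemplateDepth() {
  Transform T;
  T.Match = {Pattern{"clang_f_template_depth", false}};
  T.Produce = Argument{"clang_cc1_f_template_depth", {"$v0"}};
  return T;
}
```

Applied to an ArgumentList containing only clang_f_fast_math, `fastMathToFpContract()` yields clang_cc1_f_fp_contract with the value "fast". If clang_f_fp_contract is also present, the (not ...) pattern fails and nothing is produced, leaving that case to the plain forward.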
> Not all transforms can be represented in this manner, but you can still hand
> write the code for these cases.
>
> Attached is a patch that adds tools/llvm-cltest. This currently contains code
> that should be in a library and will not exist in the final version. This is a
> proof of concept for what TableGen would actually generate. It does not contain
> the actual TableGen implementation.
>
> - Michael Spencer

ping.

- Michael Spencer


