[LLVMdev] [RFC] New command line parsing/generating framework for clang and lld.
Michael Spencer
bigcheesegs at gmail.com
Thu Aug 9 13:26:02 PDT 2012
On Wed, Aug 1, 2012 at 2:23 PM, Michael Spencer <bigcheesegs at gmail.com> wrote:
> LLVM Command Line Library
>
> I'm proposing a heavyweight command line parsing and generating library for
> LLVM to replace Clang's parser and provide one for lld and any future tools
> that may need it.
>
> The scope of this library is slightly larger than what Clang has now, but not
> much.
>
> It is centered around the concept of a Tool. A Tool has a set of Options which
> can be parsed to Arguments or rendered from Arguments. It also has a set of
> Transformations that convert Arguments from another Tool to Arguments for
> itself.
>
> An Argument is an Option with bound values.
>
> Scope:
>
> * Parse argv/argc into an ArgumentList according to a TableGen file describing
> the Options for a given Tool.
>
> * Provide typo correction for Options.
>
> * Provide a way to print help text.
>
> * Render an ArgumentList to a string suitable for invoking another Tool.
>
> * Transform an ArgumentList from one Tool to another.
>
> The major addition this has over Clang is Transformations. A Transformation is a
> mapping from one pattern in an ArgumentList to another. These replace the hand
> written code in a driver that reads arguments and generates a command line to
> call another tool.
>
> An example of this from Clang would be going from Clang to Clang -cc1 options.
> Quite a few of these are trivial forwards, while others are more complicated
> and may depend on the values of other arguments.
>
> Transformations not only make this simpler, they also allow other drivers to
> more easily target Clang -cc1. A cl.exe style driver would get its own Tool,
> Options, and Transformation set.
>
> This also makes calling out to a single type of tool, such as the linker, with
> various tools that implement it (gnu-ld, ld64, link.exe) easier. You simply
> select which Tool to use for transformation, and render the resulting
> ArgumentList to a string to pass to the program.
>
> The TableGen Option definitions provide enough information to both parse and
> render command lines. This allows us to have a single definition of as and ld
> options and be able to reuse them in both Clang to call the tool, and in the
> llvm implementation of the tool itself to parse the command line.
>
> Here's a mockup of a TableGen file for part of Clang:
>
> Option.td:
> class Tool {
> // The list of all possible prefixes. Not every option in the tool has all
> // prefixes. Any string that does not begin with one of these prefixes and is
> // not an argument to a previous option is considered an input Argument. A
> // string that does begin with a prefix but is not a known option is eligible
> // for typo-correction.
> }
>
> def joined;
> def separate;
> def or;
> def str;
>
> class Option<list<string> prefixes, string name, Tool tool, dag strparse,
>              string render, dag rendermatch> {
> // The tool this Option belongs to.
> Tool Tool_ = tool;
>
> // How to parse the Option from argc+argv.
> dag StringParse = strparse;
>
> // How to render the Option to a string. RenderMatch is used to capture
> // values and assign them identifiers. When Render is printed, these values
> // are inserted into it in the marked locations.
> string Render = render;
> dag RenderMatch = rendermatch;
>
> // The meta-variable name of each value.
> list<string> ValueMetavars;
>
> // The list of valid prefixes for this Option. The parser will check if
> // Prefixes[i] + Name is a prefix of a potential Option for each prefix in
> // Prefixes.
> list<string> Prefixes = prefixes;
>
> // The name of this Option without any prefixes or postfixes. This is what
> // typo correction is checked against.
> string Name = name;
>
> // Is Name case insensitive.
> bit IsCaseInsensitive = 0;
>
> // Should this Option be hidden from the default help.
> bit IsHidden = 0;
>
> // Used as a tiebreaker when multiple Options share the same prefix. Higher
> // values are picked first.
> int Priority = 0;
>
> // The single Option that this Option is an alias of.
> Option Alias = ?;
>
> // The help text for this Option.
> string HelpText = ?;
> }
>
> class Alias<Option alias> {
> Option Alias = alias;
> }
>
> class MetaVars<list<string> mv> {
> list<string> ValueMetavars = mv;
> }
>
> class CaseInsensitive {
> bit IsCaseInsensitive = 1;
> }
>
> Clang.td:
> include "Option.td"
>
> def clang : Tool;
>
> class ClangOption< list<string> prefixes
> , string name
> , dag strparse
> , string render
> , dag rendermatch>
> : Option<prefixes, name, clang, strparse, render, rendermatch>;
>
> class ClangFlag<string name>
> : ClangOption<["-"], name, ?, "-"#name, ?>;
>
> class ClangSingleLetterOption<string name>
> : ClangOption< ["-"], name, (or (joined (str ""), (str:$v0)),
> (separate (str:$v0)))
> , "-"#name#"$v0", (str:$v0)> {
> int Priority = 1;
> }
>
> def clang_f_strict_enums : ClangFlag<"fstrict-enums">;
> def clang_f_no_strict_enums : ClangFlag<"fno-strict-enums">;
> def clang_f_fast_math : ClangFlag<"ffast-math">;
> def clang_o : ClangSingleLetterOption<"o">, MetaVars<["<file>"]>;
>
> // And now for a semi-strange one. -ftemplate-depth.
> def clang_f_template_depth
> : ClangOption< ["-"], "ftemplate-depth"
> , (or (joined (str "="), (str:$v0)),
> (joined (str "-"), (str:$v0)))
> , "-ftemplate-depth=$v0", (str:$v0)>;
> // Note that we don't need to also have a clang_f_template_depth_EQ.
>
> // One with a limited set of values.
> class ClangSeparateValues<string name, list<string> values>
> : ClangOption< ["-"], name
> , (joined (str "="), (str:$v0 values))
> , "-"#name#"=$v0", (str:$v0)>;
>
> // This won't match unless the value is one of the ones in the list. We can
> // generate a very good error message with the information we have that
> // includes the list of valid values.
> def clang_f_fp_contract : ClangSeparateValues<"ffp-contract",
> ["fast", "on", "off"]>;
>
> ClangCC1.td:
> include "Option.td"
>
> def clang_cc1 : Tool;
>
> class ClangCC1Option< list<string> prefixes
> , string name
> , dag strparse
> , string render
> , dag rendermatch>
> : Option<prefixes, name, clang_cc1, strparse, render, rendermatch>;
>
> class ClangCC1Flag<string name>
> : ClangCC1Option<["-"], name, ?, "-"#name, ?>;
> class ClangCC1Separate<string name>
> : ClangCC1Option<["-"], name, (separate (str:$v0)), "-"#name#" $v0",
>                  (str:$v0)>;
> class ClangCC1SeparateValues<string name, list<string> values>
> : ClangCC1Option< ["-"], name
> , (joined (str "="), (str:$v0 values))
> , "-"#name#"=$v0", (str:$v0)>;
>
> def clang_cc1_f_strict_enums : ClangCC1Flag<"fstrict-enums">;
> def clang_cc1_f_template_depth : ClangCC1Separate<"ftemplate-depth">;
> def clang_cc1_f_fp_contract : ClangCC1SeparateValues<"ffp-contract",
> ["fast", "on", "off"]>;
>
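To illustrate how the Render strings in the mockup above could be expanded, here is a minimal C++ sketch. It is not taken from the attached patch: the function name renderArgument and the rule that "$vN" is replaced by value slot N (and dropped if the slot is missing) are assumptions made for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: expands a Render template such as
// "-ftemplate-depth=$v0" using the values bound to an Argument.
// "$v" followed by digits names a value slot; anything else is copied as is.
std::string renderArgument(const std::string &Template,
                           const std::vector<std::string> &Values) {
  std::string Out;
  std::size_t I = 0;
  while (I < Template.size()) {
    bool IsSlot = Template.compare(I, 2, "$v") == 0 &&
                  I + 2 < Template.size() &&
                  std::isdigit(static_cast<unsigned char>(Template[I + 2]));
    if (!IsSlot) {
      Out += Template[I++];
      continue;
    }
    // Read the decimal slot number that follows "$v".
    std::size_t J = I + 2;
    std::size_t Slot = 0;
    while (J < Template.size() &&
           std::isdigit(static_cast<unsigned char>(Template[J]))) {
      Slot = Slot * 10 + static_cast<std::size_t>(Template[J] - '0');
      ++J;
    }
    // Assumption: a slot with no bound value renders as nothing.
    if (Slot < Values.size())
      Out += Values[Slot];
    I = J;
  }
  return Out;
}
```

With this sketch, the template "-ftemplate-depth=$v0" and the single value "512" render as "-ftemplate-depth=512", while a flag template with no "$v" marker is returned unchanged.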
> You may wonder why the parsing info is a dag instead of just being essentially
> an enum value as it is in Clang's current implementation. The main reason for
> this is that there exist tools with option formats that do not fit nicely into
> that model, and some in fact have many different ways of representing arguments.
>
> These are actually very simple to convert to C++ code from TableGen. It is also
> trivial to merge identical parsers before generating them, which means there's
> no code size explosion. Here's an example of what this would generate.
>
> ArgParseResult parseJoinedOrSeparate(const ArgParseState APS) {
> return parseOr(parseJoined("", parseStr(0)),
> parseSeparate(parseStr(0)))(APS);
> }
>
> Each parse* function is a template function which creates a function object that
> implements that parser with the given arguments. The integer argument for
> parseStr tells it which Argument value slot to put it in. This is based on v0
> from above.
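As a rough illustration of the function-object idea described above, the following C++ sketch builds the combinators with std::function and lambdas rather than templates. Everything in it is an assumption for illustration: the fields of ArgParseState and ArgParseResult, the ValueParser type, and the rule that parseStr rejects an empty string are invented here and are not the contents of the attached patch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical parse state: argv, the index of the string that holds the
// option, and the offset just past the option's prefix + name.
struct ArgParseState {
  std::vector<std::string> Args;
  std::size_t Index;
  std::size_t Offset;
};

// Hypothetical result: whether the parser matched, how many argv strings it
// consumed, and the captured value slots.
struct ArgParseResult {
  bool Matched = false;
  std::size_t Consumed = 0;
  std::vector<std::string> Values;
};

typedef std::function<ArgParseResult(const ArgParseState &)> Parser;
typedef std::function<bool(const std::string &, ArgParseResult &)> ValueParser;

// Stores a non-empty string into value slot Slot (the N of $vN).
ValueParser parseStr(std::size_t Slot) {
  return [Slot](const std::string &Text, ArgParseResult &R) {
    if (Text.empty())
      return false;
    if (R.Values.size() <= Slot)
      R.Values.resize(Slot + 1);
    R.Values[Slot] = Text;
    return true;
  };
}

// Matches "<option><Sep><value>" inside a single argv string.
Parser parseJoined(std::string Sep, ValueParser Value) {
  return [=](const ArgParseState &S) {
    assert(S.Index < S.Args.size() && S.Offset <= S.Args[S.Index].size());
    ArgParseResult R;
    const std::string Rest = S.Args[S.Index].substr(S.Offset);
    if (Rest.compare(0, Sep.size(), Sep) != 0)
      return R;
    if (!Value(Rest.substr(Sep.size()), R))
      return R;
    R.Matched = true;
    R.Consumed = 1;
    return R;
  };
}

// Matches "<option>" followed by the value in the next argv string.
Parser parseSeparate(ValueParser Value) {
  return [=](const ArgParseState &S) {
    assert(S.Index < S.Args.size());
    ArgParseResult R;
    if (S.Offset != S.Args[S.Index].size() || S.Index + 1 >= S.Args.size())
      return R;
    if (!Value(S.Args[S.Index + 1], R))
      return R;
    R.Matched = true;
    R.Consumed = 2;
    return R;
  };
}

// Tries A first and falls back to B if A does not match.
Parser parseOr(Parser A, Parser B) {
  return [=](const ArgParseState &S) {
    ArgParseResult R = A(S);
    if (R.Matched)
      return R;
    return B(S);
  };
}

ArgParseResult parseJoinedOrSeparate(const ArgParseState &APS) {
  return parseOr(parseJoined("", parseStr(0)),
                 parseSeparate(parseStr(0)))(APS);
}
```

Under these assumptions, both "-ofoo" and "-o foo" (with Offset pointing just past "-o") yield a match with "foo" in slot 0, consuming one and two argv strings respectively. A generated version would presumably use templates instead of std::function to avoid the indirection.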
>
> This is an idea of what transforms would look like:
>
> def not;
>
> class Transform<list<dag> match, list<dag> produce> {
> list<dag> M = match;
> list<dag> P = produce;
> }
>
> include "Clang.td"
> include "ClangCC1.td"
>
> def : Transform< [(clang_f_strict_enums), (not clang_f_no_strict_enums)]
> , [(clang_cc1_f_strict_enums)]>;
>
> def : Transform< [(clang_f_template_depth (str:$v0))]
> , [(clang_cc1_f_template_depth (str:$v0))]>;
> // Since this case is common, there would probably be a:
> def : Forward<clang_f_template_depth, clang_cc1_f_template_depth>;
> // This would simply copy the Argument values.
>
> def : Forward<clang_f_fp_contract, clang_cc1_f_fp_contract>;
>
> def : Transform< [(clang_f_fast_math), (not clang_f_fp_contract)]
> , [(clang_cc1_f_fp_contract (str "fast"))]>;
>
> For each Transform, each dag in M is matched against the ArgumentList in order.
> Once a dag matches an Argument the process continues with the next Argument in
> the list. Values are extracted using :$<name>. If all dags in M are satisfied,
> the dag in P has its :$<name> values substituted, converted to an Argument, then
> added to the output ArgumentList.
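The matching step can be sketched in C++ roughly as follows. This is a deliberately simplified model and not the proposed implementation: it treats a positive pattern as "an Argument for this Option is present anywhere in the list" and a (not ...) pattern as "no such Argument is present", instead of the in-order walk described above, and it supports a single produced Argument. The types Argument, Pattern, Production and Transform and the function applyTransform are all hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical Argument: an Option (identified by its def name) plus values.
struct Argument {
  std::string Option;
  std::vector<std::string> Values;
};
typedef std::vector<Argument> ArgumentList;

// One dag in M. Negated models (not X); Capture models (X (str:$v0)).
struct Pattern {
  std::string Option;
  bool Negated;
  bool Capture;
};

// The dag in P. Either copies the captured values or uses literal ones,
// e.g. (clang_cc1_f_fp_contract (str "fast")).
struct Production {
  std::string Option;
  std::vector<std::string> Literals;
  bool UseCaptured;
};

struct Transform {
  std::vector<Pattern> Match;
  Production Produce;
};

// Appends the produced Argument to Out and returns true if every pattern in
// T.Match is satisfied by In; otherwise leaves Out untouched.
bool applyTransform(const Transform &T, const ArgumentList &In,
                    ArgumentList &Out) {
  std::vector<std::string> Captured;
  for (std::size_t P = 0; P < T.Match.size(); ++P) {
    const Pattern &Pat = T.Match[P];
    const Argument *Found = 0;
    for (std::size_t A = 0; A < In.size() && !Found; ++A)
      if (In[A].Option == Pat.Option)
        Found = &In[A];
    // A positive pattern needs a match; a negated one needs none.
    if ((Found != 0) == Pat.Negated)
      return false;
    if (Found && Pat.Capture)
      Captured = Found->Values;
  }
  Argument Result;
  Result.Option = T.Produce.Option;
  Result.Values = T.Produce.UseCaptured ? Captured : T.Produce.Literals;
  Out.push_back(Result);
  return true;
}
```

In this model, the -ffast-math transform above has two patterns, clang_f_fast_math (positive) and clang_f_fp_contract (negated), and produces clang_cc1_f_fp_contract with the literal value "fast". It fires for an ArgumentList that contains only -ffast-math and does not fire once -ffp-contract=... is also present.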
>
> Not all transforms can be represented in this manner, but you can still hand
> write the code for these cases.
>
> Attached is a patch that adds tools/llvm-cltest. This currently contains code
> that should be in a library and will not exist in the final version. This is a
> proof of concept for what TableGen would actually generate. It does not contain
> the actual TableGen implementation.
>
> - Michael Spencer
ping.
- Michael Spencer