[LLVMdev] [RFC] New command line parsing/generating framework for clang and lld.

Michael Spencer bigcheesegs at gmail.com
Wed Aug 1 14:23:56 PDT 2012


LLVM Command Line Library

I'm proposing a heavyweight command line parsing and generating library for
LLVM to replace Clang's parser and provide one for lld and any future tools
that may need it.

The scope of this library is slightly larger than what Clang has now, but not
much.

It is centered around the concept of a Tool. A Tool has a set of Options which
can be parsed to Arguments or rendered from Arguments. It also has a set of
Transformations that convert Arguments from another Tool to Arguments for
itself.

An Argument is an Option with bound values.
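
To make the terminology concrete, here is a minimal C++ sketch of the
Option/Argument relationship. All type and function names here are
hypothetical and chosen for illustration only; they are not taken from the
attached patch and do not define the library's API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical data model; names are illustrative, not the proposed API.
struct Option {
  std::string Prefix; // e.g. "-"
  std::string Name;   // e.g. "o"
};

// An Argument is an Option with bound values.
struct Argument {
  const Option *Opt;
  std::vector<std::string> Values;
};

using ArgumentList = std::vector<Argument>;

// Render one Argument in joined form: prefix, name, then each bound value
// appended without a separator.
std::string renderJoined(const Argument &A) {
  assert(A.Opt && "an Argument must refer to an Option");
  std::string Out = A.Opt->Prefix + A.Opt->Name;
  for (const std::string &V : A.Values)
    Out += V;
  return Out;
}
```

The same Option can be bound to different values by different Arguments,
which is why the Argument only points at its Option rather than copying it.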

Scope:

* Parse argv/argc into an ArgumentList according to a TableGen file describing
  the Options for a given Tool.

* Provide typo correction for Options.

* Provide a way to print help text.

* Render an ArgumentList to a string suitable for invoking another Tool.

* Transform an ArgumentList from one Tool to another.
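
As an illustration of the typo correction item, one possible approach is to
compare the unknown string against every known Option name by edit distance
and suggest the closest one. This is only a sketch under that assumption; the
proposal does not specify the algorithm, and editDistance, nearestOption, and
the MaxDistance cutoff are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Classic Levenshtein distance using a single rolling row.
std::size_t editDistance(const std::string &A, const std::string &B) {
  std::vector<std::size_t> Row(B.size() + 1);
  for (std::size_t J = 0; J <= B.size(); ++J)
    Row[J] = J;
  for (std::size_t I = 1; I <= A.size(); ++I) {
    std::size_t Diag = Row[0]; // distance for (I-1, J-1)
    Row[0] = I;
    for (std::size_t J = 1; J <= B.size(); ++J) {
      std::size_t Up = Row[J]; // distance for (I-1, J)
      std::size_t Sub = Diag + (A[I - 1] == B[J - 1] ? 0 : 1);
      Row[J] = std::min({Sub, Up + 1, Row[J - 1] + 1});
      Diag = Up;
    }
  }
  return Row[B.size()];
}

// Return the known name closest to Typed, or an empty string if nothing is
// within MaxDistance edits.
std::string nearestOption(const std::string &Typed,
                          const std::vector<std::string> &Names,
                          std::size_t MaxDistance) {
  std::string Best;
  std::size_t BestDist = MaxDistance + 1;
  for (const std::string &N : Names) {
    std::size_t D = editDistance(Typed, N);
    if (D < BestDist) {
      BestDist = D;
      Best = N;
    }
  }
  return Best;
}
```

Because Name excludes prefixes, the comparison would be done on the string
with its prefix already stripped.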

The major addition this has over Clang is Transformations. A Transformation is a
mapping from one pattern in an ArgumentList to another. These replace the
handwritten code in a driver that reads arguments and generates a command line to
call another tool.

An example of this in Clang is going from Clang options to Clang -cc1 options.
Quite a few of these are trivial forwards, while others are more complicated
and may depend on the values of other arguments.

Transformations not only make this simpler, they also allow other drivers to
more easily target Clang -cc1. A cl.exe style driver would get its own Tool,
Options, and Transformation set.

This also makes calling out to a single type of tool, such as the linker, with
various tools that implement it (gnu-ld, ld64, link.exe) easier. You simply
select which Tool to use for transformation, and render the resulting
ArgumentList to a string to pass to the program.
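
A rough sketch of the rendering step, assuming each Option carries a render
template with $vN markers as in the mockup further down. The render function
and the templates used in the usage note are hypothetical; only the -o and
/OUT: spellings themselves come from the real GNU-style and link.exe linkers.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replace each "$vN" marker (N a single digit) in Template with Values[N].
// Markers whose slot has no bound value are dropped from the output.
std::string render(const std::string &Template,
                   const std::vector<std::string> &Values) {
  std::string Out;
  for (std::size_t I = 0; I < Template.size(); ++I) {
    bool IsMarker = Template[I] == '$' && I + 2 < Template.size() &&
                    Template[I + 1] == 'v' && Template[I + 2] >= '0' &&
                    Template[I + 2] <= '9';
    if (!IsMarker) {
      Out += Template[I];
      continue;
    }
    std::size_t Slot = static_cast<std::size_t>(Template[I + 2] - '0');
    if (Slot < Values.size())
      Out += Values[Slot];
    I += 2; // skip the "vN" part of the marker
  }
  return Out;
}
```

With this, the same bound value renders differently depending on the Tool
selected: render("-o $v0", {"a.out"}) for a GNU-style linker versus
render("/OUT:$v0", {"a.exe"}) for link.exe.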

The TableGen Option definitions provide enough information to both parse and
render command lines. This allows us to have a single definition of as and ld
options and be able to reuse them in both Clang to call the tool, and in the
llvm implementation of the tool itself to parse the command line.

Here's a mockup of a TableGen file for part of Clang:

Option.td:
  class Tool {
    // The list of all possible prefixes. Not every option in the tool has all
    // prefixes. Any string that does not begin with one of these prefixes and
    // is not an argument to a previous option is considered an input Argument.
    // A string that does begin with a prefix but is not a known option is
    // eligible for typo-correction.
  }

  def joined;
  def separate;
  def or;
  def str;

  class Option<list<string> prefixes, string name, Tool tool, dag strparse,
               string render, dag rendermatch> {
    // The tool this Option belongs to.
    Tool Tool_ = tool;

    // How to parse the Option from argc+argv.
    dag StringParse = strparse;

    // How to render the Option to a string. RenderMatch is used to capture
    // values and assign them identifiers. When Render is printed, these values
    // are inserted into it in the marked locations.
    string Render = render;
    dag RenderMatch = rendermatch;

    // The meta-variable name of each value.
    list<string> ValueMetavars;

    // The list of valid prefixes for this Option. The parser will check if
    // Prefixes[i] + Name is a prefix of a potential Option for each prefix in
    // Prefixes.
    list<string> Prefixes = prefixes;

    // The name of this Option without any prefixes or postfixes. This is what
    // typo correction is checked against.
    string Name = name;

    // Is Name case insensitive.
    bit IsCaseInsensitive = 0;

    // Should this Option be hidden from the default help.
    bit IsHidden = 0;

    // Used as a tiebreaker when multiple Options share the same prefix. Higher
    // values are picked first.
    int Priority = 0;

    // The single Option that this Option is an alias of.
    Option Alias = ?;

    // The help text for this Option.
    string HelpText = ?;
  }

  class Alias<Option alias> {
    Option Alias = alias;
  }

  class MetaVars<list<string> mv> {
    list<string> ValueMetavars = mv;
  }

  class CaseInsensitive {
    bit IsCaseInsensitive = 1;
  }

Clang.td:
  include "Option.td"

  def clang : Tool;

  class ClangOption< list<string> prefixes
                   , string name
                   , dag strparse
                   , string render
                   , dag rendermatch>
    : Option<prefixes, name, clang, strparse, render, rendermatch>;

  class ClangFlag<string name>
    : ClangOption<["-"], name, ?, "-"#name, ?>;

  class ClangSingleLetterOption<string name>
    : ClangOption< ["-"], name, (or (joined (str ""), (str:$v0)),
                                    (separate (str:$v0)))
                 , "-"#name#"$v0", (str:$v0)> {
    int Priority = 1;
  }

  def clang_f_strict_enums : ClangFlag<"fstrict-enums">;
  def clang_f_no_strict_enums : ClangFlag<"fno-strict-enums">;
  def clang_f_fast_math : ClangFlag<"ffast-math">;
  def clang_o : ClangSingleLetterOption<"o">, MetaVars<["<file>"]>;

  // And now for a semi-strange one. -ftemplate-depth.
  def clang_f_template_depth
    : ClangOption< ["-"], "ftemplate-depth"
                 , (or (joined (str "="), (str:$v0)),
                       (joined (str "-"), (str:$v0)))
                 , "-ftemplate-depth=$v0", (str:$v0)>;
  // Note that we don't need to also have a clang_f_template_depth_EQ.

  // One with a limited set of values.
  class ClangSeparateValues<string name, list<string> values>
    : ClangOption< ["-"], name
                 , (joined (str "="), (str:$v0 values))
                 , "-"#name#"=$v0", (str:$v0)>;

  // This won't match unless the value is one of the ones in the list. We can
  // generate a very good error message with the information we have that
  // includes the list of valid values.
  def clang_f_fp_contract
    : ClangSeparateValues<"ffp-contract", ["fast", "on", "off"]>;

ClangCC1.td:
  include "Option.td"

  def clang_cc1 : Tool;

  class ClangCC1Option< list<string> prefixes
                      , string name
                      , dag strparse
                      , string render
                      , dag rendermatch>
    : Option<prefixes, name, clang_cc1, strparse, render, rendermatch>;

  class ClangCC1Flag<string name>
    : ClangCC1Option<["-"], name, ?, "-"#name, ?>;
  class ClangCC1Separate<string name>
    : ClangCC1Option<["-"], name, (separate (str:$v0)), "-"#name#" $v0",
                     (str:$v0)>;
  class ClangCC1SeparateValues<string name, list<string> values>
    : ClangCC1Option< ["-"], name
                    , (joined (str "="), (str:$v0 values))
                    , "-"#name#"=$v0", (str:$v0)>;

  def clang_cc1_f_strict_enums : ClangCC1Flag<"fstrict-enums">;
  def clang_cc1_f_template_depth : ClangCC1Separate<"ftemplate-depth">;
  def clang_cc1_f_fp_contract
    : ClangCC1SeparateValues<"ffp-contract", ["fast", "on", "off"]>;

You may wonder why the parsing info is a dag instead of just being essentially
an enum value as it is in Clang's current implementation. The main reason is
that there exist tools with option formats that do not fit nicely into that
model, and that in fact have many different ways of representing arguments.

These dags are actually very simple to convert to C++ code from TableGen. It is also
trivial to merge identical parsers before generating them, which means there's
no code size explosion. Here's an example of what this would generate.

  ArgParseResult parseJoinedOrSeparate(const ArgParseState APS) {
    return parseOr(parseJoined("", parseStr(0)),
                   parseSeparate(parseStr(0)))(APS);
  }

Each parse* function is a template function which creates a function object that
implements that parser with the given arguments. The integer argument for
parseStr tells it which Argument value slot to put the parsed string in. This is
based on v0 from above.
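
For readers who want to see how such combinators could fit together, here is
a self-contained sketch. It is not the code from the attached patch: it uses
std::function instead of templates for brevity, and the fields of
ArgParseState and ArgParseResult are assumptions made for this example. It
also assumes the option's prefix and name have already been consumed, so
Offset points just past them.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical parser state: which argv string we are in, how far into it,
// and the value slots filled so far.
struct ArgParseState {
  std::vector<std::string> Argv;
  std::size_t Index = 0;
  std::size_t Offset = 0;
  std::vector<std::string> Values;
};

struct ArgParseResult {
  bool Ok;
  ArgParseState State;
};

using Parser = std::function<ArgParseResult(ArgParseState)>;

// Store the non-empty rest of the current string in value slot Slot.
Parser parseStr(std::size_t Slot) {
  return [Slot](ArgParseState S) -> ArgParseResult {
    if (S.Index >= S.Argv.size() || S.Offset >= S.Argv[S.Index].size())
      return {false, S};
    if (S.Values.size() <= Slot)
      S.Values.resize(Slot + 1);
    S.Values[Slot] = S.Argv[S.Index].substr(S.Offset);
    ++S.Index;
    S.Offset = 0;
    return {true, S};
  };
}

// Require Sep at the current offset of the current string, then run Next.
Parser parseJoined(std::string Sep, Parser Next) {
  return [Sep, Next](ArgParseState S) -> ArgParseResult {
    if (S.Index >= S.Argv.size() || S.Offset > S.Argv[S.Index].size() ||
        S.Argv[S.Index].compare(S.Offset, Sep.size(), Sep) != 0)
      return {false, S};
    S.Offset += Sep.size();
    return Next(S);
  };
}

// Require the current string to be fully consumed, then run Next on the
// following string.
Parser parseSeparate(Parser Next) {
  return [Next](ArgParseState S) -> ArgParseResult {
    if (S.Index >= S.Argv.size() || S.Offset != S.Argv[S.Index].size())
      return {false, S};
    ++S.Index;
    S.Offset = 0;
    return Next(S);
  };
}

// Try A; if it fails, try B from the same starting state.
Parser parseOr(Parser A, Parser B) {
  return [A, B](ArgParseState S) -> ArgParseResult {
    ArgParseResult R = A(S);
    return R.Ok ? R : B(S);
  };
}
```

In this sketch parseStr rejects an empty remainder, which is what lets the
joined alternative fail on a bare -o so that parseOr falls through to the
separate form.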

This is an idea of what transforms would look like:

  def not;

  class Transform<list<dag> match, list<dag> produce> {
    list<dag> M = match;
    list<dag> P = produce;
  }

  include "Clang.td"
  include "ClangCC1.td"

  def : Transform< [(clang_f_strict_enums), (not clang_f_no_strict_enums)]
                 , [(clang_cc1_f_strict_enums)]>;

  def : Transform< [(clang_f_template_depth (str:$v0))]
                 , [(clang_cc1_f_template_depth (str:$v0))]>;
  // Since this case is common, there would probably be a:
  def : Forward<clang_f_template_depth, clang_cc1_f_template_depth>;
  // This would simply copy the Argument values.

  def : Forward<clang_f_fp_contract, clang_cc1_f_fp_contract>;

  def : Transform< [(clang_f_fast_math), (not clang_f_fp_contract)]
                 , [(clang_cc1_f_fp_contract (str "fast"))]>;

For each Transform, each dag in M is matched against the ArgumentList in order.
Once a dag matches an Argument the process continues with the next Argument in
the list. Values are extracted using :$<name>. If all dags in M are satisfied,
each dag in P has its :$<name> values substituted, is converted to an Argument,
and is then added to the output ArgumentList.
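
A simplified sketch of that matching step follows. It is an illustration
only: all names are hypothetical, Options are identified by plain strings,
the in-order matching is reduced to a presence check, and each Transform
produces a single Argument whose values are either fixed literals or the
values captured from the last positive match.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Argument {
  std::string Option;
  std::vector<std::string> Values;
};
using ArgumentList = std::vector<Argument>;

// One entry of M: an Option that must be present, or absent if Negated.
struct Pattern {
  std::string Option;
  bool Negated;
};

struct Transform {
  std::vector<Pattern> Match;     // M
  std::string Produce;            // Option of the produced Argument (P)
  std::vector<std::string> Fixed; // literal values; empty means forward
};

// Append the produced Argument to Out if every pattern in T.Match holds.
bool applyTransform(const Transform &T, const ArgumentList &In,
                    ArgumentList &Out) {
  std::vector<std::string> Captured;
  for (const Pattern &P : T.Match) {
    auto It = std::find_if(In.begin(), In.end(), [&P](const Argument &A) {
      return A.Option == P.Option;
    });
    bool Found = It != In.end();
    if (Found == P.Negated)
      return false; // required Option missing, or forbidden Option present
    if (Found)
      Captured = It->Values;
  }
  Argument Produced{T.Produce, T.Fixed.empty() ? Captured : T.Fixed};
  Out.push_back(Produced);
  return true;
}
```

With Match set to clang_f_fast_math plus a negated clang_f_fp_contract and
Fixed set to {"fast"}, this mirrors the last Transform in the mockup above.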

Not all transforms can be represented in this manner, but you can still
hand-write the code for these cases.

Attached is a patch that adds tools/llvm-cltest. This currently contains code
that should be in a library and will not exist in the final version. This is a
proof of concept for what TableGen would actually generate. It does not contain
the actual TableGen implementation.

- Michael Spencer
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OptionParsing.patch
Type: application/octet-stream
Size: 43723 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120801/e9d31f0c/attachment.obj>