[cfe-dev] RFC: Integrating clang-cc functionality into clang (the driver)

Daniel Dunbar daniel at zuster.org
Tue Nov 3 13:07:28 PST 2009


Hi all,

I've been thinking lately about how we can push forward with our goal of
integrating the 'clang-cc' functionality into the 'clang' executable, so that we
have a single compiler binary. This will also unblock future work on clang APIs,
and hopefully make it easier to support new interesting uses of clang.

Here's my proposal:

--

Goals
--

 1. Make it easier to build Clang-based tools (from an API perspective).

 2. Avoid unnecessary fork/exec of clang-cc.
    a. Makes it easier to debug!
    b. Makes driver / compiler interaction more obviously a private
       implementation detail.

Non-Goals
--

 1. Add a general purpose mechanism for extending 'clang' (e.g., a plugin
 model). This work will make that easier, however.

Proposal (user level)
--

 1. Driver gets a new option -cc1, which must be the leading argument (after any
 -ccc arguments, but those are "internal" and not supposed to be used by users
 anyway). This is a "mode": the remaining arguments will be processed "like"
 clang-cc arguments. Exposing this mode on the command line is just for
 debuggability and for use in -v or -###.

 In practice, the arguments will be processed by hand or by reusing the driver
 argument parsing functionality instead of using LLVM's command line library.

 For example, where 'clang' currently does something like:
--
$ clang -S -x c /dev/null -###
...
 "/Volumes/Data/ddunbar/llvm.obj.64/Debug/bin/clang-cc" "-triple"
"x86_64-apple-darwin10.0" "-S" "-disable-free" "-main-file-name"
"null" "--relocation-model" "pic" "-pic-level=1" "--disable-fp-elim"
"--unwind-tables=1" "--mcpu=core2" "--fmath-errno=0" "-fexceptions=0"
"-fdiagnostics-show-option" "-o" "null.s" "-x" "c" "/dev/null"
--
 it would now print:
--
$ clang -S -x c /dev/null -###
...
 "/Volumes/Data/ddunbar/llvm.obj.64/Debug/bin/clang" "-cc1" "-triple"
"x86_64-apple-darwin10.0" "-S" "-disable-free" "-main-file-name"
"null" "--relocation-model" "pic" "-pic-level=1" "--disable-fp-elim"
"--unwind-tables=1" "--mcpu=core2" "--fmath-errno=0" "-fexceptions=0"
"-fdiagnostics-show-option" "-o" "null.s" "-x" "c" "/dev/null"
--
 and that command would actually work when run on the command line.

 The reason for choosing -cc1 is that this is the traditional gcc-style name
 for the "compiler" (versus the "driver"), and that it makes it more obvious
 that this is an "internal" option, not a user-level one.

 The initial focus for -cc1 would be to implement the clang-cc options that the
 driver uses, but it would be easy to add support for some additional clang-cc
 modes at the same time (for example, -ast-dump).

 2. 'clang' gets a new option -no-integrated-cc1, which would just execute
 'clang' recursively, passing the -cc1 argument. This is primarily for testing;
 users shouldn't have a good reason to use it.

 3. We'll take some steps to remain friendly if clang crashes (currently the
 driver at least tries to print a canonical "error: clang-cc failed" type of
 message).
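To make the -cc1 and -no-integrated-cc1 behavior a bit more concrete, here is a minimal C++ sketch of how a single binary might detect -cc1 mode, build the command line it would print for -###, and decide whether to compile in-process. The names findCC1, makeCC1Command and runsInProcess are hypothetical, and the sketch assumes each -ccc option is a single token; it is an illustration, not the actual driver code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch, not Clang's actual driver code.

// Returns the index of "-cc1" if it is the leading argument. Only internal
// "-ccc*" options may precede it; for simplicity each of those is assumed
// to be a single token with no separate value.
std::optional<std::size_t> findCC1(const std::vector<std::string>& argv) {
  assert(!argv.empty() && "argv[0] should be the program name");
  for (std::size_t i = 1; i < argv.size(); ++i) {
    if (argv[i].rfind("-ccc", 0) == 0)
      continue;  // skip internal driver options
    if (argv[i] == "-cc1")
      return i;  // compiler mode: arguments after i are "clang-cc like"
    return std::nullopt;  // first real argument is not -cc1: driver mode
  }
  return std::nullopt;
}

// The command line the driver would print for -v or -###, and would run as
// a child process under -no-integrated-cc1: the same binary plus "-cc1".
std::vector<std::string> makeCC1Command(
    const std::string& self, const std::vector<std::string>& cc1Args) {
  std::vector<std::string> cmd{self, "-cc1"};
  cmd.insert(cmd.end(), cc1Args.begin(), cc1Args.end());
  return cmd;
}

// By default the compile job runs in-process (no fork/exec); only
// -no-integrated-cc1 asks the driver to re-execute itself with -cc1.
bool runsInProcess(const std::vector<std::string>& driverArgs) {
  for (const auto& a : driverArgs)
    if (a == "-no-integrated-cc1")
      return false;
  return true;
}
```

Because the printed command and the in-process path are built from the same argument list, the -### output stays a command that actually works when pasted into a shell.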

Proposal (implementation)
--

 1. There will be a new class CompilerInstance (suggestions for a better name
 welcome) which holds all of the state needed for running Clang. That is, this
 will wrap the source manager, the file manager, the preprocessor factory, the
 AST context, the AST consumer, and all that horrible stuff. This will probably
 actually be constructed via a builder.

 2. Internally there will be a CompilerInvocation object which maintains the
 various bits of state that form a single invocation of clang-cc (include
 paths, target options, triple, code generation options, etc.).

   a. The CompilerInvocation object will have two important methods: the first
   converts the invocation into a list of 'clang -cc1' arguments; the second
   "executes" the invocation and returns a CompilerInstance.

   b. The Driver will get a new CompilerJob class which just wraps a
   CompilerInvocation. The Driver's Clang tool implementation will be changed to
   construct an instance of this object instead of constructing a list of
   arguments. This job will take care of running the clang compiler in/out-of
   process depending on -no-integrated-cc1, but otherwise is just an adaptor for
   CompilerInvocation.

   c. There will be a method to turn a 'clang -cc1' argument list into a
   CompilerInvocation object.

 3. The Driver will get a new API for parsing a "gcc-like" argument list which
 corresponds to a single "compile only" task (-fsyntax-only, -S, etc.), and
 returns a CompilerJob. This API will return an error for argument vectors which
 would do something more complicated, for example executing multiple
 compilations or running the linker or assembler.

 4. Move "standard" tests to use 'clang -cc1' instead of 'clang-cc'.
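A heavily simplified C++ sketch of the CompilerInvocation / CompilerInstance split in items 1 and 2 follows, assuming only a handful of options (-triple, -I, -o, -S, -fsyntax-only). The member names toCC1Args, fromCC1Args and execute are illustrative, not a committed API; the point is the round trip between an invocation object and a 'clang -cc1' argument list, and that parsing rejects options the compiler doesn't understand.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified sketch; not Clang's actual classes.
struct CompilerInstance;  // defined below

struct CompilerInvocation {
  std::string triple;
  std::string action = "-fsyntax-only";  // "-S" or "-fsyntax-only" here
  std::vector<std::string> includePaths;
  std::string outputFile;
  std::string inputFile;

  // 2a, first method: serialize to a 'clang -cc1' argument list.
  std::vector<std::string> toCC1Args() const {
    std::vector<std::string> args{"-triple", triple, action};
    for (const auto& dir : includePaths) {
      args.push_back("-I");
      args.push_back(dir);
    }
    if (!outputFile.empty()) {
      args.push_back("-o");
      args.push_back(outputFile);
    }
    if (!inputFile.empty())
      args.push_back(inputFile);
    return args;
  }

  // 2c: parse a 'clang -cc1' argument list. Unknown options are an error,
  // so the driver cannot pass anything the compiler doesn't understand.
  static std::optional<CompilerInvocation> fromCC1Args(
      const std::vector<std::string>& args) {
    CompilerInvocation inv;
    for (std::size_t i = 0; i < args.size(); ++i) {
      const std::string& a = args[i];
      const bool hasValue = i + 1 < args.size();
      if (a == "-triple" && hasValue)
        inv.triple = args[++i];
      else if (a == "-I" && hasValue)
        inv.includePaths.push_back(args[++i]);
      else if (a == "-o" && hasValue)
        inv.outputFile = args[++i];
      else if (a == "-S" || a == "-fsyntax-only")
        inv.action = a;
      else if (!a.empty() && a[0] != '-')
        inv.inputFile = a;
      else
        return std::nullopt;
    }
    return inv;
  }

  // 2a, second method: "execute" the invocation.
  std::unique_ptr<CompilerInstance> execute() const;
};

// Stand-in for the object that would own the source manager, file manager,
// preprocessor, AST context, AST consumer, and so on.
struct CompilerInstance {
  CompilerInvocation invocation;
  bool finished = false;
};

std::unique_ptr<CompilerInstance> CompilerInvocation::execute() const {
  auto ci = std::make_unique<CompilerInstance>();
  ci->invocation = *this;
  // A real implementation would build the managers and run the action here.
  ci->finished = true;
  return ci;
}
```

Keeping serialization and parsing on the same object is one way to guarantee that whatever the driver builds in memory can be written out for -### and read back unchanged by 'clang -cc1'.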

The Future of clang-cc
--

clang-cc is kind of a mess, so at least initially I'd rather just move the
driver and appropriate tests to using the 'clang' executable. Once that's done
we can reevaluate and see what the next step is. One option is to keep clang-cc
around as a dumping ground/playground for tools or other features that don't fit into
the "compiler" model of functionality. Another option is to extend 'clang' to
support the main features of clang-cc we care about (i.e., the ones we test) and
move everything else into separate tools (which would probably only be
optionally built -- these would amount to examples).

Impact
--

This redivision of clang/clang-cc and new API hooks open up our architecture in
a few nice ways.

 1. It becomes much easier to implement a Clang-based tool which leverages the
 Driver library to provide a gcc-like command line interface.

 The idea is that a client would use the new Driver API to construct a
 CompilerJob, and could then twiddle the CompilerInvocation object or the
 CompilerInstance object to implement their tool (for example, supplying their
 own AST consumer).

 2. We retain some reasonable semantics for -### and -v that closely match
 existing behavior.

 3. If we desire to keep clang-cc, we should be able to move a large part of its
 internals to using CompilerInvocation and CompilerInstance which should make it
 easier to understand and maintain.

 4. Programmatically driving the compiler (i.e., implementing a fixed-function
 Clang-based tool that doesn't need to process a gcc-like command line) should
 be *much* easier. Those clients will have the option of constructing a
 CompilerInvocation object, or using a CompilerInstance object directly.

 5. This will make the connection between the driver and the compiler more
 rigorous, for example the driver will not be capable of passing an option to
 clang-cc that it doesn't understand.

 6. This should make it easier to build new tools which need more information
 about how the compiler is invoked. For example, a long-standing wish of mine is
 to add a mode to the driver which will automatically produce test cases, which
 requires knowing how the compiler was invoked, then being able to easily
 manipulate the command line to generate a preprocessed input, eliminate command
 line arguments, reduce optimization level, etc.
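For a client tool (impact item 1), the flow might look like the following sketch: ask the Driver for a single compile-only job, get an error for anything that would assemble, link, or compile more than one file, then attach a custom AST consumer. CompilerJob, ASTConsumer and buildCompileOnlyJob here are hypothetical, simplified stand-ins (the consumer is just a callback taking a declaration name, and the job holds a plain argument list instead of a CompilerInvocation), not Clang's real classes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical sketch of a client built on the proposed Driver API.

// Stand-in for an AST consumer: called once per top-level declaration name.
using ASTConsumer = std::function<void(const std::string& declName)>;

struct CompilerJob {
  std::vector<std::string> cc1Args;  // stands in for the CompilerInvocation
  ASTConsumer consumer;              // empty means "use the default"
};

// Accepts a gcc-like command line only if it describes a single
// compile-only task; otherwise returns an error message.
std::variant<CompilerJob, std::string> buildCompileOnlyJob(
    const std::vector<std::string>& gccArgs) {
  CompilerJob job;
  bool compileOnly = false;
  std::size_t inputs = 0;
  for (std::size_t i = 0; i < gccArgs.size(); ++i) {
    const std::string& a = gccArgs[i];
    job.cc1Args.push_back(a);  // simplified: pass everything through
    if (a == "-fsyntax-only" || a == "-S" || a == "-E") {
      compileOnly = true;
    } else if ((a == "-o" || a == "-x") && i + 1 < gccArgs.size()) {
      job.cc1Args.push_back(gccArgs[++i]);  // option value, not an input
    } else if (!a.empty() && a[0] != '-') {
      ++inputs;
    }
  }
  if (!compileOnly)
    return std::string("would also assemble or link; not a compile-only task");
  if (inputs != 1)
    return std::string("expected exactly one input file");
  return job;
}
```

A client would then set job.consumer to its own callback before running the job. Rejecting multi-step command lines up front keeps the API's contract simple: one gcc-like command line maps to exactly one compiler invocation.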

Caveats
--

 1. One major caveat is the current use of the LLVM command line library to
 interact with the back end. For example, the driver currently passes options
 like '--relocation-model=pic' to clang-cc. This option isn't actually defined
 in clang-cc, rather it is defined in the LLVM code generator and things work
 out because of how LLVM's command line handling works.

 This is both a wart and a benefit -- it's a wart in that it's a hidden
 dependency, and it blocks using the APIs safely in some contexts (for example,
 from multiple threads). It's a benefit because it provides a generic mechanism
 for twiddling options in the back end for debugging or testing new features.

 My current plan is to not try to solve this problem, but instead support some
 generic argument vector (a list of strings) in the CompilerInvocation object
 which will get passed to LLVM command line parsing library when the invocation
 is executed. We should endeavor to never use that mechanism for any features
 that matter, but this requires us to add proper API mechanisms for setting
 things like the relocation model.

 2. Chime in!
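The plan in caveat 1 might look roughly like this: features that matter, such as the relocation model, get a typed field, while back-end debugging flags travel as an opaque string vector that is turned into an argc/argv pair for LLVM's llvm::cl::ParseCommandLineOptions when the invocation is executed. CodeGenOptions, RelocModel and makeBackendArgv are hypothetical names for this sketch, which only builds the argv and does not link against LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch; names are illustrative only.
enum class RelocModel { Default, Static, PIC, DynamicNoPIC };

struct CodeGenOptions {
  // A "feature that matters" gets a typed field: no hidden global state.
  RelocModel relocModel = RelocModel::Default;
  // Escape hatch: strings forwarded verbatim to LLVM's command-line
  // library when the invocation executes. For debugging/testing only.
  std::vector<std::string> llvmArgs;
};

// Builds the argv array for a call such as
//   llvm::cl::ParseCommandLineOptions(static_cast<int>(argv.size()),
//                                     argv.data());
// LLVM treats argv[0] as the program name. The pointers borrow from
// 'progName' and 'opts', which must outlive the returned vector.
std::vector<const char*> makeBackendArgv(const std::string& progName,
                                         const CodeGenOptions& opts) {
  assert(!progName.empty() && "argv[0] should name the program");
  std::vector<const char*> argv{progName.c_str()};
  for (const std::string& a : opts.llvmArgs)
    argv.push_back(a.c_str());
  return argv;
}
```

Since LLVM's command-line options are process-wide globals, this escape hatch still has the thread-safety problem described above; only the typed fields avoid it.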

--

Comments?

 - Daniel


