[cfe-dev] RFC: Clang driver redesign

James Molloy james.molloy at arm.com
Fri Nov 4 04:11:37 PDT 2011


Hi,

The clang driver has been the subject of much dislike for a while now, and
for a while I've wanted to sort it out. I'm chairing a BoF session on it at
the dev meeting later this month, and as a precursor to that I've started
putting together a requirements document.

The initial aim of this document is to work out the specific requirements
for a Clang compiler driver. Then we can check whether the current driver
meets them (it won't!) and design a solution based on actual use cases.

The first draft of this document is attached in HTML form and inline below
in ReST form. I've documented the obvious use cases that I can think of, but
I am certain to have missed some out. I've also started a strawman proposal
solution, with the intention of stimulating discussion by providing
something concrete.

Please comment! The more feedback I get (even "I don't really care" is
useful) the better prepared I can be for the BoF and the better driver Clang
will get as a result.

Cheers,

James

===========================
 Clang driver requirements
===========================

Changelog
=========

2011-Nov-04: JamesM: Updated usecases, s/driver/plugin/g, sent to list.
2011-Oct-25: JamesM: Initial draft.

Introduction
============

The current Clang driver is inadequate, as shown by many mailing list
threads, the latest being [ML1]_. Its failings seem to stem from two
things: it cannot really be extended without touching lots of the
codebase, and it contains many interlinked special cases.

The current driver, while OK for hosted compilation, is very difficult
to set up for cross-compilation.

This document initially aims merely to document the requirements for
*a compiler driver for a Clang/LLVM based compiler*, aimed specifically
at, but not limited to, the Clang codebase. That is, it should be
possible to reuse this driver component for a similar but different
tool, much like the rest of Clang's design.

This document aims at building and documenting a consensus on driver
design. There are many different use cases in the community, so this
is expected to be a living document which will have input from
the entire (interested) community.

Potential use cases
===================

Where these use cases give rise to requirements, the requirements are
referenced by index in square brackets ([X]).

Usecase 1: UNIX/Windows distribution maintainer creating a distribution
-----------------------------------------------------------------------

Wants to take Clang ToT or a release tag, compile it and package it. The
only change needed is to tell Clang where to pick up headers and
libraries. These locations differ between distributions, and multilib
support has made them differ even more, because distributions use
different multilib directory structures and have different expectations
of where to find headers, libraries and crt*.o's. The ideal would be to
not have to recompile Clang or maintain diffs against the Clang tree to
make this change [2].

The library and header locations may depend on the resolved target
triple, along with other possible parameters (command line options,
environment variables, etc.).

Usecase 2: Developer creating a Clang-based derivative tool
-----------------------------------------------------------

Researchers or companies may wish to adapt Clang in more complex ways
than Usecase 1, such as adding new command line flags, adding
compatibility modes or changing the subtools invoked (for example,
invoking an alternate linker which takes different command line arguments).

Such a developer wants to adapt the driver's command line handling,
perhaps by ([3]):

  * Adding new command line flags.
  * Editing the functionality of current command line flags, possibly
    in a complex (non-declarative) way.
  * Altering the way subtools are invoked.

These changes should be easy to maintain: it should be possible to keep
them separate from the main Driver, where they are not subject to
clobbering by enthusiastic Driver developers [4].

Usecase 3: Clang developer, developing
--------------------------------------

Wants no functionality to change: things keep working as normal [1].

Usecase 4: Apple/Darwin developer, using fat binaries
-----------------------------------------------------

Requires fat-binary support. This entails multiple "-arch" arguments
being supported. [1]
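
As a rough illustration of what supporting multiple "-arch" arguments could involve, the sketch below gathers every architecture named on the command line and plans one compile job per architecture. The helper names, the ``compile`` tool name and the output naming scheme are all hypothetical, and merging the per-architecture objects into a fat binary is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: gather the value of every "-arch <name>" pair,
// keeping first-seen order and ignoring repeats of the same architecture.
std::vector<std::string> collectArchs(const std::vector<std::string> &Args) {
  std::vector<std::string> Archs;
  for (std::size_t I = 0; I + 1 < Args.size(); ++I) {
    if (Args[I] != "-arch")
      continue;
    const std::string &Name = Args[++I]; // consume the value as well
    if (std::find(Archs.begin(), Archs.end(), Name) == Archs.end())
      Archs.push_back(Name);
  }
  return Archs;
}

// Plan one compile job per requested architecture. "compile" and the
// "<input>.<arch>.o" naming are placeholders, not real tool behaviour.
std::vector<std::vector<std::string>>
planFatCompile(const std::vector<std::string> &Args, const std::string &Input) {
  assert(!Input.empty() && "an input file is required");
  std::vector<std::vector<std::string>> Jobs;
  for (const std::string &Arch : collectArchs(Args))
    Jobs.push_back({"compile", "-arch", Arch, Input, "-o",
                    Input + "." + Arch + ".o"});
  return Jobs;
}
```

Under these assumptions, a command line containing ``-arch i386 -arch x86_64`` yields two jobs, one per architecture, in the order the architectures were first given.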

.. note:: 

    Describe this some more?

Functional requirements
=======================

The following requirements follow from the use cases above and attempt
to formalise those use cases more precisely.

[1] No functional regressions
  The driver **must** be able to be configured such that it can parse
  command lines that the current Clang driver accepts. The driver
  **must** invoke all subtools in the same manner as the current Clang
  driver, with the possible exception of obscure, undefined, legacy or
  otherwise incorrect behaviour. Permission for each such exception
  must be obtained from the mailing list and documented in a
  subsection of this document for decision tracking.

[2] Adaptability
  The driver's parameters (search paths, header locations, etc.)
  **must** be able to be changed with minimal intervention.

  These parameters **should** be able to be changed without a
  recompile of Clang or any changes to the source base.

[3] Extensibility
  All parts of the driver that interact with the outside
  environment (such as interpreting command lines and launching
  subtools) **must** be able to have their behaviour easily modified.

  While there is no requirement for this to be able to be done with no
  source changes, there **could** be scope for allowing dynamically
  loadable modules (in the spirit of ``opt -load``) to change the
  driver's behaviour at invoke-time.

[4] Maintainability of downstream changes
  There must be a well-defined and documented API that can be
  followed by a developer attempting to modify the driver's
  behaviour. This API should make it possible to, at a minimum:

    * Add, remove and modify command line flags with possibly complex
      imperative rules.
    * Hook into and mutate commands to be passed to subtools.
    * Pass "Clang-like" diagnostics to the user.
    * Set up default parameters such as include paths.
    * Separate their modification from the Clang sourcebase, at least
      to the extent that existing Clang source files should not need
      to be modified with anything other than a trivial patch.

Proposed design "A"
===================

The following design is proposed as a strawman to expose possible
flaws in the requirements and to generate more targeted
discussion. That said, I (JamesM) certainly think it's a decent
design, else I wouldn't propose it.

As the requirements change (and they will!) this design should change with
them.

The high level overview is of three parts: a driver "Kernel", one or
more "Driver Plugins" and a "Config file".

.. note::

    Find a better word than "plugin" so it won't cause confusion
    between this and normal Clang plugins?

This addresses the requirements in the following ways (numbered to
match the requirements above):

1. As a pure rearchitecting exercise there should be no need for any
   functional differences to take place.
2. The config files allow for easy tweaking of configuration
   without a recompile.
3. The driver framework forms a stable API for adding to and defining
   driver functionality, and also easily allows this to be imported at
   runtime via shared object.
4. The driver framework means that ideally a developer can create
   their own plugin, plug it in, and have it be unaffected by any
   Clang change other than a change to the plugin API.

Driver Kernel
-------------

The driver Kernel will be responsible for handling user input and
calling subprocesses. In particular it will parse command lines in a
generic, almost POSIX-compliant manner, launch subprocesses and
display their diagnostics, and emit diagnostics of its own.

It will maintain a partially ordered list of plugins (so that the
order in which they are linked in or loaded is not important). Each
plugin will expose a list of the command line options it handles, and
how to handle them (for example, setting a parameter or handing off to
a handler function).

The kernel should maintain its state as a sort of dictionary/hash which
the plugins can access and mutate, as well as add extra entries to.

Once all plugins have mutated the state due to command line options,
the state is handed to the config file to mutate further, after which
the kernel generates command lines to invoke subprocesses.

These command lines are then sent to the plugins for possible mutation
before being executed.
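
The flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: ``State``, ``Plugin``, ``OptLevelPlugin`` and ``planJobs`` are hypothetical names, the config step is modelled as a set of already-parsed key/value overrides, the ``compiler`` and ``opt-level`` keys are invented for the example, and any argument that no plugin claims is simply treated as an input file.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical names throughout; nothing here is an existing Clang API.
using State = std::map<std::string, std::string>; // the kernel's dictionary
using CommandLine = std::vector<std::string>;     // argv for one subprocess

struct Plugin {
  virtual ~Plugin() = default;
  // Return true if this plugin recognised the option and updated the state.
  virtual bool handleOption(const std::string &Opt, State &S) = 0;
  // Last chance to rewrite a subprocess command before it is executed.
  virtual void mutateCommand(CommandLine &Cmd, const State &S) = 0;
};

// Example plugin: records "-O<level>" in the state and forwards it later.
struct OptLevelPlugin : Plugin {
  bool handleOption(const std::string &Opt, State &S) override {
    if (Opt.rfind("-O", 0) != 0)
      return false;
    S["opt-level"] = Opt.substr(2);
    return true;
  }
  void mutateCommand(CommandLine &Cmd, const State &S) override {
    State::const_iterator It = S.find("opt-level");
    if (It != S.end())
      Cmd.push_back("-O" + It->second);
  }
};

// Kernel pipeline: options -> state -> config -> commands -> plugin mutation.
std::vector<CommandLine> planJobs(const std::vector<std::string> &Args,
                                  const std::vector<Plugin *> &Plugins,
                                  const State &ConfigOverrides) {
  State S;
  std::vector<std::string> Inputs;
  for (const std::string &A : Args) {
    bool Handled = false;
    for (Plugin *P : Plugins)
      if ((Handled = P->handleOption(A, S)))
        break;
    if (!Handled)
      Inputs.push_back(A); // simplification: unclaimed arguments are inputs
  }
  // Config step: entries from the config file mutate the state further.
  for (const auto &KV : ConfigOverrides)
    S[KV.first] = KV.second;

  std::vector<CommandLine> Jobs;
  for (const std::string &In : Inputs) {
    std::string Tool = S.count("compiler") ? S.at("compiler") : "cc1";
    CommandLine Cmd = {Tool, In};
    for (Plugin *P : Plugins)
      P->mutateCommand(Cmd, S);
    assert(!Cmd.empty() && "plugins must not erase the whole command");
    Jobs.push_back(Cmd);
  }
  return Jobs;
}
```

The commands are returned rather than executed, which keeps the sketch free of process-spawning code; a real kernel would launch each one and relay its diagnostics.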

Driver Plugin
-------------

A driver plugin is a C++ module, either statically or dynamically
linked, which implements a specific API.

The API should consist of at least:
  
  * A function for the Kernel to obtain a list of command line options
    the plugin can handle and how to handle them. The format is yet to
    be defined, but should be nearly compatible (a few changes may be
    required) with the output currently generated by the Clang argument
    parser tablegen backend.
  * A function for the Kernel to obtain the "priority" of the
    plugin. Priority is a way for the developer to define which
    plugins are queried first for command line argument resolution and
    subprocess command mutation.
  * A function that the Kernel will call on every subprocess
    invocation to allow the plugin to mutate that invocation.
  * A function to allow the plugin to emit diagnostics to the user via
    the Kernel, or to abort compilation.
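
To make the list above more concrete, here is one possible shape for such an API. It is a sketch only: every name (``DriverPlugin``, ``OptionSpec``, ``Diagnostics``, ``runInvocationHooks``), the ``--alt-linker=`` flag, the ``alt-ld`` tool and the "higher priority is queried first" rule are assumptions made for illustration, not a proposal for the final interface.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// All names below are hypothetical; this is not an existing Clang interface.

// Sink the Kernel hands to plugins so they can report problems to the user.
struct Diagnostics {
  std::vector<std::string> Messages;
  bool Aborted = false;
  void warn(const std::string &Msg) { Messages.push_back("warning: " + Msg); }
  void fatal(const std::string &Msg) {
    Messages.push_back("error: " + Msg);
    Aborted = true; // the Kernel would stop the compilation here
  }
};

// One entry of the option table a plugin exposes to the Kernel.
struct OptionSpec {
  std::string Name;     // option spelling, e.g. a flag prefix
  std::string StateKey; // if non-empty, the value is stored under this key
  // Optional imperative handler for options that need more than a store.
  std::function<void(const std::string &Value, Diagnostics &)> Handler;
};

class DriverPlugin {
public:
  virtual ~DriverPlugin() = default;
  virtual std::vector<OptionSpec> options() const = 0;
  virtual int priority() const = 0; // assumption: higher is queried first
  virtual void mutateInvocation(std::vector<std::string> &Argv,
                                Diagnostics &Diags) = 0;
};

// Example: swap the linker for an alternative one (hypothetical tool names).
class AltLinkerPlugin : public DriverPlugin {
public:
  std::vector<OptionSpec> options() const override {
    OptionSpec Spec;
    Spec.Name = "--alt-linker=";  // hypothetical flag
    Spec.StateKey = "alt-linker"; // Kernel stores the value in its state
    return std::vector<OptionSpec>(1, Spec);
  }
  int priority() const override { return 10; }
  void mutateInvocation(std::vector<std::string> &Argv,
                        Diagnostics &Diags) override {
    if (!Argv.empty() && Argv[0] == "ld") {
      Argv[0] = "alt-ld";
      Diags.warn("using alternative linker");
    }
  }
};

// Kernel side: offer one subprocess invocation to every plugin in priority
// order, stopping early if a plugin aborted the compilation.
void runInvocationHooks(std::vector<DriverPlugin *> Plugins,
                        std::vector<std::string> &Argv, Diagnostics &Diags) {
  for (DriverPlugin *P : Plugins)
    assert(P && "null plugin registered");
  std::stable_sort(Plugins.begin(), Plugins.end(),
                   [](const DriverPlugin *A, const DriverPlugin *B) {
                     return A->priority() > B->priority();
                   });
  for (DriverPlugin *P : Plugins) {
    if (Diags.Aborted)
      break;
    P->mutateInvocation(Argv, Diags);
  }
}
```

An abstract base class like this would work for both statically linked and dynamically loaded plugins, provided a loaded module exposes some factory function returning a ``DriverPlugin``; that loading mechanism is not shown here.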

Statically linking plugins to the driver will result in the fastest
compilation speed, but because the API is well defined I suggest also
offering the ability to dynamically load plugins at runtime. This may
make development easier for some users (TODO: Does anyone
care about this? Would it help anyone?)

Driver Config files
-------------------

These are designed to allow the *user* to tweak settings at invoke
time, without requiring a recompile. For speed, they allow a
restricted set of operations in comparison to driver plugins: they
are purely declarative, with no imperative constructs, and can modify
or add to the kernel state.

These files could be written in whatever lightweight markup language
we choose, which is not really important at this stage. The important
thing is that it is simple enough to parse speedily, with no interpreter
overhead and no extra dependencies.

Suggestions include JSON, YAML, XML or an INI style, similar to Daniel
Dunbar's recent build system changes.
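
To give a feel for how cheap a purely declarative format can be to process, the sketch below applies an INI-style file to the kernel state using only the standard library. The syntax it accepts (``key = value`` lines, ``[section]`` headers that prefix keys, ``#`` or ``;`` comments) and the names ``State`` and ``applyConfig`` are assumptions for illustration; the actual format is still to be chosen.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for the kernel state dictionary.
using State = std::map<std::string, std::string>;

// Strip leading and trailing whitespace.
inline std::string trim(const std::string &S) {
  const char *WS = " \t\r\n";
  std::string::size_type B = S.find_first_not_of(WS);
  if (B == std::string::npos)
    return "";
  std::string::size_type E = S.find_last_not_of(WS);
  return S.substr(B, E - B + 1);
}

// Apply "key = value" lines to the state. "[name]" starts a section whose
// keys are stored as "name.key". Lines starting with '#' or ';' are comments.
// Returns the number of malformed lines that were skipped.
int applyConfig(std::istream &In, State &S) {
  std::string Line, Section;
  int Skipped = 0;
  while (std::getline(In, Line)) {
    Line = trim(Line);
    if (Line.empty() || Line[0] == '#' || Line[0] == ';')
      continue;
    if (Line.front() == '[' && Line.back() == ']') {
      Section = trim(Line.substr(1, Line.size() - 2));
      continue;
    }
    std::string::size_type Eq = Line.find('=');
    if (Eq == std::string::npos || Eq == 0) {
      ++Skipped; // no '=' at all, or no key before it
      continue;
    }
    std::string Key = trim(Line.substr(0, Eq));
    // Line is trimmed and does not start with '=', so Key has content.
    assert(!Key.empty());
    std::string Value = trim(Line.substr(Eq + 1));
    S[Section.empty() ? Key : Section + "." + Key] = Value;
  }
  return Skipped;
}
```

Because the file can only set or add entries, applying it is a single pass over the text with no evaluation step, which matches the speed argument made above. A file could, for example, contain a ``[search]`` section with an ``include = /opt/inc`` line, which this sketch would store under the key ``search.include``.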

References
==========

.. [ML1] http://lists.cs.uiuc.edu/pipermail/cfe-dev/2011-October/018059.html
