[LLVMdev] Compiler Driver Requirements & Design (Comments Solicited!)

Wed Jul 28 09:33:19 PDT 2004

LLVMers,

As part of my work on bug 353: Create Front End Framework And Compiler
Driver (http://llvm.cs.uiuc.edu/PR353), I'm starting a discussion on the
design and requirements of the compiler driver. If you have comments on
this, by all means PLEASE chime in. This is by no means cast in stone.
The results of the ensuing discussion will be documented in PR353 (and
elsewhere) and I'll use it as my guide in implementing the compiler
driver.

If your comments are limited in scope, please place the section number
and title in the subject line so we can have independent lines of
discussion on sub-topics. Thanks.

CONTENTS:
=========
 1. What it is
 2. Mode of operation
 3. Naming
 4. Similar options as GCC
 5. Basic/Standard compilation tasks
 6. Recognize file types by extensions
 7. Input/Output flexibility
 8. Configuration files
 9. Source language/tool agnostic
10. Standard levels of optimization
11. Pipes or temporary files
12. Automatic linkage
13. Runtime libraries
14. Integration with front end framework
15. Next steps

1. WHAT IT IS
=============
The compiler driver is a program that will execute a variety of
compiler, linking and optimization tools. The basic idea is that it
provides an engine for transformation between files of different types.
The compiler driver offers a standard set of command line options for
specifying the transformations needed and the ability to invoke other
programs (tools) to perform those transformations. The driver itself
doesn't do anything with the files; it's just a master invoker of other
programs.

2. MODE OF OPERATION
====================
The driver will simply read its command line arguments, read its
configuration data, and invoke the compilation, linking, and
optimization tools necessary to complete the user's request. Its basic
function is somewhat like a SQL query optimizer in that it tries to find
the optimal strategy for executing the user's request given the current
situation. It is given a high level description of what to do (the
command line arguments, akin to the SQL statement), and a detailed
description of the tools that can be invoked to accomplish the request
(configuration files, akin to the database layout/parameters). From
these two inputs, it generates a (hopefully optimal) strategy for
accomplishing the request with as little program invocation and I/O as
possible. It then executes that strategy and terminates.

3. NAMING
=========
We want to have a really great name for this tool since it will
(eventually) be the main touch point for users of LLVM. The name has not
been settled, but a few have been suggested:

9iron - as in a golf driver
lcd - LLVM Compiler Driver
ccd - Configurable Compiler Driver
ngc - Next Generation Compiler
llvm - essentially, the gateway to LLVM based tools.
myc - My Compiler

These were contributed by Reid, Chris and Misha. If you have thoughts on
this, please voice them!

4. SIMILAR OPTIONS AS GCC
=========================
Certain common GCC options should be supported in order to make the
driver appear familiar to users of GCC. In particular, the following
options are important to preserve:

-c Compile and assemble file to object
-S Compile a file to assembly
-O The optimization family of options
-x Specify the language of a source input
-v Show what the driver is doing as it's doing it
-g Include debugging info in the output (passed to tools)
-f Support for optimization/language tweaking (passed to tools)
-m Specify the machine to generate code for (passed to tools)
-W Pass arbitrary other options to tools.
-X Pass argument to assembler, compiler, or linker

Additionally, we should have options to:
* generate analysis reports a la the LLVM analyze tool
* have a "no op" mode like -v where it just reports what it would do
* have a language specific help utility based on suffixes. For example,
  --help ll would list the options applicable to *.ll input files. This
  would extend to source languages too (e.g. --help c for C help or
  --help f for FORTRAN help). The generated help info would be specific
  for the given language, after the config files have been read thus
  allowing the output to vary depending on the driver's configuration.
* Support the -- option to terminate command line options and indicate
  the remaining arguments are files to be processed.
* Support command line configuration (override config files on the
  command line) either by specifying a config file or using special
  configuration options.
* Each option should have short (-x) and long (--language) variants
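
As a rough illustration of the command line described above, here is a small sketch using Python's argparse (Python is only used for illustration here, not proposed as the implementation language). The -c/--compile and -S/--assemble pairs and the --language long name come from this proposal; the program name "driver", the long names --optimize and --verbose, and the function build_parser are assumptions made up for this example.

```python
import argparse

def build_parser():
    """Build a parser for a handful of the GCC-like options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="driver")
    parser.add_argument("-c", "--compile", action="store_true",
                        help="compile source to object")
    parser.add_argument("-S", "--assemble", action="store_true",
                        help="compile source to assembly")
    # --optimize and --verbose are hypothetical long names.
    parser.add_argument("-O", "--optimize", dest="opt_level", default=None,
                        help="optimization level, e.g. -O2")
    parser.add_argument("-x", "--language", default=None,
                        help="language of the source input")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="show what the driver is doing")
    # Everything that is not an option is treated as an input file.
    parser.add_argument("files", nargs="*")
    return parser

# "--" ends option processing, so a file whose name starts with "-" is accepted.
args = build_parser().parse_args(["-c", "-O2", "-x", "c", "--", "-odd-name.c"])
print(args.compile, args.opt_level, args.language, args.files)
```

argparse already treats "--" as the end of options, which is the behavior requested in the list above.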

5. BASIC/STANDARD COMPILATION TASKS
===================================
The driver will perform basic tasks such as compilation, optimization,
and linking. The following definitions are suggested, but more could be
supported.

-c|--compile 
  Goal: Compile source to object
  Inputs: Source language (e.g.: .c,.st,.cpp,.f,.p,.java,.ll)
  Outputs: Objects (e.g.: .bc, .o, .c)

-S|--assemble
  Goal: Compile source to assembly
  Inputs: Source language (e.g.: .c,.st,.cpp,.f,.p,.java,.ll)
  Outputs: Assembly (e.g.: .s, .ll)

--link
  Goal: Create executable program
  Inputs: Source, Assembly, Object, Library, Bytecode
  Outputs: Native executable or lli wrapper 

-z|--analyze
  Goal: Analyze program
  Inputs: Source/Assembly/Bytecode
  Outputs: various (loadable) reports on the inputs

In particular, these options specify goals to be satisfied. The driver
should compensate for a given tool's lack of features in order to
satisfy the goal. For example, suppose a Scheme front end simply
generated .ll files but the command line was:

driver -c -o myprog.o myprog.ss

This tells the driver to compile myprog.ss (scheme input) to a native
object file, myprog.o. The driver would "backfill" the tool by running:

1. scheme front end (.ss -> .ll)
2. llvm-as (.ll -> .bc)
3. llc (.bc -> .s)
4. gas (.s -> .o)

Or some optimization of the above sequence.
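
To make the backfill idea concrete, here is a minimal sketch in Python (used only for illustration). It treats each tool as an edge from an input suffix to an output suffix and does a breadth-first search for the shortest chain. The tool table, the name scheme-fe, and the function plan_chain are made up for this example; in the real driver the entries would come from the configuration data.

```python
from collections import deque

# Hypothetical tool table: (tool name, input suffix, output suffix).
# The entries mirror the Scheme example above.
TOOLS = [
    ("scheme-fe", ".ss", ".ll"),
    ("llvm-as", ".ll", ".bc"),
    ("llc", ".bc", ".s"),
    ("gas", ".s", ".o"),
]

def plan_chain(src_suffix, dst_suffix, tools=TOOLS):
    """Return the shortest list of tool names turning src_suffix into dst_suffix."""
    queue = deque([(src_suffix, [])])
    seen = {src_suffix}
    while queue:
        suffix, chain = queue.popleft()
        if suffix == dst_suffix:
            return chain
        for name, t_in, t_out in tools:
            if t_in == suffix and t_out not in seen:
                seen.add(t_out)
                queue.append((t_out, chain + [name]))
    return None  # no sequence of known tools reaches the goal

print(plan_chain(".ss", ".o"))  # prints ['scheme-fe', 'llvm-as', 'llc', 'gas']
```

Because the search is breadth-first, a front end that could emit .bc or .o directly would automatically produce a shorter chain once its entry is added to the table.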

6. RECOGNIZE FILE TYPES BY EXTENSIONS
=====================================
In general, the driver will classify its input files (command line
options not preceded by -) by their extensions. This will generally
indicate the transformations necessary to be applied to the file in
order for the task to be completed. Additionally, the user may use the
-x option to force a given file to be classified differently than the
default derived from its file extension.
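
A minimal sketch of this classification in Python follows. The suffix table, the type names, and the classify function are assumptions for illustration; the forced_type parameter stands in for the command line override described above.

```python
import os

# Hypothetical suffix table; a real driver would load this from configuration.
SUFFIX_TO_TYPE = {
    ".c": "c",
    ".cpp": "c++",
    ".ll": "llvm-assembly",
    ".bc": "llvm-bytecode",
    ".s": "native-assembly",
    ".o": "native-object",
}

def classify(path, forced_type=None):
    """Return the file type for path; forced_type models an explicit override."""
    if forced_type is not None:
        return forced_type
    suffix = os.path.splitext(path)[1]
    return SUFFIX_TO_TYPE.get(suffix, "unknown")
```

For example, classify("prog.ll") yields "llvm-assembly", while classify("prog.txt", forced_type="c") yields "c" regardless of the extension.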

7. INPUT/OUTPUT FLEXIBILITY
===========================
Front end compiler tools (those that translate a given source language
into something the driver can work with) will come in a variety of
flavors and perform a variety of tasks. Indeed, there must be no
requirements placed on the front end compiler tools by the driver. The
*only* requirement is that the tool be invokable with command line
arguments.

The driver tool is not expected to do anything but invoke other tools,
so it needs to understand how to invoke a tool, what optimizations the
tool supports, and what the output of that tool is.  Let's take stkrc as
an example. stkrc generates verbose, unoptimized byte code. It cannot
generate LLVM assembly, native assembly, or native object files.
Consequently, the driver would make up for its shortcomings by passing
the .bc files to opt or llc in order to get optimizations done and to
generate assembly, CBE or native code. More aggressive front ends (such
as the C front end) should be able to optimize their results both
specifically for the source language (e.g. directly at the AST level)
and with the help of LLVM (its various passes). They should also
directly support compilation to a variety of output formats (CBE, BC,
native .o, etc.). 

This approach provides a high level of flexibility while retaining
performance where it is needed. Simple front ends (like stkrc) can be
coded quickly and the driver can "back fill" the necessary optimization,
code generation and linking capabilities. As a front end matures and
takes on the burden of optimizing and various output formats,
performance will increase because fewer llvm tools will need to be
invoked in order to complete a given task. 

Flexibility should be supported on the output side as well. Regardless
of the output of a given tool, the driver should support generation of
LLVM assembly (.ll), LLVM byte code (.bc), native assembly (.s), and
native object files (.o) when compiling. When linking, it should be able
to generate native executables (.exe, a.out, ELF, whatever's supported),
lli wrapper scripts, and C Back End (CBE) output.

8. CONFIGURATION FILES
======================
In order to support the flexibility described above, it must be possible
for an existing compiler tool to be invoked by the driver without
changing either the tool or the driver. This is a firm requirement to
increase the driver's flexibility.

Consequently, a set of configuration data is needed by the driver in
order to know how to invoke the tools, what they do when invoked, and
what kind of output they create. A simple textual format is envisioned
that describes this information. The configuration files should be read
from standard locations (e.g. /etc/llvm/*), installation locations (e.g.
/usr/local/mycompiler/llvm/*) and user-specific locations (e.g. 
~/.llvm/*.conf).

Configuration files will play a large part in defining what the driver
does. Configuration files will form a cascade of definitions much like a
unix shell does: files in standard locations, installation locations,
and user-specific locations are read successively in a well-defined
order. Files read later override definitions in files read earlier. The
driver will also have built in configuration information for the LLVM
tool set it is based on so that info needed in every environment doesn't
need to be read from configuration files (something akin to make's
default rules). Each source language can provide a configuration file
for the front end compiler for that language. Users can override any/all
definitions to make the driver do what they want it to.
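
The cascade can be sketched as follows in Python, assuming each configuration file has already been parsed into a dictionary keyed by tool name. The function merge_cascade, the layer contents, and the choice to override per setting rather than per tool are all assumptions for illustration.

```python
def merge_cascade(layers):
    """Merge config layers in read order; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        for tool, settings in layer.items():
            # Override setting by setting, so a user file can change one
            # value without restating the whole tool definition.
            merged.setdefault(tool, {}).update(settings)
    return merged

# Hypothetical layers: built-in defaults, a system-wide file, a user file.
builtin = {"llc": {"input": ".bc", "output": ".s"}}
system = {"stkrc": {"input": ".st", "output": ".bc", "optimizes": False}}
user = {"stkrc": {"optimizes": True}}

config = merge_cascade([builtin, system, user])
```

Here the user layer changes only the "optimizes" setting of stkrc, and everything else is inherited from the layers read earlier.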

It is unclear what form the configuration files should take. The SPEC
format used in gcc.c is unintelligible and will be avoided. Some of the
ideas so far are:

* XML based (pro: well-structured, con: verbose)
* Java properties style (pro: familiar, con: not structured)
* Windows .ini style (pro: familiar, con: not well-structured)
* Special Language (pro: perfect fit, con: new language)

Things to include in the configuration files are:

* command line options supported by a compilation tool and their
   meanings for invocation by the driver.
* language specific command line options that should be supported 
  on the driver's command line but simply passed through to the 
  compilation tool.
* file suffixes supported as input by a compilation tool
* file suffixes supported as output by a compilation tool
* for each input suffix, a description of what the tool expects as
  input.
* for each output suffix, a description of what the tool produces as
  output, how much optimization it does, etc.
* tool chain definitions required for implementing a compiler. 
  For example, stkrc generates non-optimized bytecode files. Its tool
  chain might look like: stkrc | opt | llvm-link | llc | gcc (this
  is obviously a gross oversimplification).
* Runtime libraries needed by a front end (might vary with compilation
  options, e.g. thread support or not).

An optimization of the config files would cache the config data for a
given user in their ~/.llvm directory for faster reading of the config
files. Only if the config files have a time stamp later than the cache
file will the config data be re-parsed. It's not expected that this
optimization would appear in the first version of the driver.
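
The time stamp check could look something like this sketch in Python. The function name cache_is_stale and the way paths are passed in are assumptions; the logic is simply "re-parse if the cache is missing or any config file is newer than it".

```python
import os

def cache_is_stale(cache_path, config_paths):
    """True if the cache is missing or any existing config file is newer than it."""
    if not os.path.exists(cache_path):
        return True
    cache_mtime = os.path.getmtime(cache_path)
    return any(
        os.path.getmtime(p) > cache_mtime
        for p in config_paths
        if os.path.exists(p)  # skip locations that have no config file
    )
```

One thing this sketch does not handle is a config file that was deleted after the cache was written; the cache would need to record which files it was built from to detect that.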

9. SOURCE LANGUAGE/TOOL AGNOSTIC
================================
The driver must be agnostic towards source languages and their
compilation tools. It is expected that a myriad of source languages will
be constructed using LLVM tools; however, that shouldn't be a
constraint. A given source language compiler might be written in Scheme,
Haskell, ML, or assembler and use none of the LLVM libraries or tools.
At best it might generate LLVM Assembly (.ll). The LLVM driver shouldn't
care. As long as the compiler is invokable via command line arguments,
it should be supported. Configuration files will detail what arguments
to use, and what is produced by the compiler.

Furthermore, it must be possible to invoke native compilers (like gcc or
Intel C++ compiler, or Visual C++) from the driver and incorporate their
results into the linkage of LLVM based programs.

10. STANDARD LEVELS OF OPTIMIZATION
===================================
The -O family of options to the driver should be standardized by the
driver across all languages so that common levels of optimization can be
expected when using the driver.  The following definitions for the
various -O options are currently suggested:

-On - do no optimizations except, perhaps, mem2reg
-O0 - do simple, quickly executing optimizations including mem2reg,
        simplifycfg, instcombine
-O1 - More aggressive optimizations, including gcse, sccp, scalarrepl
-O2 - Loop optimizations, IPO at compile-time, etc.
-O3 - Link-time optimization, aggressive analysis
-O4 - Run-time, profile guided optimizations

To extend this list, we might want to have "basic" and "aggressive"
optimizations at various levels: functions, globals, modules, link-time
(IPO), run-time.  A certain amount of thought needs to go into this in
order to get the correct set of definitions. Ideas welcome.
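
As a starting point for discussion, the levels that name specific passes could be expressed as a table like the sketch below (Python, for illustration only). It assumes each level includes everything from the level before it, which the list above suggests but does not state. Levels -O2 through -O4 are left out because no specific passes are named for them yet. The names LEVEL_PASSES and passes_for are made up for this example.

```python
# Passes each level adds, in increasing order of aggressiveness.
# Pass names are taken from the list above.
LEVEL_PASSES = [
    ("-On", ["mem2reg"]),
    ("-O0", ["simplifycfg", "instcombine"]),
    ("-O1", ["gcse", "sccp", "scalarrepl"]),
]

def passes_for(level):
    """Return the cumulative pass list for a level (assumes levels are additive)."""
    result = []
    for name, passes in LEVEL_PASSES:
        result.extend(passes)
        if name == level:
            return result
    raise ValueError(f"no pass list sketched for level: {level}")

print(passes_for("-O0"))  # prints ['mem2reg', 'simplifycfg', 'instcombine']
```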

11. PIPES OR TEMPORARY FILES
============================
The user should have the option of passing output between the
compilation tools via either pipes or temporary files. Depending on the
system and hardware, one or the other should provide faster execution
of the tool chain.
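
The two strategies can be compared with a sketch like the following in Python. The stages here are tiny stand-in filters (upper-case, then reverse), not LLVM tools, and the function names are assumptions for illustration. Both functions run the same chain and return the same bytes; only the plumbing between stages differs.

```python
import os
import subprocess
import sys
import tempfile

def _stage(code):
    """Build a command that runs a tiny Python filter (stand-in for a real tool)."""
    return [sys.executable, "-c", code]

UPPER = _stage("import sys; sys.stdout.write(sys.stdin.read().upper())")
REVERSE = _stage("import sys; sys.stdout.write(sys.stdin.read()[::-1])")

def run_with_pipes(stages, data):
    """Chain the stages with OS pipes so no intermediate file touches disk."""
    procs = []
    for i, cmd in enumerate(stages):
        stdin = subprocess.PIPE if i == 0 else procs[-1].stdout
        procs.append(subprocess.Popen(cmd, stdin=stdin, stdout=subprocess.PIPE))
        if i > 0:
            procs[-2].stdout.close()  # the child now owns the read end
    # Fine for small inputs; large inputs would need a writer thread.
    procs[0].stdin.write(data)
    procs[0].stdin.close()
    out = procs[-1].stdout.read()
    procs[-1].stdout.close()
    for p in procs:
        p.wait()
    return out

def run_with_temp_files(stages, data):
    """Run the stages one at a time, passing data through temporary files."""
    with tempfile.TemporaryDirectory() as tmp:
        current = os.path.join(tmp, "stage0")
        with open(current, "wb") as f:
            f.write(data)
        for i, cmd in enumerate(stages, start=1):
            nxt = os.path.join(tmp, f"stage{i}")
            with open(current, "rb") as fin, open(nxt, "wb") as fout:
                subprocess.run(cmd, stdin=fin, stdout=fout, check=True)
            current = nxt
        with open(current, "rb") as f:
            return f.read()
```

With pipes all stages run concurrently; with temporary files they run one after another, which is easier to debug since the intermediate results could be kept.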

12. AUTOMATIC LINKAGE
=====================
Byte code files were recently enhanced to encode their dependent   
libraries. When the driver is linking a program, it should use the 
dependent library information in the .bc files to build the link 
command line so that users never have to worry about getting the correct
set of libraries to link with. This should work equally well for byte
code libraries and for user and system native libraries.
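
A sketch of how the link command line might be assembled, in Python for illustration. It assumes the dependent library names have already been read out of each .bc file (how that is done is not shown), and it assumes GCC-style -l flags; build_link_args is a made-up name.

```python
def build_link_args(modules):
    """Build link arguments from a mapping of .bc file -> dependent libraries.

    The mapping is assumed to have been extracted from the byte code files.
    """
    args = list(modules)  # the byte code files themselves come first
    seen = set()
    for libs in modules.values():
        for lib in libs:
            if lib not in seen:  # keep the first occurrence, drop repeats
                seen.add(lib)
                args.append(f"-l{lib}")
    return args

print(build_link_args({"a.bc": ["m", "pthread"], "b.bc": ["m"]}))
# prints ['a.bc', 'b.bc', '-lm', '-lpthread']
```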

This feature implies some intelligence in the front ends. Front ends
written specifically for LLVM (generating byte code or LLVM assembly)
should support the dependent libraries feature. Other compilers (native)
will not have this feature. A pre-processor might be able to derive the
dependencies or we just let the link fail. Ideas on how to support this
feature for non-LLVM tools would be welcome.

13. RUNTIME LIBRARIES
=====================
Runtime libraries will come in various forms: native system libs,
native user libs, LLVM standard runtime libraries (e.g. crtend.a), LLVM
language component libraries (e.g. GC/threads), language specific
runtime libs (e.g. libg++, libstkr_runtime). The driver must support
linking against all these different types of libraries. In particular,
it must be possible to configure the driver so that it knows where to
find the runtime libraries needed for a given program.

14. INTEGRATION WITH FRONT END FRAMEWORK
========================================
Eventually, LLVM will provide a framework for front end compilers. This
will essentially be a toolkit of code to make it easier to implement a
front end. While the design of this framework is only loosely sketched,
we can currently make the requirement that the driver should support
close integration with front ends based on the framework. 

The initial release of the FE Framework is already envisioned. It will
support the command line options needed to support invocation by the
driver and take care of all the "back end" details such as pass
invocation and generation of code (in various forms). 

Subsequent releases of the FE Framework will possibly include:
* AST construction helpers
* support for garbage collection
* support for threading
* support for LLVM debugging
* providing a compiler as a loadable module so that fork/exec isn't
  needed by driver (with very minimal interface between them).

Whatever it turns out to be, the driver needs to integrate with it.

15. NEXT STEPS
==============
1. Gather feedback from this email.
2. Document driver tool command line as a .pod file in the
    llvm/docs/CommandGuide directory. Submit for review and incorporate
    feedback.
3. Document driver tool requirements, design, config file language, and
    other tools based on input in a .html file in llvm/docs. Submit for
    review and incorporate feedback.
4. Incorporate design (by reference) to bug 353.
5. Code driver to specifications previously documented.
6. Generate test cases for the driver and test it.
7. Submit driver code and tests for review and incorporate feedback.
8. Commit driver to LLVM CVS
9. Write an initial "front end framework" for making it easier to write
    driver compatible front ends. This would basically support all the
    back end plumbing necessary to recognize optimizations (-On) and
    different kinds of output (.s, .ll, .bc, .o)
10. Retrofit stkrc to use the initial front end framework so it can be
    less brain dead and actually generate optimized code. Perhaps do the
    same for BF.
