[LLVMdev] Proposal for TableML, llvmc2 configuration language

Wed Nov 26 13:34:04 PST 2008

Hi,

I've been working on a proof of concept for a new configuration language 
for LLVM: specifically for my needs in llvmc2, but I have tried to make 
it as generic as possible for use throughout LLVM if other projects 
would like to make use of it. It's a compiler that compiles a 
near-subset of Standard ML to C++, with an architecture deliberately 
very similar to TableGen.

The code is not yet ready to be merged by any means - it has many 
failure cases and may not compile at any given time - but I thought that 
before I go further I should send a proposal to the list. The WIP code, 
for the curious, is here:

http://github.com/pcwalton/llvm-nw/tree/miniml

If TableGen is a language that allows users to specify records of 
domain-specific information, TableML is a configuration language that 
allows users to specify how to *construct* records of domain-specific 
information. TableML has a plugin
architecture in which at any given time one of several backends is in 
use, just as in TableGen. The backends specify one or more record types 
and definitions. TableML then reads a configuration file, evaluates the 
definitions, and passes the results to the backend for serialization.
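To make the plugin flow concrete, here is a minimal C++ sketch. A backend declares the definitions it needs, the driver checks that the evaluated configuration provides them, and the backend serializes the values. The names `Backend`, `declarations`, `serialize`, `runBackend`, and `RegisterNamesBackend` are hypothetical and are not TableML's actual API. Evaluated values are simplified to string lists, and the driver here only checks that each definition is present, not its type.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical evaluated value: only string lists are modeled here.
using StringList = std::vector<std::string>;
using Evaluated = std::map<std::string, StringList>;

// Hypothetical backend plugin interface (not TableML's real API).
class Backend {
public:
  virtual ~Backend() = default;
  // Definitions the backend requires, mapped to their declared TableML types.
  virtual std::map<std::string, std::string> declarations() const = 0;
  // Turn the evaluated definitions into output text (here, C++ source).
  virtual std::string serialize(const Evaluated &defs) const = 0;
};

// Example backend requiring one definition, "RegisterNames : string list".
class RegisterNamesBackend : public Backend {
public:
  std::map<std::string, std::string> declarations() const override {
    return {{"RegisterNames", "string list"}};
  }
  // Assumes a non-empty list: C++ forbids zero-length arrays.
  std::string serialize(const Evaluated &defs) const override {
    std::ostringstream out;
    out << "static const char *const RegisterNames[] = {";
    const StringList &names = defs.at("RegisterNames");
    for (std::size_t i = 0; i < names.size(); ++i)
      out << (i ? ", " : " ") << '"' << names[i] << '"';
    out << " };\n";
    return out.str();
  }
};

// Driver step: make sure every declared definition was produced by the
// configuration file, then hand the values to the backend.
std::string runBackend(const Backend &backend, const Evaluated &defs) {
  for (const auto &decl : backend.declarations())
    if (defs.find(decl.first) == defs.end())
      throw std::runtime_error("missing definition: " + decl.first);
  return backend.serialize(defs);
}
```

Given `{"RegisterNames", {"eax", "ebx"}}`, `runBackend` returns the line `static const char *const RegisterNames[] = { "eax", "ebx" };`.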

For instance, we might have a RegisterInfo backend that declares a 
definition of "RegisterNames : string list". Then we could have a 
TableML input file like this:

def val RegisterNames = [ "eax", "ebx", "ecx", "edx" ]

Or we could have a more complex one that performs computation to produce 
the result.

val make32bit = (fn x => strcat("e", x))
def val RegisterNames = map make32bit [ "ax", "bx", "cx", "dx" ]
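For the second input, the evaluation TableML performs before the backend runs amounts to the C++ below. This sketch assumes that `strcat` concatenates its two string arguments. The backend only ever sees the resulting list, which is the same as the one written out literally in the first input.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// C++ mirror of: val make32bit = (fn x => strcat("e", x))
std::string make32bit(const std::string &x) { return "e" + x; }

// C++ mirror of: map make32bit [ "ax", "bx", "cx", "dx" ]
std::vector<std::string> registerNames() {
  const std::vector<std::string> regs = {"ax", "bx", "cx", "dx"};
  std::vector<std::string> out;
  out.reserve(regs.size());
  std::transform(regs.begin(), regs.end(), std::back_inserter(out), make32bit);
  return out;
}
```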

Obviously, this example is somewhat contrived, but it's just meant to 
illustrate that arbitrary computation is allowed (and is performed at 
compile time), as long as the definitions end up with the correct types. 
This could be thought of as a generalization of the "class" and 
"multiclass" concepts in TableGen. Also notice that, like all ML-based 
languages, TableML is strongly typed, and it makes heavy use of 
Hindley-Milner type inference. (The parser, lexer, and typechecker are 
all coded already, by the way, just not very well tested at the moment.) 
The subset of Standard ML that TableML supports is essentially the one 
shown here:
http://www.macs.hw.ac.uk/ultra/compositional-analysis/type-error-slicing/slicing.cgi
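Unification of type terms is the core of Hindley-Milner inference, and it is what lets a definition like `map make32bit [...]` be assigned the type `string list`. The following is a minimal C++ sketch of unification with an occurs check. It is not TableML's typechecker. The `Type` representation and the names `tvar`, `tcon`, `unify`, and `show` are assumptions, and let-polymorphism (generalization and instantiation) is left out.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A type term: a type variable ('t0, 't1, ...) or a named constructor
// applied to arguments, e.g. "list" [string] or "->" [int, int].
struct Type {
  bool isVar;
  int var;                                  // used when isVar is true
  std::string con;                          // used when isVar is false
  std::vector<std::shared_ptr<Type>> args;  // constructor arguments
};
using TypeP = std::shared_ptr<Type>;

TypeP tvar(int v) { return std::make_shared<Type>(Type{true, v, "", {}}); }
TypeP tcon(std::string c, std::vector<TypeP> args = {}) {
  return std::make_shared<Type>(Type{false, 0, std::move(c), std::move(args)});
}

// Substitution: type variable id -> the type it has been bound to.
using Subst = std::map<int, TypeP>;

// Follow bindings until reaching an unbound variable or a constructor.
TypeP resolve(TypeP t, const Subst &s) {
  while (t->isVar) {
    auto it = s.find(t->var);
    if (it == s.end()) break;
    t = it->second;
  }
  return t;
}

// Occurs check: does variable v appear inside t (under substitution s)?
bool occurs(int v, TypeP t, const Subst &s) {
  t = resolve(t, s);
  if (t->isVar) return t->var == v;
  for (const TypeP &a : t->args)
    if (occurs(v, a, s)) return true;
  return false;
}

// Unify a and b, extending s. Returns false on a type error.
bool unify(TypeP a, TypeP b, Subst &s) {
  a = resolve(a, s);
  b = resolve(b, s);
  if (a->isVar && b->isVar && a->var == b->var) return true;
  if (a->isVar) {
    if (occurs(a->var, b, s)) return false;  // reject infinite types
    s[a->var] = b;
    return true;
  }
  if (b->isVar) return unify(b, a, s);
  if (a->con != b->con || a->args.size() != b->args.size()) return false;
  for (std::size_t i = 0; i < a->args.size(); ++i)
    if (!unify(a->args[i], b->args[i], s)) return false;
  return true;
}

// Render a type with the substitution applied, e.g. "string list".
std::string show(TypeP t, const Subst &s) {
  t = resolve(t, s);
  if (t->isVar) return "'t" + std::to_string(t->var);
  if (t->con == "->")
    return "(" + show(t->args[0], s) + " -> " + show(t->args[1], s) + ")";
  if (t->con == "list") return show(t->args[0], s) + " list";
  return t->con;
}
```

Consider `map : ('a -> 'b) -> 'a list -> 'b list` used as `(string -> string) -> string list -> 'r`. Unifying the two binds `'r` to `string list`, which a driver could then compare against the type a backend declares.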

Now the upshot of this for the compiler driver is that function types 
are acceptable types for definitions. This means that, unlike TableGen, 
backends that want to allow scripting (which is currently just llvmc2) 
don't have to define their own programming languages. Instead, they can 
simply request a definition with a function type (e.g. SomeFunction : 
int -> int). TableML will hand the AST for the function, as well as its 
values, over to the backend for emission as C++ code. The backend is 
free to generate any C++ code it wants for the typed ASTs (of course, 
some support routines could be added to the base to make this easier).
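Here is a sketch of what emission for a function-typed definition could look like. Suppose a backend receives `SomeFunction : int -> int` bound to `fn x => x + 1` and prints a C++ function for it. The typed AST below is hypothetical: it has only integer literals, variables, `+`, and `*`. The node kinds and the helpers `emitExpr` and `emitIntFunction` are illustrative and are not part of the proposal's code.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical typed AST for the body of an int -> int function.
struct Expr {
  enum Kind { IntLit, Var, Add, Mul } kind = IntLit;
  int value = 0;                   // IntLit
  std::string name;                // Var
  std::shared_ptr<Expr> lhs, rhs;  // Add, Mul
};
using ExprP = std::shared_ptr<Expr>;

ExprP mkNode(Expr::Kind k) {
  auto e = std::make_shared<Expr>();
  e->kind = k;
  return e;
}
ExprP mkLit(int v) { auto e = mkNode(Expr::IntLit); e->value = v; return e; }
ExprP mkVar(std::string n) { auto e = mkNode(Expr::Var); e->name = std::move(n); return e; }
ExprP mkAdd(ExprP l, ExprP r) {
  auto e = mkNode(Expr::Add); e->lhs = std::move(l); e->rhs = std::move(r); return e;
}
ExprP mkMul(ExprP l, ExprP r) {
  auto e = mkNode(Expr::Mul); e->lhs = std::move(l); e->rhs = std::move(r); return e;
}

// Emit a fully parenthesized C++ expression for a typed AST node.
std::string emitExpr(const Expr &e) {
  switch (e.kind) {
  case Expr::IntLit: return std::to_string(e.value);
  case Expr::Var: return e.name;
  case Expr::Add: return "(" + emitExpr(*e.lhs) + " + " + emitExpr(*e.rhs) + ")";
  case Expr::Mul: return "(" + emitExpr(*e.lhs) + " * " + emitExpr(*e.rhs) + ")";
  }
  return "";
}

// Emit a C++ function for a definition "name : int -> int" bound to
// (fn param => body).
std::string emitIntFunction(const std::string &name, const std::string &param,
                            const Expr &body) {
  return "int " + name + "(int " + param + ") { return " + emitExpr(body) + "; }\n";
}
```

For `fn x => x + 1` this produces `int SomeFunction(int x) { return (x + 1); }`. The definition is compiled to native code rather than interpreted, which is why scripting need not cost run-time performance.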

So, in summary, there are two main benefits to TableML that I see, 
depending on the backend/use case:
(1) Users of backends that don't need scripting support can use 
    arbitrary computation to express their records, going beyond the 
    macro facility that TableGen provides.
(2) Users of backends that do need scripting support don't have to 
    define their own programming languages, and this comes at no 
    run-time performance cost compared to TableGen.

I'd definitely appreciate any comments on this proposal! I'd also be 
happy to clarify any issues with this explanation.

Patrick