[cfe-dev] [LLVMdev] LLVM targeting HLLs

Mon Jan 24 14:35:29 PST 2011

On 24 Jan 2011, at 22:04, Eric Christopher wrote:

> On Jan 24, 2011, at 2:01 PM, David Given wrote:
> 
>> I am interested in using LLVM to translate C and C++ into high-level
>> language code. (As an update to an earlier project of mine, Clue, which
>> used the Sparse compiler library to do this: it targets Lua, Javascript,
>> Perl 5, C, Java and Common Lisp, with a disturbing amount of success.
>> See http://cluecc.sourceforge.net for details.)
>> 
>> The obvious place to start on this is the C backend, except in these 2.8
>> days the C backend is so hedged about with caveats I'm rather wary of
>> basing anything on it. I also recall seeing comments here that it's due
>> for a rewrite from scratch, and that various people were looking into
>> it. Can anyone go into more detail as to what exactly is wrong with the
>> C backend, and whether this rewrite is happening?
>> 
>> The other thing I could do is to use the LLVMTargetMachine and treat my
>> HLL as a low-level machine; this gets me a certain amount of good stuff
>> like register allocation and more optimisations, but the documentation
>> is still pretty basic (e.g.
>> http://wiki.llvm.org/Absolute_Minimum_Backend is three short paragraphs)
>> and I'm not certain as to whether LLVMTargetMachine is suitable. For
>> example: my HLL can largely be treated as a register machine with an
>> arbitrary number of registers. Can LLVMTargetMachine handle this?
> 
> You could create a different code generator from clang or use the rewriting
> machinery?


A better approach would probably be to use Clang's CodeGen lib as inspiration, and write an equivalent that emitted your high-level language code instead of LLVM IR.  For example, consider C++ classes:

When you convert these to LLVM IR, you lose all of the information about them other than their structure, and the vtable is explicitly created for the target ABI.  Mapping them to something like JavaScript, you'd actually want to create a new prototype object for each class, with one slot for each field and another slot for each method (and some extra mixin-style stuff if you wanted to support multiple inheritance).  
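As a rough illustration of that mapping, here is a minimal JavaScript sketch. The Shape and Circle classes are hypothetical stand-ins for translated C++ classes, not anything Clang emits, and multiple inheritance is left out entirely.

```javascript
// Hypothetical C++ input (an assumption, for illustration only):
//   class Shape  { public: double x, y; virtual double area() { return 0; } };
//   class Circle : public Shape { public: double r; double area() override; };

// One constructor per class, with one slot per field.
function Shape()
{
	this.x = 0;
	this.y = 0;
}
// One prototype slot per method; property lookup takes the place of the vtable.
Shape.prototype.area = function ()
{
	return 0;
};

function Circle()
{
	Shape.call(this); // initialise the inherited fields
	this.r = 0;
}
// Single inheritance maps onto the prototype chain.
Circle.prototype = Object.create(Shape.prototype);
Circle.prototype.constructor = Circle;
Circle.prototype.area = function ()
{
	return Math.PI * this.r * this.r;
};
```

A call such as `new Circle().area()` then finds the overriding method through the prototype chain, without any explicit, ABI-specific vtable.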

The same is true even for pure C structures - you'd want to represent these as objects with named fields.  This information is in the Clang AST, but it's lost by the time you get to LLVM IR.  Taking an example from Apple's Foundation framework, you have two structures:

typedef struct
{
	CGFloat x, y;
} NSPoint;

typedef struct
{
	CGFloat width, height;
} NSSize;

In LLVM IR, these are both something like {double, double}.  In JavaScript, you'd probably want something like:

function NSPoint()
{
	this.x = 0;
	this.y = 0;
}
function NSSize()
{
	this.width = 0;
	this.height = 0;
}

This is pretty simple to generate from the Clang AST, but would be a huge amount of effort to generate from LLVM IR, where the type and field names are no longer available.
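To give a feel for how little work the AST route needs, here is a minimal sketch of such a generator, itself written in JavaScript. The `records` array is a hypothetical stand-in for the struct names and field names a Clang-based tool would collect; it is not a Clang API, and zero-initialising every field is an assumption.

```javascript
// Hypothetical record descriptions: the names and fields a tool walking
// the Clang AST would have available. This format is an assumption.
var records = [
	{ name: "NSPoint", fields: ["x", "y"] },
	{ name: "NSSize", fields: ["width", "height"] }
];

// Emit the source of a JavaScript constructor with one zeroed slot per field.
function emitConstructor(record)
{
	var lines = ["function " + record.name + "()", "{"];
	record.fields.forEach(function (field) {
		lines.push("\tthis." + field + " = 0;");
	});
	lines.push("}");
	return lines.join("\n");
}

// Concatenate the generated constructors into one block of source text.
var output = records.map(emitConstructor).join("\n");
```

Starting from {double, double} there would be nothing to put in the `name` and `fields` slots, which is the whole problem with working from the IR.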

David

-- Sent from my PDP-11