[cfe-dev] clang, unknown identifiers, and ahead of time compilation

Wed Sep 1 12:54:02 PDT 2010

Hi,

Let's try again with a hopefully clearer description of what we want to
do :-) Apologies for the lengthy email; I hope the complexity of the
problem justifies it.

Our future interpreter will need to support variables that are defined
only at runtime, by some external context, and are therefore unknown at
compile time. For example:

int f() {
   int ret = 0;
   ret = h->D(); // h is unknown at compile time
   return ret;
}

Our plan is to "escape" them into a runtime invocation of sema (and
actually, runtime codegen), sort of like this:

int f() {
   int ret = 0;
   float arg = 42.f;
   {
     DelayedCompiler::Context __ctx; __ctx.add("arg", arg);
     ret = DelayedCompiler::Compile("h->D(arg);", __ctx);
   }
   return ret;
}

where DelayedCompiler::Compile() invokes the compiler at runtime, which
compiles its string argument using the variables recorded in the
context. So we are really talking about an ahead-of-time compiled f()
that contains a runtime-compiled "h->D(arg);".
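To make the context part concrete, here is a minimal sketch of what DelayedCompiler::Context might record, assuming the runtime compiler is fed plain source text: for each ahead-of-time variable, its type spelling and address, from which a preamble of reference declarations can be generated and prepended to the escaped code. The TypeName trait and preamble() are hypothetical helpers, not an existing API. h is deliberately absent, since it comes from the runtime context.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical trait: the spelling of a type as the runtime compiler sees it.
template <typename T> struct TypeName;
template <> struct TypeName<int>   { static const char* get() { return "int"; } };
template <> struct TypeName<float> { static const char* get() { return "float"; } };

namespace DelayedCompiler {

// Hypothetical context: remembers which ahead-of-time variables the escaped
// code may use, together with their type spellings and addresses.
class Context {
public:
  template <typename T>
  void add(const std::string& name, T& var) {
    vars_[name] = Entry{TypeName<T>::get(),
                        reinterpret_cast<std::uintptr_t>(&var)};
  }

  // Emits one reference declaration per variable, in name order, e.g.
  //   float& arg = *(float*)0x7ffd5e8c;
  // so that the runtime-compiled code would alias the original objects.
  std::string preamble() const {
    std::ostringstream os;
    for (const auto& kv : vars_)
      os << kv.second.type << "& " << kv.first << " = *(" << kv.second.type
         << "*)0x" << std::hex << kv.second.address << ";\n";
    return os.str();
  }

private:
  struct Entry {
    std::string type;
    std::uintptr_t address;
  };
  std::map<std::string, Entry> vars_;
};

}  // namespace DelayedCompiler
```

Passing addresses as text only works when the runtime-compiled code runs in the same process as f(), which is the interpreter setting described here.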

Sema::ActOnIdExpression() looks like the right hook: it knows when a
lookup fails ("h"), and it allows us to "escape" the expression by
converting it into DelayedCompiler::Compile("h"), returning an
ExpressionResult of our choice.
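The identifier-level escape can be modelled in a few lines. This is a toy, not Clang's Sema: ToySema, Expr and the single scope map are made up for illustration, and only the idea of turning a failed lookup into a deferred Compile() call is taken from the text above.

```cpp
#include <map>
#include <string>

// Toy expression node; not Clang's Expr.
struct Expr {
  enum Kind { DeclRef, Delayed } kind;
  std::string text;  // the identifier, or the code that defers it to runtime
};

// Toy stand-in for Sema: one scope mapping names to type spellings.
struct ToySema {
  std::map<std::string, std::string> scope;

  // Plays the role described for Sema::ActOnIdExpression(): on a failed
  // lookup, return an escape node instead of diagnosing an error.
  Expr actOnIdExpression(const std::string& name) const {
    if (scope.count(name) != 0)
      return Expr{Expr::DeclRef, name};
    return Expr{Expr::Delayed, "DelayedCompiler::Compile(\"" + name + "\")"};
  }
};
```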

The problem is that we need to escape the whole statement: the member
call h->D(arg) cannot be analyzed without knowing the type of h. As
such, ActOnIdExpression() is both insufficient and at the wrong level.

I see two options (and I am positive that I overlooked some :-):

1. Extract the statement from the parser: walk up the AST to the
statement node and replace it with our context-building code and a
DelayedCompiler::Compile() invocation on the statement's source as the
parser saw it. This will probably involve re-invoking the lexer etc. to
analyze the statement and extract the necessary context.

2. Poison the node, and use (or add) a mechanism to build a poisoned AST
statement from poisoned expression nodes, i.e. if a child node is
poisoned then its parent is poisoned too, but at least the AST builds.
Then, in a second pass, we can replace the poisoned statement with our
DelayedCompiler::Compile() invocation. This time we already have the
parsed nodes (not a valid AST, granted, but lexed source that we can use
to build the DelayedCompiler::Compile() invocation).
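For option 1, a rough sketch that works purely on text: given the function's source buffer and the source range of the escaped code (which a real implementation would take from the AST node), re-scan it for identifiers, register those naming known locals in the context, and emit the Context/Compile block from the second example. The identifier scan is a crude stand-in for re-invoking the lexer; identifiers(), quote() and escapeStatement() are hypothetical helpers.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Crude re-lexing: identifiers in order of first appearance. Comments,
// literals and keywords are not skipped; the caller filters by known locals.
std::vector<std::string> identifiers(const std::string& code) {
  std::vector<std::string> ids;
  std::set<std::string> seen;
  std::size_t i = 0;
  while (i < code.size()) {
    unsigned char c = static_cast<unsigned char>(code[i]);
    if (std::isalpha(c) || c == '_') {
      std::size_t j = i + 1;
      while (j < code.size() &&
             (std::isalnum(static_cast<unsigned char>(code[j])) || code[j] == '_'))
        ++j;
      std::string id = code.substr(i, j - i);
      if (seen.insert(id).second) ids.push_back(id);
      i = j;
    } else {
      ++i;
    }
  }
  return ids;
}

// Escapes code so it can sit inside a C++ string literal.
std::string quote(const std::string& code) {
  std::string out;
  for (char c : code) {
    if (c == '\n') { out += "\\n"; continue; }
    if (c == '"' || c == '\\') out += '\\';
    out += c;
  }
  return out;
}

// Builds the replacement for the code in [begin, end) of buffer, whose value
// is assigned to resultVar: a Context holding the referenced locals plus a
// DelayedCompiler::Compile() call on the original text.
std::string escapeStatement(const std::string& buffer, std::size_t begin,
                            std::size_t end, const std::set<std::string>& locals,
                            const std::string& resultVar) {
  const std::string code = buffer.substr(begin, end - begin);
  std::string out = "{\n  DelayedCompiler::Context __ctx;";
  for (const std::string& id : identifiers(code))
    if (locals.count(id) != 0)
      out += " __ctx.add(\"" + id + "\", " + id + ");";
  out += "\n  " + resultVar + " = DelayedCompiler::Compile(\"" + quote(code) +
         "\", __ctx);\n}";
  return out;
}
```

Here h and D are dropped because they are not known locals, which matches the second example: only arg goes into the context.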
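For option 2, the two passes can be sketched on a toy tree, assuming every node keeps its source range. Here poison flows upward from expressions but stops at the nearest enclosing statement, so the function body stays valid and only the smallest statement is escaped; that stopping rule is an assumption on top of the description above. Node, node(), add(), propagatePoison() and collectEscapes() are hypothetical and unrelated to Clang's classes.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy AST node (not Clang's): statements and expressions share one type.
struct Node {
  bool isStmt = false;
  bool poisoned = false;           // set directly on ids whose lookup failed
  std::size_t begin = 0, end = 0;  // source range [begin, end)
  std::vector<std::unique_ptr<Node>> children;
};

std::unique_ptr<Node> node(bool isStmt, std::size_t begin, std::size_t end) {
  std::unique_ptr<Node> n(new Node);
  n->isStmt = isStmt;
  n->begin = begin;
  n->end = end;
  return n;
}

// Appends child to parent and returns a reference to the child.
Node& add(Node& parent, std::unique_ptr<Node> child) {
  parent.children.push_back(std::move(child));
  return *parent.children.back();
}

// Pass 1: a node becomes poisoned if any child passes poison up.
// Expressions pass it on to their parent; statements keep it.
bool propagatePoison(Node& n) {
  for (auto& child : n.children)
    if (propagatePoison(*child)) n.poisoned = true;
  return n.poisoned && !n.isStmt;
}

// Pass 2: collect the source text of each poisoned statement. A real
// implementation would replace the statement node with the Context set-up
// and the DelayedCompiler::Compile() call built from this text.
void collectEscapes(const Node& n, const std::string& src,
                    std::vector<std::string>& out) {
  if (n.isStmt && n.poisoned) {
    out.push_back(src.substr(n.begin, n.end - n.begin));
    return;
  }
  for (const auto& child : n.children) collectEscapes(*child, src, out);
}
```

Stopping at statements is one choice among several; letting poison reach the function body would instead escape the whole of f(), which defeats ahead-of-time compilation.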

If it helps, we know that all the unknown ids are pointers, and we can
limit the use cases we support (e.g. the candidate list for overload
resolution must have exactly one entry, etc.).

Any comments, recommendations, ideas?

Cheers, Axel.