[cfe-dev] Reusing CompilerInstance (parsing separate expressions)
nsf
no.smile.face at gmail.com
Thu May 26 20:21:19 PDT 2011
Hi. I'm using clang as a C header importer for my own programming
language. The problem is that C programmers tend to define constants in
their libraries with the preprocessor instead of enums. To import that
information from a C header, I need to parse each macro definition and
see whether it is a constant expression that can be evaluated to a
value.
This is easy if the macro is something simple like:
#define BYTE_MAX 0xFF
in that case I can simply take that single token and feed it to
NumericLiteralParser, but when it comes to bit masks and various
complex expressions like:
#define BIT1 (1 << 0)
#define BIT2 (1 << 1)
#define BIT3 (1 << 2)
I need to use a complete parser here. So I simply mock up a
function with a single expression statement and feed it to
ParseAST using a previously created CompilerInstance. Then, using a
custom ASTConsumer, I find my expression and call Evaluate on it.
(All of this is done from PPCallbacks, of course.)
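The wrapping step described above can be sketched in isolation. This is only a minimal illustration: the helper name wrapExpression is hypothetical, and the dummy function name foo is taken from the hook shown later in this message.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: wrap the text of a macro body in a dummy function
// containing a single expression statement, so that a full C parser sees
// a complete translation unit.
std::string wrapExpression(const std::string &body) {
    assert(!body.empty() && "macro body must not be empty");
    return "void foo() { " + body + "; }\n";
}
```

For the BIT1 macro above, the generated buffer would be the single line `void foo() { (1 << 0); }`.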
It all works just fine. I tested it on SDL/SDL.h, which contains about
700 numeric definitions (including garbage like HAVE_STDIO_H). About
350 of them are complex (more than one token).
Anyway, the problem is that it is slow. Creating a new
CompilerInstance each time and parsing just a single expression is
inefficient, I guess. It takes about 400ms for SDL/SDL.h (without the
constant macro extractor it takes about 60ms to parse SDL/SDL.h and all
its declarations). Adding a hack that checks whether the macro consists
of a single token of kind tok::numeric_constant brings the full
SDL/SDL.h parse down to about 260ms.
The question is: is it possible to reuse CompilerInstance in that
scenario? Or maybe there is another good way to accomplish what I'm
trying to do?
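One possible direction, sketched here without having been tested against clang: collect the candidate macro bodies first and emit them all into a single buffer, so that one CompilerInstance parses every expression in one pass. The names MacroList, buildBatch and macro_expr_N are hypothetical, and the sketch covers only the text generation, not the parsing or evaluation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// (macro name, macro body text) pairs, as they might be collected from
// MacroDefined instead of being parsed one at a time.
typedef std::vector<std::pair<std::string, std::string> > MacroList;

// Hypothetical helper: emit one dummy function per macro into a single
// translation unit. The index in the function name lets a consumer map
// each parsed expression back to macros[i].first.
std::string buildBatch(const MacroList &macros) {
    std::string tunit;
    for (std::size_t i = 0; i < macros.size(); ++i) {
        assert(!macros[i].second.empty() && "macro body must not be empty");
        tunit += "void macro_expr_" + std::to_string(i) + "() { " +
                 macros[i].second + "; }\n";
    }
    return tunit;
}
```

Whether this actually removes the overhead depends on where the time goes; it is an assumption that the per-instance setup dominates rather than the parsing itself.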
This is what my MacroDefined hook looks like:
void MacroDefined(const Token &name, const MacroInfo *mi)
{
    // we support only zero arg macros, which are constant
    // definitions, most likely
    if (mi->getNumArgs() != 0)
        return;
    if (mi->tokens_empty())
        return;
    // for some reason mi->isBuiltinMacro() doesn't work
    if (strcmp("<built-in>", srcm->getPresumedLoc(mi->getDefinitionLoc()).getFilename()) == 0)
        return;

    // fast path: a single numeric literal needs no parser
    if (mi->getNumTokens() == 1 && mi->tokens_begin()->getKind() == tok::numeric_constant) {
        // TODO: use NumericLiteralParser
        const Token *tok = mi->tokens_begin();
        llvm::StringRef nm = name.getIdentifierInfo()->getName();
        // %.*s expects an int precision, so cast the lengths
        printf("const %.*s = %.*s;\n", (int)nm.size(), nm.data(),
               (int)tok->getLength(), tok->getLiteralData());
        return;
    }

    // grab the source text spanning the first to the last macro token
    MacroInfo::tokens_iterator first, last;
    first = mi->tokens_begin();
    last = mi->tokens_end() - 1;
    SourceLocation beg = first->getLocation();
    SourceLocation end = last->getLocation();
    const char *s = srcm->getCharacterData(beg);
    const char *e = srcm->getCharacterData(end) + last->getLength();

    // wrap the expression in a dummy function
    tunit.clear();
    cppsprintf(&tunit, "void foo() { %.*s; }\n", (int)(e - s), s);

    // set up a fresh CompilerInstance (this is the slow part)
    CompilerInstance ci;
    ci.getTargetOpts().Triple = llvm::sys::getHostTriple();
    ci.createDiagnostics(0, 0);
    ci.setTarget(TargetInfo::CreateTargetInfo(ci.getDiagnostics(),
                                              ci.getTargetOpts()));
    ci.getDiagnostics().setSuppressAllDiagnostics();
    ci.createFileManager();
    ci.createSourceManager(ci.getFileManager());
    ci.createPreprocessor();
    ci.createASTContext();
    llvm::MemoryBuffer *mb = llvm::MemoryBuffer::getMemBuffer(tunit, "macrodef.c");
    ci.getSourceManager().createMainFileIDForMemBuffer(mb);

    ConstantExprExtractor consumer;
    consumer.ctx = &ci.getASTContext();
    ParseAST(ci.getPreprocessor(), &consumer, ci.getASTContext());

    llvm::SmallVector<char, 128> tmp;
    // done, let's see what we got here
    APValue &v = consumer.er.Val;
    switch (v.getKind()) {
    case APValue::Int:
        v.getInt().toString(tmp);
        break;
    case APValue::Float:
        v.getFloat().toString(tmp);
        break;
    default:
        break;
    }
    if (!tmp.empty()) {
        llvm::StringRef nm = name.getIdentifierInfo()->getName();
        printf("const %.*s = %.*s;\n", (int)nm.size(), nm.data(), (int)tmp.size(), &tmp[0]);
    }
}