[cfe-dev] Reusing CompilerInstance (parsing separate expressions)

nsf no.smile.face at gmail.com
Thu May 26 20:21:19 PDT 2011


Hi. I'm using clang as a C header importer for my own programming
language. The problem is that C programmers tend to use the
preprocessor for defining constants in their libraries instead of
enums, so in order to import that information from a C header I need to
parse each macro definition and see whether it is a constant expression
that can be evaluated to something.

That is easy if the macro is something simple like:
  #define BYTE_MAX 0xFF

in that case I can simply take that single token and feed it to
NumericLiteralParser, but when it comes to bit masks and various
complex expressions like:
  #define BIT1 (1 << 0)
  #define BIT2 (1 << 1)
  #define BIT3 (1 << 2)

I need to use a complete parser here. So I simply mock up a function
with a single expression statement and feed it to ParseAST using a
previously created CompilerInstance. Then, using a custom ASTConsumer,
I find my expression and call Evaluate on it.
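
As a rough sketch of the mock-up step on its own, in plain C++ with no
clang dependency (the helper name makeMockFunction is made up for
illustration; the wrapper text is the same one used in the hook further
down):

```cpp
#include <string>

// Hypothetical helper: wraps the raw source text of a macro body in a
// dummy function, so that a full C parser sees exactly one expression
// statement it can later locate and evaluate.
std::string makeMockFunction(const std::string &macroBody)
{
	return "void foo() { " + macroBody + "; }\n";
}
```

For BIT1 above this yields "void foo() { (1 << 0); }", which is the
whole translation unit that gets parsed.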

(All of that is done from PPCallbacks, of course.)

It all works just fine. I tested it on SDL/SDL.h, which contains about
700 numeric definitions (including all the garbage like HAVE_STDIO_H).
About 350 of them are complex (more than one token).

Anyway, the problem is that it is slow. Creating a new CompilerInstance
each time and parsing just a single expression is inefficient, I guess.
It takes about 400ms for SDL/SDL.h (without the constant macro
extractor it takes about 60ms for SDL/SDL.h and all its declarations).
Applying a hack that checks whether the macro consists of a single
token and that token is tok::numeric_constant brings the time down to
about 260ms for full SDL/SDL.h parsing.
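
To illustrate what the single-token shortcut could evaluate without
going through a parser at all, here is a minimal stand-in in plain C++.
It is not NumericLiteralParser: it assumes the token text is already
available as a string, uses std::strtoull, and only accepts unsuffixed
decimal, hex and octal integer literals. The name parseSimpleIntLiteral
is hypothetical.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Stand-in for the single-token fast path (NOT clang's
// NumericLiteralParser). Accepts unsuffixed decimal, hex (0x..) and
// octal (0..) integer literals; rejects everything else, including
// suffixed literals, floats and multi-token expressions.
bool parseSimpleIntLiteral(const std::string &tok, unsigned long long &out)
{
	// A C numeric literal starts with a digit; this also rejects the
	// leading whitespace and sign that strtoull would otherwise accept.
	if (tok.empty() || tok[0] < '0' || tok[0] > '9')
		return false;

	errno = 0;
	char *end = nullptr;
	// Base 0 lets strtoull pick hex, octal or decimal from the prefix.
	unsigned long long v = std::strtoull(tok.c_str(), &end, 0);

	// Fail on overflow or if any character was left unconsumed.
	if (errno == ERANGE || *end != '\0')
		return false;

	out = v;
	return true;
}
```

Anything this rejects (for example "10UL" or "(1 << 0)") would still
have to take the slow path through the full parser.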

The question is: is it possible to reuse CompilerInstance in that
scenario? Or maybe there is another good way to accomplish what I'm
trying to do?


Here is what my MacroDefined hook looks like:

void MacroDefined(const Token &name, const MacroInfo *mi)
{
	// we support only zero arg macros, which are constant
	// definitions, most likely
	if (mi->getNumArgs() != 0)
		return;

	if (mi->tokens_empty())
		return;

	// for some reason mi->isBuiltinMacro() doesn't work
	if (strcmp("<built-in>", srcm->getPresumedLoc(mi->getDefinitionLoc()).getFilename()) == 0)
		return;

	if (mi->getNumTokens() == 1 && mi->tokens_begin()->getKind() == tok::numeric_constant) {
		// TODO: use NumericLiteralParser
		const Token *tok = mi->tokens_begin();
		llvm::StringRef nm = name.getIdentifierInfo()->getName();
		// %.*s expects an int precision, so cast the lengths
		printf("const %.*s = %.*s;\n", (int)nm.size(), nm.data(),
		       (int)tok->getLength(), tok->getLiteralData());
		return;
	}

	MacroInfo::tokens_iterator first, last;
	first = mi->tokens_begin();
	last = mi->tokens_end() - 1;

	SourceLocation beg = first->getLocation();
	SourceLocation end = last->getLocation();

	const char *s = srcm->getCharacterData(beg);
	const char *e = srcm->getCharacterData(end) + last->getLength();

	tunit.clear();
	cppsprintf(&tunit, "void foo() { %.*s; }\n", (int)(e - s), s);

	CompilerInstance ci;
	ci.getTargetOpts().Triple = llvm::sys::getHostTriple();
	ci.createDiagnostics(0, 0);
	ci.setTarget(TargetInfo::CreateTargetInfo(ci.getDiagnostics(),
						  ci.getTargetOpts()));
	ci.getDiagnostics().setSuppressAllDiagnostics();

	ci.createFileManager();
	ci.createSourceManager(ci.getFileManager());
	ci.createPreprocessor();
	ci.createASTContext();

	llvm::MemoryBuffer *mb = llvm::MemoryBuffer::getMemBuffer(tunit, "macrodef.c");
	ci.getSourceManager().createMainFileIDForMemBuffer(mb);

	ConstantExprExtractor consumer;
	consumer.ctx = &ci.getASTContext();

	ParseAST(ci.getPreprocessor(), &consumer, ci.getASTContext());

	llvm::SmallVector<char, 128> tmp;
	// done, let's see what we got here
	APValue &v = consumer.er.Val;
	switch (v.getKind()) {
	case APValue::Int:
		v.getInt().toString(tmp);
		break;
	case APValue::Float:
		v.getFloat().toString(tmp);
		break;
	default:
		break;
	}

	if (!tmp.empty()) {
		llvm::StringRef nm = name.getIdentifierInfo()->getName();
		printf("const %.*s = %.*s;\n", (int)nm.size(), nm.data(), (int)tmp.size(), &tmp[0]);
	}
}


