[cfe-dev] Parser Design?

Sebastian Redl sebastian.redl at getdesigned.at
Tue Jun 2 07:31:14 PDT 2009


Brian Allison wrote:
> My desire is to help clang be able to provide for software analysis
> what Parasoft's C++Test does, but to make a better scripting language
> than the "symbolic" language the C++Test provides.
>
> Question 1) Does this sound interesting?
Yes. You're not the first one to come here with an interest in static
analysis - but you seem to have thought the most about it.
> I have a question about the parser design - it looks like it was
> written by hand: was it?
Yes.
>
> For C, that's not so hard, as the C spec is pretty straightforward.
> But for C++, a hand-written parser seems to me to be a bit more
> difficult - especially with C++0x's changes.
Ever tried writing a description of C++'s grammar? It's just about
impossible. A hand-written parser is not only the most performant and
most flexible way to go about it, it's also probably the easiest. Not
that writing a C++ parser is easy under any circumstances. Look at the
test cases in test/Parser/cxx-ambig-paren-expr.cpp for some of the stuff
that we have to cope with.
>
> Question 2) Is there any interest in using a lex/yacc type of approach?
No, I feel pretty confident in saying that there is no interest in
replacing our current lexer and/or parser. (Of course I can't speak for
everyone.) Using a hand-written recursive descent parser was a very
early decision in the project, and since I've been with it, I've seen
nothing to counter the view that it was absolutely the right thing to do.
For reference, the GCC team once had a generated parser for C++, but
they replaced it with a hand-written one - because the old one was not
only too slow, but also too inflexible.
>   I've both written by-hand parsers and used parser-generators to
> create front ends - and the latter makes for considerably less work
> when the language is well-defined.
Neither C nor C++ is well-defined in that sense. They are extremely context-sensitive
- *especially* C++ - and that's not a good thing for automatic parser
generation.
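
To make the context-sensitivity concrete: in both C and C++, the tokens
( a ) * b are a cast of *b if a names a type, and a multiplication
otherwise. A pure context-free grammar cannot make that distinction on its
own (yacc-based C parsers usually work around it by having the lexer consult
the symbol table). The toy recursive-descent parser below is a hypothetical
sketch, not clang's code; the class name, its token model and its output
format are assumptions for illustration. It shows how a hand-written parser
can simply ask the symbol table at exactly the point where it needs to know.

```cpp
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy parser (not clang's): it knows only identifiers,
// parentheses and '*', and returns a parenthesized string showing the tree.
class CastOrMulParser {
public:
  CastOrMulParser(std::vector<std::string> toks, std::set<std::string> typeNames)
      : toks_(std::move(toks)), typeNames_(std::move(typeNames)) {}

  // expression := unary ('*' unary)*
  std::string parseExpression() {
    std::string lhs = parseUnary();
    while (peek() == "*") {
      consume();
      lhs = "(" + lhs + " * " + parseUnary() + ")";
    }
    return lhs;
  }

private:
  // unary := '(' type-name ')' unary   (cast)
  //        | '(' expression ')'
  //        | '*' unary                 (dereference)
  //        | identifier
  std::string parseUnary() {
    if (peek() == "*") {
      consume();
      return "(deref " + parseUnary() + ")";
    }
    if (peek() == "(") {
      consume();
      // The tokens alone cannot choose between the first two alternatives;
      // the parser asks whether the identifier was declared as a type.
      if (typeNames_.count(peek()) != 0) {
        std::string type = consume();
        expect(")");
        return "(cast " + type + " " + parseUnary() + ")";
      }
      std::string inner = parseExpression();
      expect(")");
      return inner;
    }
    return consume();
  }

  // Next token without consuming it; "" at end of input.
  std::string peek() const { return pos_ < toks_.size() ? toks_[pos_] : ""; }

  std::string consume() {
    if (pos_ >= toks_.size()) throw std::runtime_error("unexpected end of input");
    return toks_[pos_++];
  }

  void expect(const std::string &tok) {
    if (consume() != tok) throw std::runtime_error("expected '" + tok + "'");
  }

  std::vector<std::string> toks_;
  std::set<std::string> typeNames_;
  std::size_t pos_ = 0;
};
```

With "a" registered as a type name, the tokens ( a ) * b come back as
(cast a (deref b)); with an empty symbol table they come back as (a * b).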
>
> Question 3) How does clang know when it's being targeted at C vs C++?
> There are some areas in which valid C is invalid C++, so the parser (if
> exact for both languages) would either have to know the difference,
> would have to switch, or perhaps is C++ being treated as a superset of C?
We have a class LangOptions; an instance of it is filled by the driver.
All the "master" objects (Lexer, Preprocessor, Parser, Sema and
ASTContext) hold a reference to this object. You can query this object
for a lot of flags: C++, Objective-C (if both are enabled, you get
Objective-C++), C++0x, C99, Objective-C2 and several extension flags.

The lexer, parser and sema components often make decisions based on the
state of these flags. If you grep the source for CPlusPlus you'll find
all the places where C++ makes a difference. For instance, around line
340 of lib/Parse/ParseExpr.cpp the C and C++ grammars differ very
subtly.
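
One well-known place where the C and C++ grammars diverge (not necessarily
the one at line 340) is the last operand of the conditional operator: C
specifies a conditional-expression there, C++ an assignment-expression. A
parser that accepts any conditional-expression on the left of '=' (as the
sketch below does) therefore groups a ? b : c = d as (a ? b : c) = d in C,
which is then rejected because the left side is not an lvalue, and as
a ? b : (c = d) in C++. The toy parser is hypothetical (names, simplified
grammar, identifiers only); it just makes that single decision from a
CPlusPlus flag.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the one LangOptions flag this sketch needs.
struct LangOpts {
  bool CPlusPlus = false;
};

// Toy recursive-descent parser for identifiers, '=' and '?:' only.
// Each function returns a fully parenthesized string showing the tree shape.
class CondParser {
public:
  CondParser(std::vector<std::string> toks, const LangOpts &opts)
      : toks_(std::move(toks)), opts_(opts) {}

  std::string parse() {
    std::string e = parseAssignment();
    if (pos_ != toks_.size()) throw std::runtime_error("trailing tokens");
    return e;
  }

private:
  // assignment-expression: conditional-expression ['=' assignment-expression]
  // Any conditional-expression is accepted on the left; whether it is
  // assignable is left to semantic analysis.
  std::string parseAssignment() {
    std::string lhs = parseConditional();
    if (peek() == "=") {
      consume();
      return "(" + lhs + " = " + parseAssignment() + ")";
    }
    return lhs;
  }

  // C:   identifier ['?' expression ':' conditional-expression]
  // C++: identifier ['?' expression ':' assignment-expression]
  // (The middle operand is simplified to an assignment-expression.)
  std::string parseConditional() {
    std::string cond = consume();
    if (peek() != "?") return cond;
    consume();
    std::string middle = parseAssignment();
    expect(":");
    std::string last = opts_.CPlusPlus ? parseAssignment() : parseConditional();
    return "(" + cond + " ? " + middle + " : " + last + ")";
  }

  // Next token without consuming it; "" at end of input.
  std::string peek() const { return pos_ < toks_.size() ? toks_[pos_] : ""; }

  std::string consume() {
    if (pos_ >= toks_.size()) throw std::runtime_error("unexpected end of input");
    return toks_[pos_++];
  }

  void expect(const std::string &tok) {
    if (consume() != tok) throw std::runtime_error("expected '" + tok + "'");
  }

  std::vector<std::string> toks_;
  const LangOpts &opts_;  // shared options; must outlive the parser
  std::size_t pos_ = 0;
};
```

For the tokens a ? b : c = d, this returns ((a ? b : c) = d) with CPlusPlus
unset and (a ? b : (c = d)) with it set.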

Sebastian


