[cfe-dev] Parser Design?

Tue Jun 2 06:26:12 PDT 2009

Hi all,

I'm new to the mailing list (been lurking a few weeks). I just compiled the
code and started looking at it.

At this point in life, I'm working with software analysis. In the past I've
written interpreters for Domain Specific Languages, done code analysis, and
written a lot of C and C++.

My desire is to help clang provide for software analysis what Parasoft's
C++Test does, but with a better scripting language than the "symbolic"
language that C++Test provides. Most importantly, clang's licensing will
allow anyone to make use of it without costing several body parts. I'd love
to be able to write out some language analysis rules that would be the
equivalent of:
   "flag any places where there is a missing (copy constructor | assignment
operator) when there is in any of (base class | contained class |
base-of-contained class) a non-POD data type."

  Once a set of predicates were written in such a scripting language, the
set could be evaluated against a stored form of the AST, and those parts of
software analysis which *can* be automated could then be put into the test
process. Even those parts which are harder and require a CFG could be aided
by such an approach, instead of hand-coding each query into a distinct
binary.
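
To make the rule above concrete, here is a rough sketch in C++ of that one
predicate evaluated over a toy, hand-built class model. None of this is
clang's API: ClassInfo, touchesNonPod, and flagMissingCopyControl are names
made up for illustration, and the "stored form of the AST" is assumed to be
reducible to records like these. It also encodes one reading of the rule,
namely that only bases, contained classes, and bases of contained classes
are inspected.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a stored AST record; NOT a clang data structure.
struct ClassInfo {
    std::string name;
    bool hasCopyCtor = false;    // user-declared copy constructor present
    bool hasCopyAssign = false;  // user-declared copy assignment present
    bool hasNonPodData = false;  // directly holds a non-POD data member
    std::vector<const ClassInfo*> bases;    // direct base classes
    std::vector<const ClassInfo*> members;  // classes contained by value
};

// True if any base class, contained class, or base of a contained class
// holds non-POD data.
bool touchesNonPod(const ClassInfo& c) {
    for (const ClassInfo* b : c.bases)
        if (b->hasNonPodData) return true;
    for (const ClassInfo* m : c.members) {
        if (m->hasNonPodData) return true;
        for (const ClassInfo* mb : m->bases)
            if (mb->hasNonPodData) return true;
    }
    return false;
}

// Returns the names of classes that lack a copy constructor or an
// assignment operator while touching non-POD data as defined above.
std::vector<std::string> flagMissingCopyControl(
        const std::vector<const ClassInfo*>& classes) {
    std::vector<std::string> flagged;
    for (const ClassInfo* c : classes) {
        bool missing = !c->hasCopyCtor || !c->hasCopyAssign;
        if (missing && touchesNonPod(*c)) flagged.push_back(c->name);
    }
    return flagged;
}
```

The point of a scripting language would be that the body of
flagMissingCopyControl becomes a few lines of rule text instead of a
hand-coded binary per query.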

Question 1) Does this sound interesting? I'd be working on such features
only when I had no specific tasking at work. Such a tool would aid my work
once it matured, so I'm allowed to use my "down time" to work on it. [And my
family time is far too valuable. :) ]

I have a question about the parser design - it looks like it was written by
hand: was it?

For C, that's not so hard, as the C spec is pretty straightforward. But for
C++, a hand-written parser seems to me to be a bit more difficult -
especially with C++0x's changes.

Question 2) Is there any interest in using a lex/yacc type of approach?
  I've both written parsers by hand and used parser generators to create
front ends, and the latter makes for considerably less work when the
language is well-defined.

Question 3) How does clang know when it's being targeted at C vs C++? There
are some areas in which valid C is invalid C++, so a parser that is exact
for both languages would have to know which one it is parsing and switch
accordingly. Or is C++ being treated as a superset of C?
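
Two small cases of the C vs C++ difference I have in mind, written as a C++
sketch with the C form noted in comments. The function names makeInts and
charLiteralIsChar are made up for illustration.

```cpp
#include <cstddef>
#include <cstdlib>

// In C, `int *p = malloc(n * sizeof *p);` compiles because void* converts
// implicitly to any object pointer type. In C++ that line is an error, so
// an explicit cast is required.
int* makeInts(std::size_t n) {
    return static_cast<int*>(std::malloc(n * sizeof(int)));
}

// In C, a character constant such as 'a' has type int; in C++ it has type
// char, so this returns true when compiled as C++.
bool charLiteralIsChar() {
    return sizeof('a') == sizeof(char);
}
```

Identifiers such as `class` or `new` are a third case: legal variable names
in C, keywords in C++.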

thanks,
Brian