[cfe-dev] Decoupling semantics from parsing

John McCall via cfe-dev cfe-dev at lists.llvm.org
Mon Apr 1 12:24:09 PDT 2019


On 1 Apr 2019, at 14:56, Reid Kleckner via cfe-dev wrote:
> Earlier in Clang's life, the parser did not depend on semantic analysis
> (lib/Parse did not depend on lib/Sema). However, my understanding is that
> as C++ support was added, it became clear that this was awkward, so in
> r112244, John removed the virtual 'Action' interface that Sema implemented
> and made Parse depend directly on Sema. I wasn't around at the time, so I
> don't know the exact motivations, but from what I can tell, clang has
> intentionally moved away from the kind of model you are proposing.

There are two somewhat-separable subjects here.

The first is doing parsing without doing semantic analysis.  C's grammar is
formally context-sensitive, but it is possible to parse a C token sequence
into an ambiguous syntax tree without semantic information; such a tree
would simply contain both valid parses of e.g. `size_t *x;` in statement
context (a declaration of a pointer `x`, or a multiplication of `size_t` by
`x`).  Clang has never been written to do this: the abstraction layer we
used to have between Parser and Sema still had queries like "does this name
resolve to a type" which had to be answered before parsing could continue.
Building ambiguous parse trees can be useful for source tools but creates a
lot of complexity for a compiler, and being a compiler has always been
Clang's primary mission.
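
To make the ambiguity concrete, here is a minimal C++ sketch of what an
ambiguous node for `size_t *x;` might look like.  Every name in it
(`AmbiguousStmt`, `parseStarStatement`, `resolve`) is invented for
illustration; nothing here is Clang code, and a real tool would need far
more than two string fields per reading.

```cpp
#include <set>
#include <string>

// Two possible readings of the statement `A * B;` (hypothetical toy types).
struct Declaration    { std::string pointeeType, name; };  // declare B as pointer to A
struct Multiplication { std::string lhs, rhs; };           // compute A * B, discard result

// An ambiguous node simply stores both readings side by side.
struct AmbiguousStmt {
    Declaration    asDecl;
    Multiplication asExpr;
};

enum class Reading { Declaration, Multiplication };

// "Parse" `first * second;` without asking whether `first` is a type.
AmbiguousStmt parseStarStatement(const std::string& first,
                                 const std::string& second) {
    return AmbiguousStmt{{first, second}, {first, second}};
}

// A later pass picks one reading once the set of type names is known.
Reading resolve(const AmbiguousStmt& stmt,
                const std::set<std::string>& typeNames) {
    return typeNames.count(stmt.asExpr.lhs) != 0 ? Reading::Declaration
                                                 : Reading::Multiplication;
}
```

The parse step needs no symbol table at all; the cost is that every later
consumer has to cope with nodes that have not been disambiguated yet.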

The second is how information is exchanged between the parser and
semantic analysis.  Clang's parser used to call its semantic analysis
layer through an abstracted interface, but we never had a useful
alternative implementation.  The sheer breadth of the interactions
required for C++, which adds so many new grammatical productions, made
the interface increasingly unwieldy and made an alternative
implementation hard to even imagine, so we killed it off.
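
As a rough illustration of that kind of layering, the sketch below has a
parser-side function that only sees an abstract interface, with a toy
semantic layer behind it.  The class and method names (`Action`, `ToySema`,
`isTypeName`, `actOnTypedef`, `parsesAsDeclaration`) are assumptions made
for this example and are not meant to reproduce the interface that was
removed.

```cpp
#include <set>
#include <string>

// Hypothetical abstract interface the parser calls into.
class Action {
public:
    virtual ~Action() = default;
    // Query the parser needs answered before it can continue.
    virtual bool isTypeName(const std::string& name) const = 0;
    // Notification that the parser has seen a typedef of `name`.
    virtual void actOnTypedef(const std::string& name) = 0;
};

// Toy semantic layer: remembers which names have been declared as types.
class ToySema : public Action {
    std::set<std::string> types_;
public:
    bool isTypeName(const std::string& name) const override {
        return types_.count(name) != 0;
    }
    void actOnTypedef(const std::string& name) override {
        types_.insert(name);
    }
};

// Parser-side decision for `first * x;`, written only against Action.
bool parsesAsDeclaration(const Action& actions, const std::string& first) {
    return actions.isTypeName(first);
}
```

Even in this toy form the parser cannot proceed without an answer from the
other side, and each new grammatical production tends to add more virtual
methods of this kind.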

Also, in C there's a massive performance optimization available if you can
combine the lookup performed by the lexer (to check whether something is a
macro and/or a keyword) with the identifier lookups performed by the parser
(in these ambiguous-parse cases) and semantic analysis (for actual name
resolution).  We wouldn't want an abstraction layer to interfere with that
optimization.  (Unfortunately, this optimization loses a lot of its
effectiveness in C++ because there's so much non-lexical lookup of
unqualified names.)
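
A minimal sketch of the shared-lookup idea, under the assumption that the
lexer hashes each identifier once and hands later phases a pointer to a
shared record.  `IdentInfo`, `IdentTable`, `Token` and `lex` are hypothetical
names for this example; it is a toy model, not a description of Clang's
actual data structures.

```cpp
#include <string>
#include <unordered_map>

// One record per distinct identifier spelling, shared by all phases.
struct IdentInfo {
    bool isKeyword  = false;  // consulted by the lexer
    bool isMacro    = false;  // consulted by the preprocessor
    bool isTypeName = false;  // consulted by the parser and semantic analysis
};

class IdentTable {
    std::unordered_map<std::string, IdentInfo> entries_;
    unsigned lookups_ = 0;  // counts hash lookups, for demonstration only
public:
    // Find or create the record for `spelling`.  References into an
    // std::unordered_map stay valid when other elements are inserted.
    IdentInfo& get(const std::string& spelling) {
        ++lookups_;
        return entries_[spelling];
    }
    unsigned lookups() const { return lookups_; }
};

// A token carries a pointer to its record, so later phases read the flags
// directly instead of hashing the spelling again.
struct Token {
    std::string spelling;
    IdentInfo*  info;
};

Token lex(IdentTable& table, const std::string& spelling) {
    return Token{spelling, &table.get(spelling)};
}
```

In this model the parser's "is this a type name" check is a single pointer
dereference on the token, which is the kind of saving an abstraction layer
between the phases could get in the way of.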

John.