[cfe-dev] Extend Stmt with proper end location?

Sam McCall via cfe-dev cfe-dev at lists.llvm.org
Wed Mar 25 17:07:42 PDT 2020


On Tue, Mar 17, 2020 at 6:11 AM John McCall via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Furthermore, I think expressions are important to consider,
>
> because the practical limitations on finding the semicolon after
> an expression are exactly the same as finding it after break.
>
To spell this out a little more: formally in `foo();` there's an
expression-statement which consists of the call expression and the
semicolon. But clang just uses the CallExpr node to represent both, and
CallExpr obviously(?) shouldn't include the semicolon in its source range.

Alex: I think the Syntax library might be more suitable for tasks that need
this precise info such as refactoring (and it doesn't suffer in the same
way from the multiple masters problem). Unfortunately it's not complete.

The clang::syntax::TokenBuffer class allows you to capture the expanded
token stream (bounds and kind of every token) as the parse runs (using
TokenCollector). Effectively this lets you opt into making clang record
more token-level info at the cost of memory. You then have to poke at this
token stream yourself to find the semicolons you're after.
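
To illustrate what "poking at the token stream" could look like, here is a
minimal self-contained sketch. It does not use the real clang::syntax API:
Tok, TokKind, NoSemi and endIncludingSemi are hypothetical stand-ins I made up
for illustration, and offsets are assumed to be plain character offsets into
one buffer. The idea is only that, given a node's end offset and a sorted list
of lexed tokens, the next token either is the semicolon or it isn't.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT clang types.
enum class TokKind { Identifier, LParen, RParen, Semi, Other };

struct Tok {
  TokKind Kind;
  std::size_t Begin; // offset of the token's first character
  std::size_t End;   // offset one past the token's last character
};

// Sentinel returned when the node is not directly followed by a semicolon.
static const std::size_t NoSemi = static_cast<std::size_t>(-1);

// Toks must be sorted by offset. NodeEnd is one past the last character of
// the node's source range (e.g. just after the ')' of a call expression).
// Returns the end offset of the semicolon if it is the very next token,
// otherwise NoSemi. Whitespace and comments are assumed not to be tokens.
std::size_t endIncludingSemi(const std::vector<Tok> &Toks,
                             std::size_t NodeEnd) {
  for (const Tok &T : Toks) {
    if (T.Begin < NodeEnd)
      continue; // token lies before the node's end, skip it
    // T is the first token starting at or after the node's end.
    return T.Kind == TokKind::Semi ? T.End : NoSemi;
  }
  return NoSemi; // ran off the end of the token stream
}
```

For `foo();` with the call occupying offsets [0, 5), this would extend the end
to just past the `;`. For `foo() + 1;` the token after the call is `+`, so the
sketch reports that there is no semicolon to attach to that node.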

The rest of the Syntax library ("syntax trees") uses a clang AST to build
up a true syntactic (grammar-based) tree out of these tokens.
"TEST_F(SyntaxTreeTest, While)" in TreeTest.cpp shows how this includes the
semicolons of the (grammatical) BreakStatement.
The plan is/was to make it easy to then map between semantic and syntactic
nodes, e.g. AST BreakStmt to the corresponding syntax BreakStatement. I don't
think this has been implemented yet.