[cfe-dev] Extend Stmt with proper end location?

Thu Mar 26 11:02:40 PDT 2020

Hi,

If I understand the concept correctly, many checkers currently parse the few tokens they care about themselves, potentially with the help of utils/helpers that simplify the checker, like the one I’m trying to introduce (see other mail).

With TokenBuffer/SyntaxTree, parsing is no longer needed for some/most checkers.

TokenBuffer/SyntaxTree would abstract “one step further” than e.g. LexerUtils and provide an enhanced/specialized AST.

However, it’s not used that widely throughout LLVM.

Is it new and the way to go? Do you expect clang-tidy checkers to rely on the SyntaxTree in the future?

It does indeed provide e.g. a BreakStatement which does exactly what I did by introducing a method into LexerUtils.

Alex

From: Sam McCall <sammccall at google.com>
Sent: Thursday, March 26, 2020 01:08
To: alex at lanin.de
Cc: Clang Dev <cfe-dev at lists.llvm.org>; John McCall <rjmccall at apple.com>
Subject: Re: [cfe-dev] Extend Stmt with proper end location?

On Tue, Mar 17, 2020 at 6:11 AM John McCall via cfe-dev <cfe-dev at lists.llvm.org> wrote:

Furthermore, I think expressions are important to consider, because the practical limitations on finding the semicolon after an expression are exactly the same as finding it after break.

To spell this out a little more: formally in `foo();` there's an expression-statement which consists of the call expression and the semicolon. But clang just uses the CallExpr node to represent both, and CallExpr obviously(?) shouldn't include the semicolon in its source range.

Alex: I think the Syntax library might be more suitable for tasks that need this precise info such as refactoring (and it doesn't suffer in the same way from the multiple masters problem). Unfortunately it's not complete.

The clang::syntax::TokenBuffer class allows you to capture the expanded token stream (bounds and kind of every token) as the parse runs (using TokenCollector). Effectively this lets you opt into making clang record more token-level info at the cost of memory. You then have to poke at this token stream yourself to find the semicolons you're after.
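To make "poking at the token stream" concrete, here is a minimal sketch of the search for a trailing semicolon. It is a toy model and does not use the clang API: `ToyToken`, `lexToy` and `endIncludingSemicolon` are hypothetical names invented for this illustration, and the lexer only knows identifiers, single-character punctuation and `;`. With a real TokenBuffer the same idea would be applied to its recorded tokens instead.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy stand-ins, NOT clang types: a token is a kind plus a half-open
// byte range [begin, end) into the source text.
enum class ToyKind { Identifier, Punctuation, Semicolon };

struct ToyToken {
  ToyKind kind;
  std::size_t begin;  // offset of the first character
  std::size_t end;    // one past the last character
};

// Very small lexer: identifiers/keywords, single-character punctuation, ';'.
std::vector<ToyToken> lexToy(const std::string &src) {
  std::vector<ToyToken> out;
  std::size_t i = 0;
  while (i < src.size()) {
    unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isspace(c)) {
      ++i;
      continue;
    }
    std::size_t start = i;
    ToyKind kind = ToyKind::Punctuation;
    if (std::isalnum(c) || c == '_') {
      kind = ToyKind::Identifier;
      while (i < src.size() &&
             (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
        ++i;
    } else {
      if (c == ';')
        kind = ToyKind::Semicolon;
      ++i;
    }
    assert(i > start);  // every token consumes at least one character
    out.push_back({kind, start, i});
  }
  return out;
}

// nodeEnd is one past the last character of a node's range, e.g. the end
// of a call expression or of the `break` keyword. If the next token is a
// ';', return the end offset that also covers it; otherwise return nothing.
std::optional<std::size_t> endIncludingSemicolon(
    const std::vector<ToyToken> &toks, std::size_t nodeEnd) {
  for (const ToyToken &t : toks) {
    if (t.begin < nodeEnd)
      continue;  // token lies before or inside the node
    if (t.kind == ToyKind::Semicolon)
      return t.end;
    return std::nullopt;  // some other token follows the node
  }
  return std::nullopt;  // no tokens after the node
}
```

For the input `foo(); break ;` the call expression `foo()` ends at offset 5, and the function reports 6 as the end of the whole statement. Whitespace between `break` and `;` is skipped because only tokens are inspected, not raw characters.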

The rest of the Syntax library ("syntax trees") uses a clang AST to build up a true syntactic (grammar-based) tree out of these tokens. "TEST_F(SyntaxTreeTest, While)" in TreeTest.cpp shows how this includes the semicolons of the (grammatical) BreakStatement.
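The difference from the AST can be sketched with a toy grammar-based tree in which every token, including the semicolon, is a leaf. This is only an illustration of the idea under stated assumptions: `ToyNode`, `leaf`, `makeBreakStatement` and `coveredRange` are hypothetical names, not the clang::syntax API, and the offsets are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy syntax node, NOT a clang::syntax type. Leaves carry a half-open
// token range [begin, end); inner nodes only carry children.
struct ToyNode {
  std::string kind;
  std::size_t begin = 0;
  std::size_t end = 0;
  std::vector<std::unique_ptr<ToyNode>> children;
};

// Create a leaf for a single token.
std::unique_ptr<ToyNode> leaf(std::string kind, std::size_t begin,
                              std::size_t end) {
  assert(begin < end);
  auto n = std::make_unique<ToyNode>();
  n->kind = std::move(kind);
  n->begin = begin;
  n->end = end;
  return n;
}

// Build a grammatical break statement: the keyword AND its semicolon are
// both children, so the statement "owns" the ';'.
std::unique_ptr<ToyNode> makeBreakStatement(std::size_t keywordBegin,
                                            std::size_t semicolonBegin) {
  auto stmt = std::make_unique<ToyNode>();
  stmt->kind = "BreakStatement";
  stmt->children.push_back(leaf("break", keywordBegin, keywordBegin + 5));
  stmt->children.push_back(leaf(";", semicolonBegin, semicolonBegin + 1));
  return stmt;
}

// The range of a node runs from its first leaf to its last leaf.
std::pair<std::size_t, std::size_t> coveredRange(const ToyNode &n) {
  if (n.children.empty())
    return {n.begin, n.end};
  return {coveredRange(*n.children.front()).first,
          coveredRange(*n.children.back()).second};
}
```

With `break` at offset 7 and `;` at offset 13, `coveredRange` of the statement is [7, 14), whereas a range covering only the keyword would be [7, 12). A refactoring tool that removes the statement wants the former.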

The plan is/was to make it easy to then map between semantic and syntactic nodes, e.g. AST BreakStmt to the corresponding syntax BreakStatement. I think this hasn't been implemented yet.
