[cfe-dev] Extend Stmt with proper end location?

Thu Mar 26 11:28:13 PDT 2020

On Thu, Mar 26, 2020 at 7:02 PM <alex at lanin.de> wrote:

> Hi,
>
>
>
> If I understand the concept correctly, currently many checkers rely on
> parsing a few relevant tokens themselves.
>
> Potentially with the help of some utils/helpers that simplify the checker
> like the one I’m trying to introduce (see other mail).
>
>
>
> With TokenBuffer/SyntaxTree, parsing is no longer needed for some/most
> checkers.
>
> TokenBuffer/SyntaxTree would abstract “one step further” than e.g.
> LexerUtils and provide an enhanced/specialized AST.
>
>
>
> However, it’s not used that widely throughout LLVM.
>
> Is it new and the way to go?
>
TokenBuffer is pretty robust, we're using it heavily in clangd. It may have
some incomplete parts. This is one step up from LexerUtils.
Syntax tree is new and ... not finished yet :-) This is a second step up.
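
To make the token-level step concrete, here is a minimal sketch of the kind of lookup a checker would do. It does not use the real clang::syntax API: `Tok`, `TokKind` and `endIncludingSemi` are hypothetical stand-ins, and offsets are plain integers rather than SourceLocations. With a real TokenBuffer the tokens would come from `expandedTokens()` instead of being built by hand.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a recorded token: its kind and [begin, end) offsets.
enum class TokKind { Identifier, Keyword, LParen, RParen, Semi };

struct Tok {
  TokKind kind;
  std::size_t begin;
  std::size_t end;
};

// Given tokens sorted by offset and the (exclusive) end offset of a statement
// as reported by the AST, return the offset one past the ';' that directly
// follows it, or nullopt if the next token is not a semicolon.
std::optional<std::size_t> endIncludingSemi(const std::vector<Tok> &toks,
                                            std::size_t stmtEnd) {
  // Binary search for the first token that starts at or after stmtEnd.
  auto it = std::lower_bound(
      toks.begin(), toks.end(), stmtEnd,
      [](const Tok &t, std::size_t off) { return t.begin < off; });
  if (it == toks.end() || it->kind != TokKind::Semi)
    return std::nullopt;
  return it->end;
}
```

For `foo();` the tokens are `foo` [0,3), `(` [3,4), `)` [4,5) and `;` [5,6). The call expression ends at offset 5, so the lookup returns 6, i.e. the end of the statement including its semicolon.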

> Do you expect clang-tidy checkers to rely on the SyntaxTree in the future?
>
I don't know. If it gets nicely finished, I think it'd be very useful for
rewriting code (i.e. checker fixes). But I don't know of anyone actively
working on it.

> It does indeed provide e.g. a BreakStatement which does exactly what I did
> by introducing a method into LexerUtils.
>
>
>
> Alex
>
>
>
>
>
> *From:* Sam McCall <sammccall at google.com>
> *Sent:* Thursday, March 26, 2020 01:08
> *To:* alex at lanin.de
> *Cc:* Clang Dev <cfe-dev at lists.llvm.org>; John McCall <rjmccall at apple.com>
> *Subject:* Re: [cfe-dev] Extend Stmt with proper end location?
>
>
>
>
>
>
>
> On Tue, Mar 17, 2020 at 6:11 AM John McCall via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> Furthermore, I think expressions are important to consider,
>
> because the practical limitations on finding the semicolon after
> an expression are exactly the same as finding it after break.
>
> To spell this out a little more: formally in `foo();` there's an
> expression-statement which consists of the call expression and the
> semicolon. But clang just uses the CallExpr node to represent both, and
> CallExpr obviously(?) shouldn't include the semicolon in its source range.
>
>
>
> Alex: I think the Syntax library might be more suitable for tasks that
> need this precise info such as refactoring (and it doesn't suffer in the
> same way from the multiple masters problem). Unfortunately it's not
> complete.
>
>
>
> The clang::syntax::TokenBuffer class allows you to capture the expanded
> token stream (bounds and kind of every token) as the parse runs (using
> TokenCollector). Effectively this lets you opt into making clang record
> more token-level info at the cost of memory. You then have to poke at this
> token stream yourself to find the semicolons you're after.
>
>
>
> The rest of the Syntax library ("syntax trees") uses a clang AST to build
> up a true syntactic (grammar-based) tree out of these tokens.
> "TEST_F(SyntaxTreeTest, While)" in TreeTest.cpp shows how this includes the
> semicolons of the (grammatical) BreakStatement.
>
> The plan is/was to make it easy to then map between semantic and syntactic
> nodes, e.g. AST BreakStmt to the corresponding syntax BreakStatement. This
> hasn't been implemented yet I think.
>
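
The difference between the two views can be sketched with a toy grammar tree. This is an illustrative model only, not the clang::syntax classes: `Node`, `leaf` and `inner` are hypothetical names. The point is that a syntactic statement node owns its semicolon as a child token, so its range covers it, whereas the semantic AST node stops before it.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tree node: a leaf is a single token, an inner node owns an
// ordered list of children and derives its range from them.
struct Node {
  std::string role;       // e.g. "BreakStatement", "break", ";"
  std::size_t begin = 0;  // [begin, end) offsets in the source text
  std::size_t end = 0;
  std::vector<Node> kids; // empty for leaves
};

Node leaf(std::string role, std::size_t begin, std::size_t end) {
  return Node{std::move(role), begin, end, {}};
}

// Assumes kids is non-empty and ordered by source position.
Node inner(std::string role, std::vector<Node> kids) {
  Node n;
  n.role = std::move(role);
  n.begin = kids.front().begin; // range starts at the first child
  n.end = kids.back().end;      // and ends after the last one
  n.kids = std::move(kids);
  return n;
}
```

Building `inner("BreakStatement", {leaf("break", 0, 5), leaf(";", 5, 6)})` for the text `break;` gives a node spanning [0,6), while a semantic node for the same statement would only cover the `break` keyword at [0,5).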