<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
code
{mso-style-priority:99;
font-family:"Courier New";}
span.E-MailFormatvorlage20
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;
mso-fareast-language:EN-US;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:70.85pt 70.85pt 2.0cm 70.85pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=DE link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='mso-fareast-language:EN-US'>Hi,<o:p></o:p></span></p><p class=MsoNormal><span style='mso-fareast-language:EN-US'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:EN-US'>If I understand the concept correctly, currently many checkers rely on parsing a few relevant tokens themselves.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:EN-US'>Potentially with the help of some utils/helpers that simplify the checker like the one I’m trying to introduce (see other mail).<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:EN-US'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:EN-US'>With TokenBuffer/SyntaxTree, parsing is no longer needed for some/most checkers.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:EN-US'>TokenBuffer/SyntaxTree would abstract “one step further” then e.g. LexerUtils and provide an enhanced/specialized AST.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:EN-US'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:EN-US'>However it’s not used that widely throughout llvm.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:EN-US'>Is it new and the way to go? Do you expect clang-tidy checkers to rely on the SyntaxTree in the future?<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:EN-US'>It does indeed provide e.g. a BreakStatement which does exactly what I did by introducing a method into LexerUtils.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:EN-US'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:EN-US'>Alex<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:EN-US'><o:p> </o:p></span></p><p class=MsoNormal><span lang=EN-US style='mso-fareast-language:EN-US'><o:p> </o:p></span></p><p class=MsoNormal><b>Von:</b> Sam McCall <sammccall@google.com> <br><b>Gesendet:</b> Donnerstag, 26. März 2020 01:08<br><b>An:</b> alex@lanin.de<br><b>Cc:</b> Clang Dev <cfe-dev@lists.llvm.org>; John McCall <rjmccall@apple.com><br><b>Betreff:</b> Re: [cfe-dev] Extend Stmt with proper end location?<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><div><div><p class=MsoNormal><o:p> </o:p></p></div><p class=MsoNormal><o:p> </o:p></p><div><div><p class=MsoNormal>On Tue, Mar 17, 2020 at 6:11 AM John McCall via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a>> wrote:<o:p></o:p></p></div><blockquote style='border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-right:0cm'><div><div><div><p><span style='font-family:"Arial",sans-serif'>Furthermore, I think expressions are important to consider,<o:p></o:p></span></p></div><div><p><span style='font-family:"Arial",sans-serif'>because the practical limitations on finding the semicolon after<br>an expression are exactly the same as finding it after </span><code><span style='font-size:10.0pt;color:black;background:#F7F7F7'>break</span></code><span style='font-family:"Arial",sans-serif'>.<o:p></o:p></span></p></div></div></div></blockquote><div><p class=MsoNormal>To spell this out a little more: formally in `foo();` there's an expression-statement which consists of a the call expression and the semicolon. But clang just uses the CallExpr node to represent both, and CallExpr obviously(?) shouldn't include the semicolon in its source range.<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>Alex: I think the Syntax library might be more suitable for tasks that need this precise info such as refactoring (and it doesn't suffer in the same way from the multiple masters problem). Unfortunately it's not complete.<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>The clang::syntax::TokenBuffer class allows you to capture the expanded token stream (bounds and kind of every token) as the parse runs (using TokenCollector). Effectively this lets you opt into making clang record more token-level info at the cost of memory. You then have to poke at this token stream yourself to find the semicolons you're after.<o:p></o:p></p></div><div><p class=MsoNormal><o:p> </o:p></p></div><div><p class=MsoNormal>The rest of the Syntax library ("syntax trees") uses a clang AST to build up a true syntactic (grammar-based) tree out of these tokens. "TEST_F(SyntaxTreeTest, While)" in TreeTest.cpp shows how this includes the semicolons of the (grammatical) BreakStatement.<o:p></o:p></p></div><div><p class=MsoNormal>The plan is/was to make it easy to then map between semantic and syntactic nodes, e.g. AST BreakStmt to the corresponding syntax BreakStatement. This hasn't been implemented yet I think.<o:p></o:p></p></div></div></div></div></body></html>