[cfe-dev] Stmt.getEndLoc() vs semicolon

Thu Mar 12 06:40:52 PDT 2020

I don't have any say in the AST code.  That would have to involve one of
the owners (and I don't know who those are) and it's not a simple matter,
because so much of the compiler is connected to the AST. If you're thinking
of such a change I would start a new thread on this list explicitly raising
the issue.

Regarding that change -- I'm not sure how relevant it is, given that it's
from back in 2006.  But, it is interesting to see and probably indicates
there's a good reason for the discrepancy (or, at least, there was at some
point).

On Wed, Mar 11, 2020 at 5:35 PM <alex at lanin.de> wrote:

> Ok got it!
>
> I thought all Expressions look like that missing ExprStmt.
>
> There are Statements and Expressions and I initially assumed Expressions
> look like Statements, but that’s just one sub/supertype of Expr.
>
>
>
>
>
> Looking into the DoWhile topic I found a corresponding commit:
>
>
> https://github.com/llvm/llvm-project/commit/503fadc90f397a773ac17f40f72ca3e81cba2c49
>
>
>
> Interestingly the statements touched in that commit have some high overlap
> with my “strange behavior” list in
> https://reviews.llvm.org/D75813#change-Bc7Hs8CXH7Z9
>
> I came up with my list by comparing results of an AST based approach and a
> token based approach - over llvm codebase.
>
> Apparently there are no gotos in the part which I have looked at ;-)
>
>
>
> I think I could change ParseStmt.cpp for DoWhile, GotoStmt, ContinueStmt,
> BreakStmt, ReturnStmt, AsmStmt and SEHLeaveStmt.
>
> Looks rather trivial (so it probably isn’t).
>
> But I would have to change Stmt.h as well and add an SemiLoc or something
> to all of them.
>
> For example BreakStmt doesn’t have any secondary SourceLocation, others
> already store RParenLoc – but that doesn’t help.
>
>
>
> Does that make sense? I would still need to keep my getUnifiedEndLoc, but
> it could be simplified to getStmtLikeEndLocForStmtLikeExpr.
>
>
>
> Alex
>
>
>
>
>
>
>
> *Von:* Yitzhak Mandelbaum <yitzhakm at google.com>
> *Gesendet:* Mittwoch, 11. März 2020 18:47
> *An:* Alexander Lanin <alex at lanin.de>
> *Cc:* Clang Dev <cfe-dev at lists.llvm.org>
> *Betreff:* Re: [cfe-dev] Stmt.getEndLoc() vs semicolon
>
>
>
> responses inline...
>
>
>
> On Sat, Feb 29, 2020 at 3:42 PM <alex at lanin.de> wrote:
>
> Hi,
>
>
>
> >> When an expression is used in a context expecting a statement and is
> trailed by a semicolon, the range will not include the semicolon
>
> Assuming there is a reason for doing that, this explains the effects. It
> certainly sounds easier to extend Expr range instead of wrapping it in
> ExprStmt...
>
>
>
> The problem is context. Consider expression `f(e)` in two cases:
>
>
>
> 1. `f(e);`
>
> 2. `return f(e) + 3;`
>
>
>
> What is the range of `f(e)`? Is it wider in the first case than the
> second? if so, will this surprise/break code that might assume it
> invariant?  But, even worse, in case 1, how do you know which range the
> user wants? If I'm replacing statements, then I'll want through the
> semicolon, but if I'm updating call expressions, I'll only want the call
> itself.
>
>
>
> So, both context and intent matter. An ExprStmt would clearly separate the
> two.
>
>
>
> >> Expr vs Stmt
>
> Turns out DoStmt is also fun as the conditions is an Expr.
>
> do {} while(false)
>
> /* Comment is not part of DoStmt,
>
> although it's before the semicolon*/;
>
> Good catch! But, I don't think that's the reason, because the condition
> is clearly delineated by the parentheses. It just seems to be an
> inconsistency. I can't see any reason that the comment and semi are
> excluded here while included in the DeclStmt. I think this should be
> changed.
>
> >> In general, associating comments is hard and it seems reasonable to me
> either way -- including or excluding.
>
> Sure, as long as it’s consistent. The semicolon (and preceding comments)
> are not.
>
>
>
> >> it infects other statements as well
>
> Yeah. I realized that too late…
>
> I’ve added a makeUnifiedFileCharRange function to LexerUtils which
> basically simply calls findTokenSkippingComments + some special handling
> for NullStmt.
>
> I’ve tried token by token, but I cannot get this to work without a “full
> blown” parser as there is a very close match to consider:
>
> - In the first case “end while” obviously belongs to the while statement
> and there is no semicolon to search for. I have to take tok::r_brace as an
> end. It’s also impossible to go for isa<WhileStmt> since it is nested.
>
> - In the second case “assign” obviously belongs to the statement. Here I
> have to look beyond tok::r_brace and not stop there.
>
>
>
> Any advice?
>
> Have you looked at the code I pointed to earlier (getAssociatedRange)?
> Properly handling those kinds of cases is quite difficult, but I don't
> think you need to parse the code. It's more about newlines and such.
>
>
>
>
>
>
>
>
>
>
>
> // RUN: %check_clang_tidy %s readability-braces-around-statements %t
>
>
>
> int foo() { return 42; }
>
>
>
> void test() {
>
>   if (true)
>
>     if (true)
>
>       while ("ok") {
>
>         foo();
>
>       } // end while
>
>
>
>   int s;
>
>   if (true)
>
>     if (true)
>
>       s = int{
>
>         foo()
>
>       }; // assign
>
> }
>
>
>
>
>
> Best regards,
>
> Alexander
>
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200312/bd1093d6/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3854 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200312/bd1093d6/attachment-0001.bin>