[cfe-dev] Stmt.getEndLoc() vs semicolon

via cfe-dev cfe-dev at lists.llvm.org
Wed Mar 11 14:35:40 PDT 2020


Ok got it!

I thought all Expressions look like that missing ExprStmt.

There are Statements and Expressions and I initially assumed Expressions look like Statements, but that’s just one sub/supertype of Expr.

 

 

Looking into the DoWhile topic I found a corresponding commit:

 <https://github.com/llvm/llvm-project/commit/503fadc90f397a773ac17f40f72ca3e81cba2c49> https://github.com/llvm/llvm-project/commit/503fadc90f397a773ac17f40f72ca3e81cba2c49

 

Interestingly the statements touched in that commit have some high overlap with my “strange behavior” list in  <https://reviews.llvm.org/D75813#change-Bc7Hs8CXH7Z9> https://reviews.llvm.org/D75813#change-Bc7Hs8CXH7Z9

I came up with my list by comparing results of an AST based approach and a token based approach - over llvm codebase.

Apparently there are no gotos in the part which I have looked at ;-)

 

I think I could change ParseStmt.cpp for DoWhile, GotoStmt, ContinueStmt, BreakStmt, ReturnStmt, AsmStmt and SEHLeaveStmt.

Looks rather trivial (so it probably isn’t).

But I would have to change Stmt.h as well and add an SemiLoc or something to all of them.

For example BreakStmt doesn’t have any secondary SourceLocation, others already store RParenLoc – but that doesn’t help.

 

Does that make sense? I would still need to keep my getUnifiedEndLoc, but it could be simplified to getStmtLikeEndLocForStmtLikeExpr.

 

Alex

 

 

 

Von: Yitzhak Mandelbaum <yitzhakm at google.com> 
Gesendet: Mittwoch, 11. März 2020 18:47
An: Alexander Lanin <alex at lanin.de>
Cc: Clang Dev <cfe-dev at lists.llvm.org>
Betreff: Re: [cfe-dev] Stmt.getEndLoc() vs semicolon

 

responses inline...

 

On Sat, Feb 29, 2020 at 3:42 PM <alex at lanin.de <mailto:alex at lanin.de> > wrote:

Hi,

 

>> When an expression is used in a context expecting a statement and is trailed by a semicolon, the range will not include the semicolon

Assuming there is a reason for doing that, this explains the effects. It certainly sounds easier to extend Expr range instead of wrapping it in ExprStmt...

 

The problem is context. Consider expression `f(e)` in two cases:

 

1. `f(e);`

2. `return f(e) + 3;`

 

What is the range of `f(e)`? Is it wider in the first case than the second? if so, will this surprise/break code that might assume it invariant?  But, even worse, in case 1, how do you know which range the user wants? If I'm replacing statements, then I'll want through the semicolon, but if I'm updating call expressions, I'll only want the call itself.

 

So, both context and intent matter. An ExprStmt would clearly separate the two.

 

>> Expr vs Stmt

Turns out DoStmt is also fun as the conditions is an Expr.

do {} while(false)

/* Comment is not part of DoStmt,

although it's before the semicolon*/;

Good catch! But, I don't think that's the reason, because the condition is clearly delineated by the parentheses. It just seems to be an inconsistency. I can't see any reason that the comment and semi are excluded here while included in the DeclStmt. I think this should be changed.

>> In general, associating comments is hard and it seems reasonable to me either way -- including or excluding.

Sure, as long as it’s consistent. The semicolon (and preceding comments) are not.

 

>> it infects other statements as well

Yeah. I realized that too late…

I’ve added a makeUnifiedFileCharRange function to LexerUtils which basically simply calls findTokenSkippingComments + some special handling for NullStmt.

I’ve tried token by token, but I cannot get this to work without a “full blown” parser as there is a very close match to consider:

- In the first case “end while” obviously belongs to the while statement and there is no semicolon to search for. I have to take tok::r_brace as an end. It’s also impossible to go for isa<WhileStmt> since it is nested.

- In the second case “assign” obviously belongs to the statement. Here I have to look beyond tok::r_brace and not stop there.

 

Any advice?

Have you looked at the code I pointed to earlier (getAssociatedRange)? Properly handling those kinds of cases is quite difficult, but I don't think you need to parse the code. It's more about newlines and such.

 

 

 

 

 

// RUN: %check_clang_tidy %s readability-braces-around-statements %t

 

int foo() { return 42; }

 

void test() {

  if (true)

    if (true)

      while ("ok") {

        foo();

      } // end while

 

  int s;

  if (true)

    if (true)

      s = int{

        foo()

      }; // assign

}

 

 

Best regards,

Alexander

 

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200311/a598abcb/attachment-0001.html>


More information about the cfe-dev mailing list