[cfe-dev] Stmt.getEndLoc() vs semicolon

Yitzhak Mandelbaum via cfe-dev cfe-dev at lists.llvm.org
Wed Mar 11 10:46:54 PDT 2020


responses inline...

On Sat, Feb 29, 2020 at 3:42 PM <alex at lanin.de> wrote:

> Hi,
>
>
>
> >> When an expression is used in a context expecting a statement and is
> trailed by a semicolon, the range will not include the semicolon
>
> Assuming there is a reason for doing that, this explains the effects. It
> certainly sounds easier to extend Expr range instead of wrapping it in
> ExprStmt...
>

The problem is context. Consider expression `f(e)` in two cases:

1. `f(e);`
2. `return f(e) + 3;`

What is the range of `f(e)`? Is it wider in the first case than the second?
if so, will this surprise/break code that might assume it invariant?  But,
even worse, in case 1, how do you know which range the user wants? If I'm
replacing statements, then I'll want through the semicolon, but if I'm
updating call expressions, I'll only want the call itself.

So, both context and intent matter. An ExprStmt would clearly separate the
two.

>
>
> >> Expr vs Stmt
>
> Turns out DoStmt is also fun as the conditions is an Expr.
>
> do {} while(false)
>
> /* Comment is not part of DoStmt,
>
> although it's before the semicolon*/;
>
Good catch! But, I don't think that's the reason, because the condition is
clearly delineated by the parentheses. It just seems to be an
inconsistency. I can't see any reason that the comment and semi are
excluded here while included in the DeclStmt. I think this should be
changed.

> >> In general, associating comments is hard and it seems reasonable to me
> either way -- including or excluding.
>
> Sure, as long as it’s consistent. The semicolon (and preceding comments)
> are not.
>
>
>
> >> it infects other statements as well
>
> Yeah. I realized that too late…
>
> I’ve added a makeUnifiedFileCharRange function to LexerUtils which
> basically simply calls findTokenSkippingComments + some special handling
> for NullStmt.
>
> I’ve tried token by token, but I cannot get this to work without a “full
> blown” parser as there is a very close match to consider:
>
> - In the first case “end while” obviously belongs to the while statement
> and there is no semicolon to search for. I have to take tok::r_brace as an
> end. It’s also impossible to go for isa<WhileStmt> since it is nested.
>
> - In the second case “assign” obviously belongs to the statement. Here I
> have to look beyond tok::r_brace and not stop there.
>
>
>
> Any advice?
>
Have you looked at the code I pointed to earlier (getAssociatedRange)?
Properly handling those kinds of cases is quite difficult, but I don't
think you need to parse the code. It's more about newlines and such.




>
>
>
>
> // RUN: %check_clang_tidy %s readability-braces-around-statements %t
>
>
>
> int foo() { return 42; }
>
>
>
> void test() {
>
>   if (true)
>
>     if (true)
>
>       while ("ok") {
>
>         foo();
>
>       } // end while
>
>
>
>   int s;
>
>   if (true)
>
>     if (true)
>
>       s = int{
>
>         foo()
>
>       }; // assign
>
> }
>
>
>
>
>
> Best regards,
>
> Alexander
>
>
>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200311/11db0c42/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3854 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200311/11db0c42/attachment-0001.bin>


More information about the cfe-dev mailing list