[cfe-dev] Stmt.getEndLoc() vs semicolon

Yitzhak Mandelbaum via cfe-dev cfe-dev at lists.llvm.org
Thu Feb 27 07:08:00 PST 2020


Sorry, I overlooked your point about the trailing comments. Yes, there are
two issues: trailing comments and inclusion of semicolon -- my response was
with regards to the semicolon.  Since there are a lot issues here, I've
tried to break them down below.

## Including semicolons (aka Expr vs Stmt)
When an expression is used in a context expecting a statement and is
trailed by a semicolon, the range will not include the semicolon.  If there
are any comments between the expression and the semicolon those, too, will
be left out.  If we had an ExprStmt, then one could imagine the expression
range covering just the expression, but the statement range extending to
include the comment and semicolon.

## Including trailing comments
The trailing comments issue seems to be consistent across syntactic forms
(at least, the ones you've presented).  Add a trailing comment after a decl
statement, eg.
  int x; // foo
and the range still ends at the semicolon. Similarly,
  if (true) {} // foo
the range ends at '}'.

In general, associating comments is hard and it seems reasonable to me
either way -- including or excluding.  I'll admit that it feels arbitrary
that leading comments are included why trailing are excluded, but I'd
venture a guess that this is caused by the parser. Morever, since it seems
to be consistent, I think its ok, but should be documented clearly
somewhere.

# re: Is Expr the only Stmt that doesn’t include „it’s end“?
I'd say no, with respect to trailing comments, but yes with respect to
semicolons.  However, you could argue that it infects other statements as
well, e.g.
  if (...) foo();
The range of the _if statement_ will not include the semicolon. So, it too
does not include "it's end".

## unifyStmtRange()
I'm all for this kind of thing and, in fact, think it's really the only
good solution (that is, adding a layer atop the AST to compute things like
this). I'm happy to review patches. Or, if you wait long enough, I'll
probably get to this myself. :)

Cheers,
Yitzhak


On Wed, Feb 26, 2020 at 4:53 PM Alexander Lanin <alex at lanin.de> wrote:

> Hi,
>
>
>
> thanks, that’s interesting, but I’m not sure how this relates to trailing
> comments + semicolon?
>
> Why does it matter whether Expr is wrapped or not? It can contain comments
> in general. But doesn’t contain those at the end.
>
> At least for such examples I find it strange:
> http://ce.steveire.com/z/KdXBDg
>
>
>
>
>
> Is Expr the only Stmt that doesn’t include „it’s end“?
>
> Would it make sense to have some unifyStmtRange() in SourceCode.h with
> special handling for Expr which would extend it with (n x comment) +
> (optional comma or semi) [+ all the special cases I’m not thinking about
> right now]?
>
>
>
> Currently the check mentioned below always subtracts one char and then
> checks whether it’s a semicolon. Might be nicer to distinguish by isa<Expr>?
>
>
>
> Regards,
>
> Alex
>
>
>
>
>
>
>
> *Von: *Yitzhak Mandelbaum <yitzhakm at google.com>
> *Gesendet: *Mittwoch, 26. Februar 2020 16:40
> *An: *Alexander Lanin <alex at lanin.de>
> *Cc: *cfe-dev at lists.llvm.org
> *Betreff: *Re: [cfe-dev] Stmt.getEndLoc() vs semicolon
>
>
>
> Alexander,
>
>
>
> I agree that this behavior is rather confusing, especially to first-time
> users. The cause of the issue is that Expr derives from Stmt so that Exprs
> can appear directly in, for example, compound statements, rather than being
> wrapped in an explicit node to represent statements that consist only of an
> expression.  As far as I understand, this choice to avoid an explicit node
> had some (positive) performance implications for the AST when it was
> originally designed.  While I'd love to see this choice revisited, I think
> it would be a significant effort and don't expect it to happen.
>
>
>
> Instead, we tend to work around the issue.  For getting proper ranges of
> statements, you might find clang::tooling::getExtendedText/getExtendedRange
> meet your needs:
>
>
> https://github.com/llvm/llvm-project/blob/master/clang/include/clang/Tooling/Transformer/SourceCode.h#L56
>
>
>
> For declarations, the (just added!) getAssociatedRange might be even
> better:
> https://github.com/llvm/llvm-project/blob/master/clang/include/clang/Tooling/Transformer/SourceCode.h#L39
>
>
>
> Cheers,
>
> Yitzhak
>
>
>
> On Tue, Feb 25, 2020 at 5:15 PM Alexander Lanin via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> Hello,
>
>
>
> I’m having trouble with the locations returned by Stmt
> getEndLoc()/getSourceRange() and am wondering whether this is a bug/flaw:
>
> the returned location sometimes includes and sometimes excludes the
> semicolon in the end.
>
> This gets even more interesting when there are (multiple) comments before
> the semicolon, simply because the difference between the returned values
> gets even bigger.
>
> Here is an example showing how DeclStmt includes the semicolon, while
> CallExpr and BinaryOperator exclude the semicolon
> http://ce.steveire.com/z/TW3IAG.
>
>
>
> So Stmt behaves “completely differently” depending on it’s type, which is
> no way suggested by it’s interface.
>
> Wouldn’t it be better for Stmt-Users if it would always be the same?
>
> (Not sure whether it should always be included or excluded. It’s not even
> always a semicolon as in that 3rd foo() in the example)
>
> https://en.wikipedia.org/wiki/Liskov_substitution_principle
>
>
>
> Here is one of the effects of this complexity for users of Stmt as they
> struggle to find that semicolon:
>
> https://bugs.llvm.org/show_bug.cgi?id=25970 /
> https://reviews.llvm.org/D16267
>
> Of course it’s fixable there, but that would imply working around the
> issue in multiple places.
>
>
>
> Regards,
>
> Alexander Lanin
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200227/065f8cd7/attachment-0001.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3854 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200227/065f8cd7/attachment-0001.bin>


More information about the cfe-dev mailing list