[cfe-commits] Patch for review: Parenthesis range in CXXConstructExprs

Wed Jul 11 00:06:53 PDT 2012

Il 11/07/2012 07:23, Richard Smith ha scritto:
> On Tue, Jul 10, 2012 at 10:05 PM, Daniel Jasper <djasper at google.com
> <mailto:djasper at google.com>> wrote:
> 
>     On Wed, Jul 11, 2012 at 12:33 AM, Richard Smith
>     <richard at metafoo.co.uk <mailto:richard at metafoo.co.uk>> wrote:
> 
>         On Tue, Jul 10, 2012 at 3:07 PM, Daniel Jasper
>         <djasper at google.com <mailto:djasper at google.com>> wrote:
> 
>             I don't think this is fully what I am trying to fix with
>             this patch. This patch addresses the inconsistency that the
>             CXXConstructExpr contains the correct ParenRange for Stmts like:
> 
>             X value(7);
> 
>             But an incorrect range if the constructor is called like a
>             function, e.g.:
> 
>             X(7);
> 
> 
>         Right. My point is we don't have rules for what a correct range
>         would be, and we need such rules before we can say whether your
>         patch is right. My proposed rule means that the source range is
>         wrong in both cases, and shouldn't include the parens (and that
>         we should drop the ParenRange from CXXConstructExpr entirely).
>         The fix for that is completely different from what you're
>         proposing :-)
> 
>             As for printing, I don't know whether the SourceRange of the
>             CXXConstructExpr should include the parentheses. However,
>             for refactoring, it is definitely highly beneficial to know
>             it (and in fact I think we'll have to go and add it to other
>             nodes as well). So, as CXXConstructExpr already contains a
>             member for that, we should definitely populate it as best we
>             can.
> 
> 
>         That presumes that the parens are somehow logically part of the
>         CXXConstructExpr, which I think is really the relevant question
>         here. CXXConstructExpr is used in various cases where the parens
>         are either part of some other construct, or where there are no
>         parens, so I don't think it makes much sense to include the
>         parens in the source range. I would imagine that for
>         refactoring, what's really desired is a consistent and rational
>         set of rules for what the source range of an expression means,
>         which, as my examples demonstrate, we're *really really far*
>         from having.
> 
>         Consider these two cases:
> 
>         X value(7);
>         int value(7);
> 
>         In the first case, the source range for the initializer covers
>         'value' and the parentheses. For the second case, it covers only
>         the '7'. The right fix for the second case seems naturally to be
>         that we should store the source locations of the parentheses on
>         the VarDecl. And that would remove any need for storing them on
>         the CXXConstructExpr in the first case.
> 
>         Likewise in these expressions:
> 
>         X(7)
>         int(7)
> 
>         Here, the CXXFunctionalCastExpr already contains the source
>         locations of the parentheses. In the 'int' case, we don't store
>         them anywhere else (and nor do we need to). In the 'X' case, we
>         include the 'X' and the '(' in the source range of the
>         CXXConstructExpr. That is inconsistent and unnecessary.
> 
> 
>     Why do we not need the parenthesis for int(7)? Because there are no
>     diagnostics that can trigger on a range including them?
> 
> 
> Because they can be derived from the CXXFunctionalCastExpr (also,
> there's no other AST node where they could reasonably go).
>  
> 
>     I agree with everything you say, there should be a consistent way to
>     determine source locations. However, we need pretty much all the
>     source locations (this includes locations for each c/v qualifier,
>     all parenthesis, possibly braces, ...) in order to provide proper
>     refactoring tools. I gather that some of the inconsistencies have
>     occurred because we are trying to save as much space as possible and
>     thus only add locations that are really necessary for diagnostics.
>     So, I see two challenges:
>     1) Make the current state more consistent. For this we might need to
>     set up some basic rules and then try to fix what can be fixed.
>     2) Provide a way to access the currently unavailable source
>     locations. For this, is see two approaches:
>       - Include way more source locations in the current AST, possibly
>     guarded by a flag to preserve performance if not needed.
>       - Provide means to re-parse the required part of the source code.
> 
>     Thoughts?
> 
> 
> I completely agree.
> 
> I think the biggest problem for (2) is that the AST assumes that the
> location of the next token after a SourceLocation can be determined, and
> currently there's no easy way to do that. I think that it's possible to
> extract that information by poking through the SLocEntries in the
> SourceManager, without storing any extra data and with only minimal
> re-lexing (and no macro expansion), but if so, we should package that
> functionality up and make it generally available.

Some times ago we discussed that with Argyrios and others making me to
discover that there was not enough info to reconstruct token sequence
while taking in proper account macro expansion.

A smart cheap way I've found to do that is to build when lexing a
SourceLoc "hop" table to be used in the following way:

- if current SourceLoc is in the hop table then the next token SourceLoc
is extracted from table

- otherwise the next token SourceLoc is obtained from raw lexing

This approach has the benefit to need hop table entries only when
preprocessor changes lexed token source, reducing the space needed.

-- 
Abramo Bagnara

Opera Unica                          Phone: +39.0546.656023
Via Borghesi, 16
48014 Castel Bolognese (RA) - Italy