[PATCH v3 2/9] Add the initial TypoExpr AST node for delayed typo correction.

Fri Jul 25 15:29:21 PDT 2014

On Fri, Jul 25, 2014 at 3:21 PM, Kaelyn Takata <rikka at google.com> wrote:

> On Fri, Jul 25, 2014 at 1:57 PM, Richard Smith <richard at metafoo.co.uk>
> wrote:
>
>> On Mon, Jul 14, 2014 at 4:55 PM, Kaelyn Takata <rikka at google.com> wrote:
>>
>>> ---
>>>  include/clang/AST/DataRecursiveASTVisitor.h |  1 +
>>>  include/clang/AST/Expr.h                    | 20 ++++++++++++++++++++
>>>  include/clang/AST/RecursiveASTVisitor.h     |  1 +
>>>  include/clang/Basic/StmtNodes.td            |  1 +
>>>  include/clang/Sema/SemaInternal.h           |  3 +++
>>>  include/clang/Sema/TypoCorrection.h         |  5 +++++
>>>  lib/AST/Expr.cpp                            |  1 +
>>>  lib/AST/ExprClassification.cpp              |  3 ++-
>>>  lib/AST/ExprConstant.cpp                    |  1 +
>>>  lib/AST/ItaniumMangle.cpp                   |  1 +
>>>  lib/AST/StmtPrinter.cpp                     |  5 +++++
>>>  lib/AST/StmtProfile.cpp                     |  4 ++++
>>>  lib/Sema/SemaExceptionSpec.cpp              |  1 +
>>>  lib/Sema/SemaLookup.cpp                     | 16 ++++++++++++++++
>>>  lib/Sema/TreeTransform.h                    |  6 ++++++
>>>  lib/Serialization/ASTReaderStmt.cpp         |  4 ++++
>>>  lib/Serialization/ASTWriterStmt.cpp         |  6 ++++++
>>>  lib/StaticAnalyzer/Core/ExprEngine.cpp      |  1 +
>>>  tools/libclang/CXCursor.cpp                 |  1 +
>>>  19 files changed, 80 insertions(+), 1 deletion(-)
>>
>>
>> The AST library is supposed to be self-contained; it's a layering
>> violation to put TypoExpr member definitions into Sema. Conversely, it'd be
>> a layering violation to put them into AST, since they currently delete Sema
>> types. To avoid this, I suggest:
>>
>
> I'd have actually preferred to keep TypoExpr out of the AST library
> altogether since it is a synthetic node generated by Sema for use by Sema
> and really shouldn't ever be seen/touched outside of Sema, as it is simply
> a placeholder for Sema's typo correction. But of course defining AST nodes
> that don't live in the AST library to a certain degree isn't possible... ah
> well. I knew I'd have to deal with the layering issue at some point
>
> * The TypoExpr class should instead store a single opaque 'void*' pointer,
>> and make no attempt to deallocate or otherwise manage it.
>> * Sema owns the pointee and is responsible for deleting it (you can
>> delete it and zero out the pointer on the TypoExpr at the point when you
>> handle the TypoExpr)
>>
>> Alternatively, since you need Sema to track the live TypoExprs anyway,
>> you could have Sema maintain a map from TypoExpr* to whatever data you want
>> to store for the TypoExprs, and have the AST node be essentially stateless.
>>
>
> I think I'll go with this approach since a TypoExpr node is just a
> placeholder for internal use by Sema. For the mapping, do you think it
> would be better to move the recovery callback and diagnostic generator into
> the TypoCorrectionConsumer so that the map is simply from a TypoExpr to a
> TypoCorrectionConsumer, or to place the three unique_ptrs in a tuple that
> would go into the map, or to wrap the three unique_ptrs in a struct that is
> stored in the map?
>

Since the map will be internal to Sema, I think you should go with whatever
option makes the Sema implementation most convenient. Since we use
TypoCorrectionConsumer for the non-delayed case, keeping the other parts
separate seems likely to be best.

> Incidentally, I wonder whether it'd be useful for TypoExpr to store the
>> typoed identifier, maybe as an IdentifierInfo*? (This'd allow you to
>> produce something more useful from the statement printer and ast dumper,
>> and we really don't care much how big a TypoExpr is, since we'll --
>> presumably -- never have many of them.) This might also be useful for
>> clients that want to turn off typo correction but keep the TypoExprs in the
>> AST.
>>
>
> As the typo correction is currently structured, a TypoExpr would never be
> generated when typo correction is turned off since the TypoExpr is only for
> Sema to know when and where to go back to perform typo correction when it
> wasn't immediately performed. I'm also not sure how useful it would be to
> have meaningful output from the statement printer and AST dumper since the
> TypoExpr shouldn't be kept around that long (it should be gone by the time
> the full statement has finished being parsed and analysed--I cannot think
> of any cases where that wouldn't be true that wouldn't also be a bug; after
> all, until the TypoExpr is handled Sema won't know if the error it
> represents is recoverable or what the recovered AST should be). If/once
> there's a good use case for having the identifier-to-be-corrected stored in
> the TypoExpr, it'll be simple enough to add since it is available when the
> TypoExpr is created and is also already being stored in the
> TypoCorrectionConsumer.
>

OK, I'm happy keeping it empty for now. I asked because we have several
clang-as-a-library users who've asked for both
a) parse what you can, and build an approximate AST for what you can't, and
b) tell me where the typos are, but don't actually correct them
... and it seems that we could improve the situation in both cases if we
had a mode to create TypoExprs but not act on them. Let's keep that as a
separate discussion to be had once this work lands, though.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140725/40b9cc12/attachment.html>