[PATCH v3 2/9] Add the initial TypoExpr AST node for delayed typo correction.

Kaelyn Takata rikka at google.com
Fri Jul 25 16:39:19 PDT 2014


On Fri, Jul 25, 2014 at 3:29 PM, Richard Smith <richard at metafoo.co.uk>
wrote:

> On Fri, Jul 25, 2014 at 3:21 PM, Kaelyn Takata <rikka at google.com> wrote:
>
>> On Fri, Jul 25, 2014 at 1:57 PM, Richard Smith <richard at metafoo.co.uk>
>> wrote:
>>
>>> On Mon, Jul 14, 2014 at 4:55 PM, Kaelyn Takata <rikka at google.com> wrote:
>>>
>>>> ---
>>>>  include/clang/AST/DataRecursiveASTVisitor.h |  1 +
>>>>  include/clang/AST/Expr.h                    | 20 ++++++++++++++++++++
>>>>  include/clang/AST/RecursiveASTVisitor.h     |  1 +
>>>>  include/clang/Basic/StmtNodes.td            |  1 +
>>>>  include/clang/Sema/SemaInternal.h           |  3 +++
>>>>  include/clang/Sema/TypoCorrection.h         |  5 +++++
>>>>  lib/AST/Expr.cpp                            |  1 +
>>>>  lib/AST/ExprClassification.cpp              |  3 ++-
>>>>  lib/AST/ExprConstant.cpp                    |  1 +
>>>>  lib/AST/ItaniumMangle.cpp                   |  1 +
>>>>  lib/AST/StmtPrinter.cpp                     |  5 +++++
>>>>  lib/AST/StmtProfile.cpp                     |  4 ++++
>>>>  lib/Sema/SemaExceptionSpec.cpp              |  1 +
>>>>  lib/Sema/SemaLookup.cpp                     | 16 ++++++++++++++++
>>>>  lib/Sema/TreeTransform.h                    |  6 ++++++
>>>>  lib/Serialization/ASTReaderStmt.cpp         |  4 ++++
>>>>  lib/Serialization/ASTWriterStmt.cpp         |  6 ++++++
>>>>  lib/StaticAnalyzer/Core/ExprEngine.cpp      |  1 +
>>>>  tools/libclang/CXCursor.cpp                 |  1 +
>>>>  19 files changed, 80 insertions(+), 1 deletion(-)
>>>
>>>
>>> The AST library is supposed to be self-contained; it's a layering
>>> violation to put TypoExpr member definitions into Sema. Conversely, it'd be
>>> a layering violation to put them into AST, since they currently delete Sema
>>> types. To avoid this, I suggest:
>>>
>>
>> I'd have actually preferred to keep TypoExpr out of the AST library
>> altogether since it is a synthetic node generated by Sema for use by Sema
>> and really shouldn't ever be seen/touched outside of Sema, as it is simply
>> a placeholder for Sema's typo correction. But of course defining AST nodes
>> that don't live in the AST library to a certain degree isn't possible... ah
>> well. I knew I'd have to deal with the layering issue at some point
>>
>> * The TypoExpr class should instead store a single opaque 'void*'
>>> pointer, and make no attempt to deallocate or otherwise manage it.
>>> * Sema owns the pointee and is responsible for deleting it (you can
>>> delete it and zero out the pointer on the TypoExpr at the point when you
>>> handle the TypoExpr)
>>>
>>> Alternatively, since you need Sema to track the live TypoExprs anyway,
>>> you could have Sema maintain a map from TypoExpr* to whatever data you want
>>> to store for the TypoExprs, and have the AST node be essentially stateless.
>>>
>>
>> I think I'll go with this approach since a TypoExpr node is just a
>> placeholder for internal use by Sema. For the mapping, do you think it
>> would be better to move the recovery callback and diagnostic generator into
>> the TypoCorrectionConsumer so that the map is simply from a TypoExpr to a
>> TypoCorrectionConsumer, or to place the three unique_ptrs in a tuple that
>> would go into the map, or to wrap the three unique_ptrs in a struct that is
>> stored in the map?
>>
>
> Since the map will be internal to Sema, I think you should go with
> whatever option makes the Sema implementation most convenient. Since we use
> TypoCorrectionConsumer for the non-delayed case, keeping the other parts
> separate seems likely to be best.
>
>
>> Incidentally, I wonder whether it'd be useful for TypoExpr to store the
>>> typoed identifier, maybe as an IdentifierInfo*? (This'd allow you to
>>> produce something more useful from the statement printer and ast dumper,
>>> and we really don't care much how big a TypoExpr is, since we'll --
>>> presumably -- never have many of them.) This might also be useful for
>>> clients that want to turn off typo correction but keep the TypoExprs in the
>>> AST.
>>>
>>
>> As the typo correction is currently structured, a TypoExpr would never be
>> generated when typo correction is turned off since the TypoExpr is only for
>> Sema to know when and where to go back to perform typo correction when it
>> wasn't immediately performed. I'm also not sure how useful it would be to
>> have meaningful output from the statement printer and AST dumper since the
>> TypoExpr shouldn't be kept around that long (it should be gone by the time
>> the full statement has finished being parsed and analysed--I cannot think
>> of any cases where that wouldn't be true that wouldn't also be a bug; after
>> all, until the TypoExpr is handled Sema won't know if the error it
>> represents is recoverable or what the recovered AST should be). If/once
>> there's a good use case for having the identifier-to-be-corrected stored in
>> the TypoExpr, it'll be simple enough to add since it is available when the
>> TypoExpr is created and is also already being stored in the
>> TypoCorrectionConsumer.
>>
>
> OK, I'm happy keeping it empty for now. I asked because we have several
> clang-as-a-library users who've asked for both
> a) parse what you can, and build an approximate AST for what you can't, and
> b) tell me where the typos are, but don't actually correct them
> ... and it seems that we could improve the situation in both cases if we
> had a mode to create TypoExprs but not act on them. Let's keep that as a
> separate discussion to be had once this work lands, though.
>

Makes sense. And I agree that those cases would be an extension of the
delayed typo correction facility once it exists.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20140725/c740f737/attachment.html>


More information about the cfe-commits mailing list