[cfe-dev] -fspell-checking causing modifications to AST?

Wed Jul 16 16:47:51 PDT 2014

On Tue, Jul 15, 2014 at 2:59 AM, Kevin Funk <kfunk at kde.org> wrote:

> On Monday 14 July 2014 17:52:42 Richard Smith wrote:
> > On Mon, Jul 14, 2014 at 4:53 PM, Kevin Funk <kfunk at kde.org> wrote:
> > > Hey,
> > >
> > > I'm a bit puzzled by the following behavior of clang when inspecting
> the
> > > AST
> > > for the following code snippet:
> > >
> > > test.cpp:
> > > char foo;
> > > char bar = foo1;
> > >
> > > $ clang++ -cc1 -ast-dump test.cpp:
> > > main.cpp:2:12: error: use of undeclared identifier 'foo1'; did you mean
> > > 'foo'?
> > > (...)
> > > `-VarDecl 0xa19cb0 <line:2:1, col:12> col:6 bar 'char' cinit
> > >
> > >   `-ImplicitCastExpr 0xa19d60 <col:12> 'char' <LValueToRValue>
> > >
> > >     `-DeclRefExpr 0xa19d38 <col:12> 'char' lvalue Var 0xa19c40 'foo'
> > >     'char'
> > >
> > > So, Clang seems to interpret 'foo1' as a typo 'foo'. This is still
> fine.
> > > However, Clang also "fixes up the code" and pretends that 'foo1' *is*
> > > 'foo'
> > > which -ast-dump shows (see last line of the dump). This is odd.
> > >
> > > Obviously, this is only the case if -fno-spell-checking is *not*
> passed to
> > > clang++. With -fno-spell-checking I get this instead:
> > >
> > > $ clang++ -cc1 -ast-dump test.cpp -fno-spell-checking
> > > (...)
> > > `-VarDecl 0x1a79ca0 <line:2:1, col:6> col:6 bar 'char'
> > >
> > > This looks fine to me.
> > >
> > > So, my question: Is this intended behavior? Is this for error recovery
> in
> > > the
> > > parser?
> >
> > Yes, this is intended. It is our policy that whenever we emit an error
> > message with a fix-it hint, we recover as if that fix-it hint had been
> > applied to the AST. This is not limited to typo-correction; it happens
> > whenever we emit an error with a fix-it.
> >
> > If true, is there a way to both enable spell-checking but to *disable* it
> >
> > > touching the AST?
> >
> > Not currently, no. I think what you're really asking for is for us to
> turn
> > off all error recovery, so the AST contains only things that were in the
> > source code, and omits erroneous things rather than trying to recover
> from
> > errors?
>
> Yes. That sounds interesting.
>
> > We could try to support such a mode, but I suspect that it would
> > degrade the diagnostic experience enough that you may want to run Clang
> > twice: once in this mode, and once to produce user-facing diagnostics.
>
> I fear the performance impacts here. Running it twice over a possibly
> large TU
> won't be possible for us.
>
> > Also, I don't expect that you'll find volunteers in the Clang community
> to
> > do the work, but if the design and rationale are reasonable and you can
> > provide a convincing argument that the mode will have sufficient ongoing
> > support work, we'd accept patches to implement this.
>
> Well appreciated.
>
> > Reminder: I'm working on integrating Clang (read: libclang) in KDevelop,
> >
> > > hence
> > > I'm looking at this from an development tool perspective. Magic
> behavior
> > > inside Clang is somewhat undesirable there :)
> >
> > Can you explain a bit more about your use cases? I assume you want to
> > provide some kind of AST introspection on not-necessarily-valid code, and
> > you don't want to expose our typo-correction results in that? Depending
> on
> > what you're trying to achieve, there might be lighter-weight approaches
> > (for instance, checking if the token at the given location actually
> refers
> > to the right name).
>
> Okay. You're right in a sense that we want to support introspecting half-
> broken code in our IDE.
>
> Suppose the following code snippet:
>
>   class Foo {
>     void bar();
>   };
>
>   void Foo::bar1() {}
>
> Now when parsing this we'd like to have the following state in our code
> model:
> - Be aware that 'bar1' might be a typo
> - Be aware that this decl is named "Foo::bar1" right now, not "Foo::bar"
>

OK. If we had a mechanism you could ask "was this cursor's name
typo-corrected?", would that suffice?

Regarding playing around with tokens to extract the identifier: This will
> likely slow down the parser dramatically, because this would have to be
> done
> for every declaration.

I don't know about "dramatically"; Clang can answer this sort of query
quite efficiently.

After all, I cannot decide whether the state of an
> individual declaration in the AST is actually "sane" when spell-checking is
> enabled.
>
> Does that make sense to you?

I think so. One more question: in your example above, do you care whether
we believe that the two functions are redeclarations of the same entity?
(Do you want to provide 'add a declaration of this to the class'-style
functionality, for instance?)

Suppose we could typo-correct this:

struct X {
  void f() const;
};
void X::f() {}

... to insert the missing 'const'. Would that be a problem for you too?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20140716/4eff7b5a/attachment.html>