[cfe-dev] -fspell-checking causing modifications to AST?

Tue Jul 15 02:59:55 PDT 2014

On Monday 14 July 2014 17:52:42 Richard Smith wrote:
> On Mon, Jul 14, 2014 at 4:53 PM, Kevin Funk <kfunk at kde.org> wrote:
> > Hey,
> > 
> > I'm a bit puzzled by the following behavior of clang when inspecting the
> > AST
> > for the following code snippet:
> > 
> > test.cpp:
> > char foo;
> > char bar = foo1;
> > 
> > $ clang++ -cc1 -ast-dump test.cpp:
> > main.cpp:2:12: error: use of undeclared identifier 'foo1'; did you mean
> > 'foo'?
> > (...)
> > `-VarDecl 0xa19cb0 <line:2:1, col:12> col:6 bar 'char' cinit
> > 
> >   `-ImplicitCastExpr 0xa19d60 <col:12> 'char' <LValueToRValue>
> >   
> >     `-DeclRefExpr 0xa19d38 <col:12> 'char' lvalue Var 0xa19c40 'foo'
> >     'char'
> > 
> > So, Clang seems to interpret 'foo1' as a typo 'foo'. This is still fine.
> > However, Clang also "fixes up the code" and pretends that 'foo1' *is*
> > 'foo'
> > which -ast-dump shows (see last line of the dump). This is odd.
> > 
> > Obviously, this is only the case if -fno-spell-checking is *not* passed to
> > clang++. With -fno-spell-checking I get this instead:
> > 
> > $ clang++ -cc1 -ast-dump test.cpp -fno-spell-checking
> > (...)
> > `-VarDecl 0x1a79ca0 <line:2:1, col:6> col:6 bar 'char'
> > 
> > This looks fine to me.
> > 
> > So, my question: Is this intended behavior? Is this for error recovery in
> > the
> > parser?
> 
> Yes, this is intended. It is our policy that whenever we emit an error
> message with a fix-it hint, we recover as if that fix-it hint had been
> applied to the AST. This is not limited to typo-correction; it happens
> whenever we emit an error with a fix-it.
> 
> If true, is there a way to both enable spell-checking but to *disable* it
> 
> > touching the AST?
> 
> Not currently, no. I think what you're really asking for is for us to turn
> off all error recovery, so the AST contains only things that were in the
> source code, and omits erroneous things rather than trying to recover from
> errors?

Yes. That sounds interesting.

> We could try to support such a mode, but I suspect that it would
> degrade the diagnostic experience enough that you may want to run Clang
> twice: once in this mode, and once to produce user-facing diagnostics.

I fear the performance impacts here. Running it twice over a possibly large TU 
won't be possible for us.

> Also, I don't expect that you'll find volunteers in the Clang community to
> do the work, but if the design and rationale are reasonable and you can
> provide a convincing argument that the mode will have sufficient ongoing
> support work, we'd accept patches to implement this.

Well appreciated.

> Reminder: I'm working on integrating Clang (read: libclang) in KDevelop,
> 
> > hence
> > I'm looking at this from an development tool perspective. Magic behavior
> > inside Clang is somewhat undesirable there :)
> 
> Can you explain a bit more about your use cases? I assume you want to
> provide some kind of AST introspection on not-necessarily-valid code, and
> you don't want to expose our typo-correction results in that? Depending on
> what you're trying to achieve, there might be lighter-weight approaches
> (for instance, checking if the token at the given location actually refers
> to the right name).

Okay. You're right in a sense that we want to support introspecting half-
broken code in our IDE.

Suppose the following code snippet:

  class Foo {
    void bar();
  };

  void Foo::bar1() {}

Now when parsing this we'd like to have the following state in our code model:
- Be aware that 'bar1' might be a typo
- Be aware that this decl is named "Foo::bar1" right now, not "Foo::bar"

Regarding playing around with tokens to extract the identifier: This will 
likely slow down the parser dramatically, because this would have to be done 
for every declaration. After all, I cannot decide whether the state of an 
individual declaration in the AST is actually "sane" when spell-checking is 
enabled.

Does that make sense to you?

-- 
Kevin Funk | kfunk at kde.org | http://kfunk.org