[cfe-commits] [PATCH] PR4111, PR5925, PR13210: Make tentative parsing smarter in the presence of undeclared identifiers

Thu Jun 28 13:49:21 PDT 2012

Hi,

The attached patch makes tentative parsing try harder when it sees an
undeclared identifier. In particular, it will try to typo-correct to a
declared identifier or an appropriate keyword:

int f(doulbe);
int k = f(4.2);

Was:

<stdin>:1:7: error: use of undeclared identifier 'doulbe'
int f(doulbe);
      ^
<stdin>:2:10: error: called object type 'int' is not a function or function pointer
int k = f(4.2);
        ~^

Now:

<stdin>:1:7: error: use of undeclared identifier 'doulbe'; did you mean 'double'?
int f(doulbe);
      ^~~~~~
      double
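
As a rough illustration of the idea (not code from the patch, and far
simpler than Clang's actual typo correction), a correction can be picked
by edit distance against the names and keywords that would be valid at
that point. The function names, the candidate list, and the distance
threshold of 2 are all assumptions made for this sketch:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Classic Levenshtein edit distance between two identifiers,
// computed with a single rolling row.
std::size_t editDistance(const std::string &a, const std::string &b) {
  std::vector<std::size_t> row(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j)
    row[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    std::size_t diag = row[0]; // cell (i-1, j-1)
    row[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      std::size_t up = row[j]; // cell (i-1, j)
      std::size_t subst = diag + (a[i - 1] == b[j - 1] ? 0 : 1);
      row[j] = std::min({row[j - 1] + 1, up + 1, subst});
      diag = up;
    }
  }
  return row[b.size()];
}

// Hypothetical helper: returns the closest candidate within maxDistance
// edits, or an empty string if nothing is close enough.
std::string suggestCorrection(const std::string &typo,
                              const std::vector<std::string> &candidates,
                              std::size_t maxDistance = 2) {
  std::string best;
  std::size_t bestDistance = maxDistance + 1;
  for (const std::string &candidate : candidates) {
    std::size_t d = editDistance(typo, candidate);
    if (d < bestDistance) {
      bestDistance = d;
      best = candidate;
    }
  }
  return best;
}
```

With candidates such as "int", "double" and "float", 'doulbe' is two
edits away from 'double' and would be corrected, while 'undeclared' has
no close match and would fall through to the next strategy.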

If typo-correction fails, it will continue disambiguating the declaration,
in the hope that something later will resolve the error:

int p = 0;
int f(undeclared *p, int);
int g(undeclared *p, 0);
int *test(UnknownType *fool) { return 0; }

Was:

<stdin>:2:7: error: use of undeclared identifier 'undeclared'
int f(undeclared *p, int);
      ^
<stdin>:3:7: error: use of undeclared identifier 'undeclared'
int g(undeclared *p, 0);
      ^
<stdin>:4:11: error: use of undeclared identifier 'UnknownType'
int *test(UnknownType *fool) { return 0; }
          ^
<stdin>:4:24: error: use of undeclared identifier 'fool'
int *test(UnknownType *fool) { return 0; }
                       ^
<stdin>:4:29: error: expected ';' after top level declarator
int *test(UnknownType *fool) { return 0; }
                            ^
                            ;

Now:

<stdin>:2:7: error: unknown type name 'undeclared'
int f(undeclared *p, int);
      ^
<stdin>:3:7: error: use of undeclared identifier 'undeclared'
int g(undeclared *p, 0);
      ^
<stdin>:4:11: error: unknown type name 'UnknownType'
int *test(UnknownType *fool) { return 0; }
          ^
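
The difference between f and g above can be pictured with a toy model
(a sketch only, not how the patch is structured): each item between the
parentheses is classified on its own, and the first unambiguous item
decides how the ambiguous ones are read. The enum and function names are
made up for this illustration, and the sketch does not cover the fourth
line, where the resolving token comes after the closing parenthesis.

```cpp
#include <vector>

// What one item between the parentheses looks like in isolation.
enum class ItemKind {
  Type,       // e.g. `int`: can only be a parameter declaration
  Expression, // e.g. `0`: can only be an initializer expression
  Ambiguous   // e.g. `undeclared *p`: parameter or multiplication
};

enum class Verdict { FunctionDeclaration, VariableInitializer, Unresolved };

// Scan left to right; the first unambiguous item settles the question.
Verdict disambiguate(const std::vector<ItemKind> &items) {
  for (ItemKind kind : items) {
    if (kind == ItemKind::Type)
      return Verdict::FunctionDeclaration;
    if (kind == ItemKind::Expression)
      return Verdict::VariableInitializer;
  }
  return Verdict::Unresolved;
}
```

Under this model, `f(undeclared *p, int)` is a function declaration, so
'undeclared' is reported as an unknown type name, whereas
`g(undeclared *p, 0)` is an initializer, so 'undeclared' is reported as
an undeclared identifier.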

Additionally, this patch produces a tok::annot_primary_expr annotation when
tentative parsing meets an identifier and resolves it as a non-type, in
order to avoid the need to redo name lookup during the actual parse. This
annotation token was previously only used by expression-statement /
declaration-statement disambiguation.

One small wrinkle with this approach is that tentative parsing needs to be
aware of names that were declared earlier in the same tentative parse. This
is accomplished by maintaining a set of tentatively-declared identifiers on
the parser (that is, names that the parser believes will have been declared
by the current point in the tentative parse, but that it hasn't told Sema
about yet).
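
A minimal sketch of such a set, assuming names are recorded as plain
strings and forgotten again when the tentative parse ends. The class
names here are hypothetical and are not taken from the patch:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the parser-side list of names that the
// tentative parse has seen declared but Sema does not know about.
class TentativeNames {
public:
  void declare(const std::string &name) { names_.push_back(name); }
  bool isDeclared(const std::string &name) const {
    return std::find(names_.begin(), names_.end(), name) != names_.end();
  }
  std::size_t size() const { return names_.size(); }
  // Drop every name recorded after the first n.
  void truncate(std::size_t n) { names_.resize(n); }

private:
  std::vector<std::string> names_;
};

// RAII guard: on destruction, forgets the names recorded during one
// tentative parse, so they cannot leak into the next one.
class TentativeParseScope {
public:
  explicit TentativeParseScope(TentativeNames &names)
      : names_(names), savedSize_(names.size()) {}
  ~TentativeParseScope() { names_.truncate(savedSize_); }
  TentativeParseScope(const TentativeParseScope &) = delete;
  TentativeParseScope &operator=(const TentativeParseScope &) = delete;

private:
  TentativeNames &names_;
  std::size_t savedSize_;
};
```

While a scope is alive, an identifier lookup that fails in Sema can fall
back to isDeclared() before the name is treated as undeclared.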

This fixes at least PR4111, PR5925, and PR13210. I've tested this patch
against a large pile of C++ code with no regressions.

Please review!

Thanks!
Richard
-------------- next part --------------
A non-text attachment was scrubbed...
Name: annotate-name.diff
Type: application/octet-stream
Size: 42266 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120628/a8e5f900/attachment.obj>