[cfe-commits] PATCH: Add support for C++ namespace-aware typo correcting

Tue Mar 8 19:46:41 PST 2011

On Feb 22, 2011, at 11:21 AM, Kaelyn Uhrain wrote:

> This patch implements checking available C++ namespaces when trying to find typo corrections for unknown identifiers.  There are currently 4 broken unit tests (CXX/basic/basic.lookup/basic.lookup.argdep/p4.cpp, CXX/class/class.local/p1.cpp, FixIt/typo.cpp, and SemaObjC/synth-provisional-ivars.m) which I'm fairly sure are cases where the tests need to be updated to reflect the new behavior instead of being indicative of actual code breakage, but I'm not 100% certain.  The new unit test, test/Sema/missing-namespace-qualifier-typo-corrections.cpp, is a good demonstration of the improved suggestions provided by this patch as most of the errors found by "clang -fsyntax-only" lack suggestions without this patch..

This is really cool, and is be a great step forward in usability.

My only really concern is about performance, so I did a quick measurement. My test was to put all of the C++ Standard Library headers into a PCH file, then include it in a program that amounted to, basically,

	int main() {
		vector<int> v;
		std::Vector<int> v2;
	}

so there are two typo corrections. And here is some timing data from before/after the patch:

Before:

*** AST File Statistics:
  18 stat cache hits
  2 stat cache misses
  97/39174 source location entries read (0.247613%)
  11142/18925 types read (58.874504%)
  16712/32353 declarations read (51.655178%)
  1734/6785 identifiers read (25.556374%)
  1861/70867 statements read (2.626046%)
  0/1766 macros read (0.000000%)
  188/3464 lexical declcontexts read (5.427252%)
  43/695 visible declcontexts read (6.187050%)

Wall time: 0m0.051s

After:

*** AST File Statistics:
  18 stat cache hits
  2 stat cache misses
  97/39174 source location entries read (0.247613%)
  17162/18925 types read (90.684280%)
  28421/32353 declarations read (87.846565%)
  3943/6785 identifiers read (58.113487%)
  70371/70867 statements read (99.300095%)
  0/1766 macros read (0.000000%)
  3383/3464 lexical declcontexts read (97.661659%)
  6080/695 visible declcontexts read (874.820129%)

Wall time: 0m0.115s

So it's a little over 2x slower, and (more importantly) deserializes a whole lot more from the PCH file. That's actually a concern, since typo correction still needs to be fast for many applications, and the additional disk I/O really becomes a problem as the PCH files get larger (e.g., containing significant chunks of Boost).

I have a recommendation. Rather than using the recursive AST visitor to walk the entire AST, can we instead 

  (1) First do the "identifier" lookup that we already do, to find the best name that we know about (which could be anywhere in the translation unit), then
  (2) If lookup for that identifier fails, go look in all of the namespaces that we know about and see if the identifier is located in one of those.

We'd have to have some list of known namespaces stored somewhere (and saved in the PCH file for fast retrieval), but at least this way we'd be paying the cost we already pay now to look at the identifiers + a cost that's O(number of unique namespaces). 

Does that seem reasonable?

	- Doug