<div class="gmail_quote">On Mon, Jan 23, 2012 at 2:05 PM, Chandler Carruth <span dir="ltr"><<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="gmail_quote"><div class="im">On Mon, Jan 23, 2012 at 12:18 PM, Kaelyn Uhrain <span dir="ltr"><<a href="mailto:rikka@google.com" target="_blank">rikka@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Author: rikka<br>
Date: Mon Jan 23 14:18:59 2012<br>
New Revision: 148720<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=148720&view=rev" target="_blank">http://llvm.org/viewvc/llvm-project?rev=148720&view=rev</a><br>
Log:<br>
In CorrectTypo, use the cached correction as a starting point instead.<br>
<br>
Previously, for unqualified lookups, a positive cache hit was used as the<br>
only non-keyword correction and a negative cache hit immediately returned<br>
an empty TypoCorrection. With the new callback objects, this behavior<br>
causes false negatives by not accounting for the fact that callback<br>
objects alter the set of potential/allowed corrections. The new behavior<br>
is to seed the set of corrections with the cached correction (for<br>
positive hits) to establish a baseline edit distance. Negative cache<br>
hits are only stored or used when either no callback object is provided<br>
or when it returns true for a call to ValidateCandidate with an empty<br>
TypoCorrection (i.e. when ValidateCandidate does not seem to be doing<br>
any checking of the TypoCorrection, such as when an instance of the base<br>
callback class is used solely to specify the set of keywords to be accepted).<br></blockquote><div><br></div></div><div>Is there any performance impact to this change? It seems like it would make the caching much less effective. Is that acceptable for all of the correction clients?</div>
</div>
</blockquote></div><br><div>I haven't tested it, but I suspect the performance impact is minimal in most cases: the same typo would have to be repeated many times within the same translation unit for caching to yield much benefit, and even then only if the cached correction actually works in (almost) every case instead of triggering cascading errors. The caching also already had other issues that have accumulated as CorrectTypo has become more sophisticated, since nothing checked the cached correction, e.g. to make sure it is actually reachable from the current location. If the one extreme case is enough of a concern, I can extend the stop-gap for when more than 20 unqualified typos have been corrected so that it does not continue with the normal correction path if a given typo is found in the correction cache. (I just noticed that I broke the stop-gap in certain cases when I added the callback objects and changed the code to only return the cached correction if it passes validation.)</div>
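<div>For illustration only, here is a minimal, self-contained sketch of the caching behavior the log message describes. It is not Clang's code: TypoCache, Candidate, Validator, editDistance and the maxDistance threshold of 2 are hypothetical stand-ins, and the choice to always store positive results is an assumption. The sketch shows a positive cache hit being used as a seed that still has to pass validation, and a negative cache hit being stored or trusted only when there is no callback or the callback accepts an empty candidate.</div>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a correction; an empty name models an
// "empty" correction. This is not Clang's TypoCorrection class.
struct Candidate {
    std::string name;
    unsigned distance = 0;
};

// Hypothetical stand-in for a callback object's validation hook.
// An empty std::function models "no callback object provided".
using Validator = std::function<bool(const Candidate&)>;

// Plain Levenshtein distance computed with a single rolling row.
inline unsigned editDistance(const std::string& a, const std::string& b) {
    std::vector<unsigned> row(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        row[j] = static_cast<unsigned>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        unsigned diag = row[0];              // previous row, column j-1
        row[0] = static_cast<unsigned>(i);
        for (std::size_t j = 1; j <= b.size(); ++j) {
            unsigned up = row[j];            // previous row, column j
            unsigned cost = (a[i - 1] == b[j - 1]) ? 0u : 1u;
            row[j] = std::min({up + 1, row[j - 1] + 1, diag + cost});
            diag = up;
        }
    }
    return row[b.size()];
}

class TypoCache {
public:
    std::optional<std::string> correct(const std::string& typo,
                                       const std::vector<std::string>& visibleNames,
                                       const Validator& validate) {
        // Assumption: a validator that accepts an empty candidate is not
        // really filtering, so negative results are safe to share with it.
        const bool permissive = !validate || validate(Candidate{});

        auto hit = cache_.find(typo);
        if (hit != cache_.end() && !hit->second && permissive)
            return std::nullopt;             // trusted negative hit

        std::optional<Candidate> best;
        auto consider = [&](const std::string& name) {
            Candidate c{name, editDistance(typo, name)};
            if (validate && !validate(c)) return;   // rejected by callback
            if (c.distance > maxDistance) return;   // too far from the typo
            if (!best || c.distance < best->distance) best = c;
        };

        // A positive hit only seeds the search; it must still validate.
        if (hit != cache_.end() && hit->second) consider(*hit->second);
        for (const std::string& name : visibleNames) consider(name);

        if (best) {
            cache_[typo] = best->name;       // assumption: always store positives
            return best->name;
        }
        if (permissive) cache_[typo] = std::nullopt;  // store negatives only here
        return std::nullopt;
    }

private:
    static constexpr unsigned maxDistance = 2;  // hypothetical threshold
    std::map<std::string, std::optional<std::string>> cache_;
};
```

<div>In this sketch, a caller with a restrictive validator never receives a cached correction it would have rejected, and a negative entry recorded by an unrestricted caller does not suppress its search. The cost is that such callers always fall through to the full candidate loop, which is the performance trade-off raised in the question above.</div>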