<div class="gmail_quote">On Fri, Aug 10, 2012 at 5:56 PM, Jordan Rose <span dir="ltr"><<a href="mailto:jordan_rose@apple.com" target="_blank">jordan_rose@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi, everyone. I've been working on and off on PR13178, which would provide nice fixits and recovery for Unicode smart quotes and ellipses, and better diagnosis of other spurious non-ASCII characters. I have all the functionality in place, but see a dramatic (>5%) slowdown when parsing Cocoa.h (-fsyntax-only). I've tried as much as I can to keep the smart-quote-specific code out of the string literal loops, but certain pieces just can't be avoided. I don't think we want this fixit at the cost of the common correct case.<br>

</blockquote><div><br></div><div>I suspect most of the extra cost here is coming from the "case '\xE2'" in LexNextToken, not from the string literal lexing. Have you tried sinking that into the 'if ((Char & 0x80) != 0)' in the default case? That might also benefit from being marked as unlikely.</div>
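<div><br></div><div>A minimal standalone sketch of that suggestion, not the actual lexer code: the 0xE2 lead-byte check is moved out of the top-level switch and into the high-bit test in the default case, and that test is hinted as unlikely. The names classifyAt, Kind and SKETCH_UNLIKELY are hypothetical and exist only for this illustration; the hint assumes a GCC-compatible compiler and falls back to a plain condition elsewhere.</div>

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical branch hint for this sketch; expands to a plain condition
// on compilers without __builtin_expect.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_UNLIKELY(x) (x)
#endif

// Hypothetical classification result; not Clang's token kinds.
enum class Kind { Ascii, SmartQuote, OtherNonAscii, End };

// Classify the character starting at Pos. ASCII bytes never reach the
// smart-quote check; only bytes with the high bit set pay for it.
inline Kind classifyAt(std::string_view Buf, std::size_t Pos) {
  if (Pos >= Buf.size())
    return Kind::End;

  unsigned char Char = static_cast<unsigned char>(Buf[Pos]);
  switch (Char) {
  // ... the ordinary ASCII cases would be listed here ...
  default:
    if (SKETCH_UNLIKELY((Char & 0x80) != 0)) {
      // U+2018, U+2019, U+201C and U+201D are encoded in UTF-8 as
      // E2 80 98, E2 80 99, E2 80 9C and E2 80 9D respectively.
      if (Char == 0xE2 && Pos + 2 < Buf.size()) {
        unsigned char B1 = static_cast<unsigned char>(Buf[Pos + 1]);
        unsigned char B2 = static_cast<unsigned char>(Buf[Pos + 2]);
        if (B1 == 0x80 &&
            (B2 == 0x98 || B2 == 0x99 || B2 == 0x9C || B2 == 0x9D))
          return Kind::SmartQuote;
      }
      return Kind::OtherNonAscii;
    }
    return Kind::Ascii;
  }
}
```

<div>With this shape the common all-ASCII path costs one extra predictable branch in the default case, and the switch itself carries no dedicated case for the lead byte.</div>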

<div><br></div><div>I've not looked at the patch in detail yet. One thing I spotted in passing: your diagnostic messages should start with a lowercase letter.</div></div>