[cfe-commits] r148389 - in /cfe/trunk: include/clang/Basic/DiagnosticLexKinds.td lib/Lex/LiteralSupport.cpp

Eli Friedman eli.friedman at gmail.com
Wed Jan 18 17:09:06 PST 2012


On Wed, Jan 18, 2012 at 4:44 PM, Seth Cantrell <seth.cantrell at gmail.com> wrote:
>
> On Jan 18, 2012, at 5:49 PM, Eli Friedman wrote:
>
>> On Wed, Jan 18, 2012 at 4:27 AM, Seth Cantrell <seth.cantrell at gmail.com> wrote:
>>> +  while (begin!=end) {
>>> +    // Is this a span of non-escape characters?
>>> +    if (begin[0] != '\\') {
>>> +      char const *start = begin;
>>> +      do {
>>> +        ++begin;
>>> +      } while (begin != end && *begin != '\\');
>>> +
>>> +      uint32_t *tmp_begin = buffer_begin;
>>> +      ConversionResult res =
>>> +      ConvertUTF8toUTF32(reinterpret_cast<UTF8 const **>(&start),
>>> +                         reinterpret_cast<UTF8 const *>(begin),
>>> +                         &buffer_begin,buffer_end,strictConversion);
>>> +      if (res!=conversionOK) {
>>> +        PP.Diag(Loc, diag::err_bad_character_encoding);
>>
>> This error message can lead to rather uninformative complaints which
>> look like the following:
>>
>> fribidi_char_sets_cp1256.c:214:9: error:
>>      illegal sequence in character literal
>> return '?';
>>        ^
>>
>> Any ideas for how we could improve this diagnostic?
>>
>> -Eli
>
> I suppose a marginally better message could be 'illegal character encoding in character literal'.

Yes, that would be a bit better.

> It'd also be good if the actual bytes could be highlighted. Something like vi's method of displaying illegal encodings in reversed colors would work on the command line. We could also add a range that highlights the exact issue inside the literal. For that we'd need a way to calculate the locations of bytes inside the literal (there's a method there that looks like it works only for purely ASCII strings). The console display of such ranges would also need to be smarter about lines that include multi-byte characters, and would need to know about whatever method is chosen to show illegal bytes.
>
>> fribidi_char_sets_cp1256.c:214:9: error:
>>      illegal character encoding in character literal
>> return '123<F1>';
>>        ^   ~~~~

Displaying illegal bytes vi-style would be a big improvement.  Hacking
up TextDiagnostic::emitSnippetAndCaret to do that should be
straightforward, given a function to figure out whether a byte is
illegal.

-Eli
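
For illustration only, here is a rough sketch of what "a function to figure out whether a byte is illegal" plus vi-style <XX> rendering might look like. This is standalone C++, not Clang code: the names validUTF8SequenceLength and escapeIllegalBytes are hypothetical, and it does not touch TextDiagnostic or the ConvertUTF routines. It assumes the source line is meant to be UTF-8 and treats any byte that does not start a well-formed sequence (including overlong forms and surrogates) as illegal.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: returns the length (1-4) of the well-formed UTF-8
// sequence starting at p, or 0 if the byte at p does not begin one.
std::size_t validUTF8SequenceLength(const unsigned char *p,
                                    const unsigned char *end) {
  if (p == end)
    return 0;
  unsigned char b0 = p[0];
  if (b0 < 0x80)
    return 1; // plain ASCII

  std::size_t len;
  unsigned char lo = 0x80, hi = 0xBF; // allowed range for the second byte
  if (b0 >= 0xC2 && b0 <= 0xDF) {
    len = 2;
  } else if (b0 >= 0xE0 && b0 <= 0xEF) {
    len = 3;
    if (b0 == 0xE0) lo = 0xA0; // reject overlong 3-byte forms
    if (b0 == 0xED) hi = 0x9F; // reject UTF-16 surrogates
  } else if (b0 >= 0xF0 && b0 <= 0xF4) {
    len = 4;
    if (b0 == 0xF0) lo = 0x90; // reject overlong 4-byte forms
    if (b0 == 0xF4) hi = 0x8F; // reject values above U+10FFFF
  } else {
    return 0; // stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF
  }

  if (static_cast<std::size_t>(end - p) < len)
    return 0; // sequence is truncated
  if (p[1] < lo || p[1] > hi)
    return 0;
  for (std::size_t i = 2; i < len; ++i)
    if (p[i] < 0x80 || p[i] > 0xBF)
      return 0;
  return len;
}

// Hypothetical helper: copies a source line, replacing each illegal byte
// with a vi-style <XX> marker so it is visible in a terminal snippet.
std::string escapeIllegalBytes(const std::string &line) {
  std::string out;
  const unsigned char *p =
      reinterpret_cast<const unsigned char *>(line.data());
  const unsigned char *end = p + line.size();
  while (p != end) {
    std::size_t n = validUTF8SequenceLength(p, end);
    if (n == 0) {
      char buf[8];
      std::snprintf(buf, sizeof buf, "<%02X>", static_cast<unsigned>(*p));
      out += buf;
      ++p; // resynchronize one byte at a time
    } else {
      out.append(reinterpret_cast<const char *>(p), n);
      p += n;
    }
  }
  return out;
}
```

Because the escaped line is longer than the original, any caret or range columns computed on the raw bytes would have to be remapped onto the escaped text; the sketch leaves that out.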



