[cfe-commits] r148389 - in /cfe/trunk: include/clang/Basic/DiagnosticLexKinds.td lib/Lex/LiteralSupport.cpp

Seth Cantrell seth.cantrell at gmail.com
Thu Jan 19 19:36:04 PST 2012


Here's a patch that improves that error message. A patch that improves the display of encoding errors will take me longer to get to.

On Jan 18, 2012, at 8:09 PM, Eli Friedman wrote:

> On Wed, Jan 18, 2012 at 4:44 PM, Seth Cantrell <seth.cantrell at gmail.com> wrote:
>> 
>> On Jan 18, 2012, at 5:49 PM, Eli Friedman wrote:
>> 
>>> On Wed, Jan 18, 2012 at 4:27 AM, Seth Cantrell <seth.cantrell at gmail.com> wrote:
>>>> +  while (begin!=end) {
>>>> +    // Is this a span of non-escape characters?
>>>> +    if (begin[0] != '\\') {
>>>> +      char const *start = begin;
>>>> +      do {
>>>> +        ++begin;
>>>> +      } while (begin != end && *begin != '\\');
>>>> +
>>>> +      uint32_t *tmp_begin = buffer_begin;
>>>> +      ConversionResult res =
>>>> +      ConvertUTF8toUTF32(reinterpret_cast<UTF8 const **>(&start),
>>>> +                         reinterpret_cast<UTF8 const *>(begin),
>>>> +                         &buffer_begin,buffer_end,strictConversion);
>>>> +      if (res!=conversionOK) {
>>>> +        PP.Diag(Loc, diag::err_bad_character_encoding);
>>> 
>>> This error message can lead to rather uninformative complaints which
>>> look like the following:
>>> 
>>> fribidi_char_sets_cp1256.c:214:9: error:
>>>      illegal sequence in character literal
>>> return '?';
>>>        ^
>>> 
>>> Any ideas for how we could improve this diagnostic?
>>> 
>>> -Eli
>> 
>> I suppose a marginally better message could be 'illegal character encoding in character literal'.
> 
> Yes, that would be a bit better.
> 
>> It'd also be good if the actual bytes could be highlighted. Something like vi's method of displaying illegal encodings in reversed colors would work on the command line. We could also add a range that highlights the exact problem inside the literal. For that we'd need a way to calculate the locations of bytes inside the literal (there's a method there, but it looks like it works only for purely ASCII strings). The console display for such ranges would also need to be smarter about lines that include multi-byte characters, and would need to know about whatever method is chosen to show illegal bytes.
>> 
>>> fribidi_char_sets_cp1256.c:214:9: error:
>>>      illegal character encoding in character literal
>>> return '123<F1>';
>>>        ^   ~~~~
> 
> Displaying illegal bytes vi-style would be a big improvement.  Hacking
> up TextDiagnostic::emitSnippetAndCaret to do that should be
> straightforward, given a function to figure out whether a byte is
> illegal.
> 
> -Eli
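
For reference, here is a rough sketch of the kind of "is this byte illegal" helper discussed above, together with a vi-style rendering of a source line. It is standalone C++ and not part of the attached patch: the names wellFormedUtf8Length and renderIllegalBytes are hypothetical, nothing here touches TextDiagnostic::emitSnippetAndCaret, and it assumes the source line is meant to be UTF-8. Illegal bytes are shown as <XX> in hex rather than in reversed colors.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: length of the well-formed UTF-8 sequence starting at
// p, or 0 if the byte at p does not begin one (stray continuation byte,
// overlong form, surrogate, value above U+10FFFF, or truncated sequence).
static std::size_t wellFormedUtf8Length(const unsigned char *p,
                                        const unsigned char *end) {
  assert(p < end && "caller must pass a non-empty range");
  const unsigned char b0 = p[0];
  if (b0 < 0x80)
    return 1; // plain ASCII

  std::size_t len = 0;
  unsigned char lo = 0x80, hi = 0xBF; // allowed range for the second byte
  if (b0 >= 0xC2 && b0 <= 0xDF) {
    len = 2;
  } else if (b0 >= 0xE0 && b0 <= 0xEF) {
    len = 3;
    if (b0 == 0xE0) lo = 0xA0; // rejects overlong 3-byte forms
    if (b0 == 0xED) hi = 0x9F; // rejects UTF-16 surrogates
  } else if (b0 >= 0xF0 && b0 <= 0xF4) {
    len = 4;
    if (b0 == 0xF0) lo = 0x90; // rejects overlong 4-byte forms
    if (b0 == 0xF4) hi = 0x8F; // rejects values above U+10FFFF
  } else {
    return 0; // 0x80..0xC1 and 0xF5..0xFF never start a sequence
  }

  if (static_cast<std::size_t>(end - p) < len)
    return 0; // sequence is cut off by the end of the line
  if (p[1] < lo || p[1] > hi)
    return 0;
  for (std::size_t i = 2; i < len; ++i)
    if (p[i] < 0x80 || p[i] > 0xBF)
      return 0;
  return len;
}

// Hypothetical helper: copy a source line, replacing each illegal byte with
// a vi-style <XX> marker and leaving well-formed sequences untouched.
std::string renderIllegalBytes(const std::string &line) {
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  const unsigned char *p =
      reinterpret_cast<const unsigned char *>(line.data());
  const unsigned char *end = p + line.size();
  while (p != end) {
    const std::size_t n = wellFormedUtf8Length(p, end);
    if (n == 0) {
      out += '<';
      out += hex[*p >> 4];
      out += hex[*p & 0x0F];
      out += '>';
      ++p; // resynchronize one byte at a time
    } else {
      out.append(reinterpret_cast<const char *>(p), n);
      p += n;
    }
  }
  return out;
}
```

With the line from the example in this thread, a lone 0xF1 byte before the closing quote comes out as return '123<F1>';. Since each illegal byte expands to four columns, any caret or range drawn under the line would have to be computed against the rendered text rather than the raw bytes.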
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-improve-error-message-for-file-encoding-errors.patch
Type: application/octet-stream
Size: 3579 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120119/4e4c142c/attachment.obj>

