[cfe-commits] r148389 - in /cfe/trunk: include/clang/Basic/DiagnosticLexKinds.td lib/Lex/LiteralSupport.cpp
Seth Cantrell
seth.cantrell at gmail.com
Sun Jan 22 06:13:19 PST 2012
Okay, this isn't a patch, I just want to know if this is the direction you're looking for. The included changes will result in an error message like:
tmp.cpp:3:10: error: illegal sequence in character literal
char a = '<f1>';
^
It doesn't do anything to try to make the highlighted ranges look correct for the modified source line. This will cause a problem similar to the one that already exists with multi-byte UTF-8 encodings, where carets and ranges do not appear correct after these sequences.
Also, I think it would be better if these <XX> strings could be displayed with a reversed color scheme, just like vi, if the console supports it. Is there any possibility of doing that?
Anyway, here are the changes I am contemplating:
diff --git a/lib/Frontend/TextDiagnostic.cpp b/lib/Frontend/TextDiagnostic.cpp
index d2b8660..464b81b 100644
--- a/lib/Frontend/TextDiagnostic.cpp
+++ b/lib/Frontend/TextDiagnostic.cpp
@@ -10,6 +10,7 @@
#include "clang/Frontend/TextDiagnostic.h"
#include "clang/Basic/FileManager.h"
#include "clang/Basic/SourceManager.h"
+#include "clang/Basic/ConvertUTF.h"
#include "clang/Frontend/DiagnosticOptions.h"
#include "clang/Lex/Lexer.h"
#include "llvm/Support/MemoryBuffer.h"
@@ -17,6 +18,10 @@
#include "llvm/Support/ErrorHandling.h"
#include "llvm/ADT/SmallString.h"
#include <algorithm>
+
+#include <sstream>
+#include <iomanip>
+
using namespace clang;
static const enum raw_ostream::Colors noteColor =
@@ -592,6 +597,36 @@ void TextDiagnostic::emitSnippetAndCaret(
// Copy the line of code into an std::string for ease of manipulation.
std::string SourceLine(LineStart, LineEnd);
+
+ static const char trailingBytesForUTF8[256] = {
+ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
+ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
+ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
+ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
+ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
+ 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
+ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
+ 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 3,3,3,3,3,3,3,3,4,4,4,4,5,5,5,5
+ };
+
+ unsigned char const *i = reinterpret_cast<unsigned char const *>(LineStart);
+ unsigned char const *end = reinterpret_cast<unsigned char const *>(LineEnd);
+ std::string SourceLineMod;
+ while(i!=end) {
+ if(isLegalUTF8Sequence(i,end)) {
+ std::copy(i,i+trailingBytesForUTF8[*i]+1,back_inserter(SourceLineMod));
+ i+=trailingBytesForUTF8[*i]+1;
+ } else {
+ std::stringstream ss;
+ ss.fill('0');
+ ss << std::hex << '<' << std::setw(2) << (unsigned int)*i << '>';
+ SourceLineMod.append(ss.str());
+ ++i;
+ }
+ }
+
+ SourceLine = SourceLineMod;
+
// Create a line for the caret that is filled with spaces that is the same
// length as the line of source code.
std::string CaretLine(LineEnd-LineStart, ' ');
On Jan 19, 2012, at 10:36 PM, Seth Cantrell wrote:
> Here's a patch that improves that error message. A patch that improves the display of encoding errors will take me longer to get to.
>
> On Jan 18, 2012, at 8:09 PM, Eli Friedman wrote:
>
>> On Wed, Jan 18, 2012 at 4:44 PM, Seth Cantrell <seth.cantrell at gmail.com> wrote:
>>>
>>> On Jan 18, 2012, at 5:49 PM, Eli Friedman wrote:
>>>
>>>> On Wed, Jan 18, 2012 at 4:27 AM, Seth Cantrell <seth.cantrell at gmail.com> wrote:
>>>>> + while (begin!=end) {
>>>>> + // Is this a span of non-escape characters?
>>>>> + if (begin[0] != '\\') {
>>>>> + char const *start = begin;
>>>>> + do {
>>>>> + ++begin;
>>>>> + } while (begin != end && *begin != '\\');
>>>>> +
>>>>> + uint32_t *tmp_begin = buffer_begin;
>>>>> + ConversionResult res =
>>>>> + ConvertUTF8toUTF32(reinterpret_cast<UTF8 const **>(&start),
>>>>> + reinterpret_cast<UTF8 const *>(begin),
>>>>> + &buffer_begin,buffer_end,strictConversion);
>>>>> + if (res!=conversionOK) {
>>>>> + PP.Diag(Loc, diag::err_bad_character_encoding);
>>>>
>>>> This error message can lead to rather uninformative complaints which
>>>> look like the following:
>>>>
>>>> fribidi_char_sets_cp1256.c:214:9: error:
>>>> illegal sequence in character literal
>>>> return '?';
>>>> ^
>>>>
>>>> Any ideas for how we could improve this diagnostic?
>>>>
>>>> -Eli
>>>
>>> I suppose a marginally better message could be 'illegal character encoding in character literal'.
>>
>> Yes, that would be a bit better.
>>
>>> It'd also be good if the actual bytes could be highlighted. Something like vi's method of displaying illegal encodings using reversed colors would work to display them on the command line. Also adding a range to highlight the exact issue inside the literal. We'd need a way to calculate the locations for bytes inside the literal (there's a method there that looks like it works only for purely ascii strings). The console display for such ranges would need to be smarter about displaying ranges for lines that include multi-byte characters and also know about whatever method is chosen to show illegal bytes.
>>>
>>>> fribidi_char_sets_cp1256.c:214:9: error:
>>>> illegal character encoding in character literal
>>>> return '123<F1>';
>>>> ^ ~~~~
>>
>> Displaying illegal bytes vi-style would be a big improvement. Hacking
>> up TextDiagnostic::emitSnippetAndCaret to do that should be
>> straightforward, given a function to figure out whether a byte is
>> illegal.
>>
>> -Eli
> <0002-improve-error-message-for-file-encoding-errors.patch>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120122/95fd4e10/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-3.png
Type: image/png
Size: 14443 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20120122/95fd4e10/attachment.png>
More information about the cfe-commits
mailing list