r225442 - Frontend: Fix SourceColumnMap assertion failure on non-ascii characters.

Logan Chien tzuhsiang.chien at gmail.com
Thu Jan 8 05:19:07 PST 2015


Author: logan
Date: Thu Jan  8 07:19:07 2015
New Revision: 225442

URL: http://llvm.org/viewvc/llvm-project?rev=225442&view=rev
Log:
Frontend: Fix SourceColumnMap assertion failure on non-ascii characters.

If the input source code contains non-ASCII characters, the column
index might be smaller than the byte index.  This can result in two
possible assertion failures.  This CL fixes the computation of the
column index and byte index.

1. The assertions in startOfNextColumn() and startOfPreviousColumn()
   should not be raised when the byte index is greater than the column
   index, since a non-ASCII character in a column may take more than
   one byte to store.

2. The length of the caret line should be equal to the number of columns
   of the source line, instead of the length of the source line in bytes.
   Otherwise, the assertion in selectInterestingSourceRegion will be
   raised because the removed columns plus the kept columns are not
   greater than the max column, which means that we should not remove
   any column at all.
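
The two points above can be illustrated with a minimal sketch.  This is
not the code in TextDiagnostic.cpp: SimpleColumnMap and makeCaretLine are
hypothetical names, and the sketch assumes valid UTF-8 input, one column
per code point, and no tab expansion.  It only shows why a byte index has
to be checked against the byte-indexed table, and why the caret line is
sized in columns.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a byte <-> column map.
// Assumptions: valid UTF-8, one column per code point, no tabs.
struct SimpleColumnMap {
  std::vector<int> byteToColumn; // -1 for bytes that do not start a column
  std::vector<int> columnToByte; // first byte of each column

  explicit SimpleColumnMap(const std::string &Line) {
    int Column = 0;
    for (std::size_t I = 0; I < Line.size(); ++I) {
      unsigned char C = static_cast<unsigned char>(Line[I]);
      if ((C & 0xC0) == 0x80) {
        // UTF-8 continuation byte: belongs to the previous column.
        byteToColumn.push_back(-1);
      } else {
        byteToColumn.push_back(Column++);
        columnToByte.push_back(static_cast<int>(I));
      }
    }
    // One-past-the-end entries, so each table has one extra element.
    byteToColumn.push_back(Column);
    columnToByte.push_back(static_cast<int>(Line.size()));
  }

  int columns() const { return static_cast<int>(columnToByte.size()) - 1; }
  int bytes() const { return static_cast<int>(byteToColumn.size()) - 1; }

  // N is a byte index, so its bound comes from the byte-indexed table.
  int startOfNextColumn(int N) const {
    assert(0 <= N && N < bytes());
    while (byteToColumn[++N] == -1) {}
    return N;
  }
};

// The caret line has one space per column, not one per byte.
std::string makeCaretLine(const SimpleColumnMap &Map) {
  return std::string(static_cast<std::size_t>(Map.columns()), ' ');
}
```

For a line such as "a\xCE\xB1\xCE\xB1 b" (7 bytes, 5 columns), byte index 5
is a valid argument to startOfNextColumn() even though it is not smaller
than the number of columns, which is the case the old bound rejected.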


Added:
    cfe/trunk/test/Frontend/source-col-map.c
Modified:
    cfe/trunk/lib/Frontend/TextDiagnostic.cpp

Modified: cfe/trunk/lib/Frontend/TextDiagnostic.cpp
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/TextDiagnostic.cpp?rev=225442&r1=225441&r2=225442&view=diff
==============================================================================
--- cfe/trunk/lib/Frontend/TextDiagnostic.cpp (original)
+++ cfe/trunk/lib/Frontend/TextDiagnostic.cpp Thu Jan  8 07:19:07 2015
@@ -293,14 +293,14 @@ struct SourceColumnMap {
 
   /// \brief Map from a byte index to the next byte which starts a column.
   int startOfNextColumn(int N) const {
-    assert(0 <= N && N < static_cast<int>(m_columnToByte.size() - 1));
+    assert(0 <= N && N < static_cast<int>(m_byteToColumn.size() - 1));
     while (byteToColumn(++N) == -1) {}
     return N;
   }
 
   /// \brief Map from a byte index to the previous byte which starts a column.
   int startOfPreviousColumn(int N) const {
-    assert(0 < N && N < static_cast<int>(m_columnToByte.size()));
+    assert(0 < N && N < static_cast<int>(m_byteToColumn.size()));
     while (byteToColumn(--N) == -1) {}
     return N;
   }
@@ -323,9 +323,10 @@ static void selectInterestingSourceRegio
                                           std::string &FixItInsertionLine,
                                           unsigned Columns,
                                           const SourceColumnMap &map) {
-  unsigned MaxColumns = std::max<unsigned>(map.columns(),
-                                           std::max(CaretLine.size(),
-                                                    FixItInsertionLine.size()));
+  unsigned CaretColumns = CaretLine.size();
+  unsigned FixItColumns = llvm::sys::locale::columnWidth(FixItInsertionLine);
+  unsigned MaxColumns = std::max(static_cast<unsigned>(map.columns()),
+                                 std::max(CaretColumns, FixItColumns));
   // if the number of columns is less than the desired number we're done
   if (MaxColumns <= Columns)
     return;
@@ -1110,12 +1111,13 @@ void TextDiagnostic::emitSnippetAndCaret
   // Copy the line of code into an std::string for ease of manipulation.
   std::string SourceLine(LineStart, LineEnd);
 
-  // Create a line for the caret that is filled with spaces that is the same
-  // length as the line of source code.
-  std::string CaretLine(LineEnd-LineStart, ' ');
-
+  // Build the byte to column map.
   const SourceColumnMap sourceColMap(SourceLine, DiagOpts->TabStop);
 
+  // Create a line for the caret that is filled with spaces that is the same
+  // number of columns as the line of source code.
+  std::string CaretLine(sourceColMap.columns(), ' ');
+
   // Highlight all of the characters covered by Ranges with ~ characters.
   for (SmallVectorImpl<CharSourceRange>::iterator I = Ranges.begin(),
                                                   E = Ranges.end();

Added: cfe/trunk/test/Frontend/source-col-map.c
URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/Frontend/source-col-map.c?rev=225442&view=auto
==============================================================================
--- cfe/trunk/test/Frontend/source-col-map.c (added)
+++ cfe/trunk/test/Frontend/source-col-map.c Thu Jan  8 07:19:07 2015
@@ -0,0 +1,37 @@
+// RUN: not %clang_cc1 %s -fsyntax-only -fmessage-length 75 -o /dev/null 2>&1 | FileCheck %s -strict-whitespace
+
+// Test case for the text diagnostics source column conversion crash.
+
+// This test case tries to check the error diagnostic message printer, which is
+// responsible to create the code snippet shorter than the message-length (in
+// number of columns.)
+//
+// The error diagnostic message printer should be able to handle the non-ascii
+// characters without any segmentation fault or assertion failure.  If your
+// changes to clang frontend crashes this case, it is likely that you are mixing
+// column index with byte index which are two totally different concepts.
+
+// NOTE: This file is encoded in UTF-8 and intentionally contains some
+// non-ASCII characters.
+
+__attribute__((format(printf, 1, 2)))
+extern int printf(const char *fmt, ...);
+
+void test1(Unknown* b);  // αααα αααα αααα αααα αααα αααα αααα αααα αααα αααα αααα
+// CHECK: unknown type name 'Unknown'
+// CHECK-NEXT: void test1(Unknown* b);  // αααα αααα αααα αααα αααα αααα αααα ααα...
+// CHECK-NEXT: {{^           \^$}}
+
+void test2(Unknown* b);  // αααα αααα αααα αααα αααα αααα αααα αααα αααα
+
+// CHECK: unknown type name 'Unknown'
+// CHECK-NEXT: void test2(Unknown* b);  // αααα αααα αααα αααα αααα αααα αααα αααα αααα
+// CHECK-NEXT: {{^           \^$}}
+
+void test3() {
+   /* αααα αααα αααα αααα αααα αααα αααα αααα αααα αααα */ printf("%d", "s");
+}
+// CHECK:       format specifies type 'int' but the argument has type 'char *'
+// CHECK-NEXT:   ...αααα αααα αααα αααα αααα αααα αααα αααα αααα */ printf("%d", "s");
+// CHECK-NEXT: {{^                                                             ~~   \^~~$}}
+// CHECK-NEXT: {{^                                                             %s$}}





