[PATCH] Frontend: Fix SourceColumnMap assertion failure on non-ascii characters.

Logan Chien tzuhsiang.chien at gmail.com
Fri Jan 2 10:41:41 PST 2015


Ping?

I have found that SourceColMap may trigger assertion failure when the input
source code contains some non-ascii characters.  This is due to the fact
that some column is consisted of multiple bytes and the code is mixing the
column index and byte index.  This patch should fix the problem.  Please
have a look.  Thanks!

http://reviews.llvm.org/D6746

Logan

On Tue, Dec 30, 2014 at 8:11 AM, Logan Chien <tzuhsiang.chien at gmail.com>
wrote:

> Ping?
>
>
> On Thu, Dec 25, 2014 at 12:54 AM, Logan Chien <tzuhsiang.chien at gmail.com>
> wrote:
>
>> Hi Richard,
>>
>> I found that some assertion failure will be raised from the
>> lib/Frontend/TextDiagnostic.cpp since the assertion incorrectly compares
>> the byte index with the number of columns, which is a problem when the
>> input source code contains non-ascii characters (multiple bytes per
>> column.)  The patch should fix the problem, please have a look.  Thanks.
>>
>> http://reviews.llvm.org/D6746
>>
>> Sincerely,
>> Logan
>>
>>
>> On Sat, Dec 20, 2014 at 5:17 PM, Logan Chien <tzuhsiang.chien at gmail.com>
>> wrote:
>>
>>> Hi rsmith,
>>>
>>> http://reviews.llvm.org/D6746
>>>
>>> Files:
>>>   lib/Frontend/TextDiagnostic.cpp
>>>   test/Frontend/source-col-map.c
>>>
>>> Index: lib/Frontend/TextDiagnostic.cpp
>>> ===================================================================
>>> --- lib/Frontend/TextDiagnostic.cpp
>>> +++ lib/Frontend/TextDiagnostic.cpp
>>> @@ -293,14 +293,14 @@
>>>
>>>    /// \brief Map from a byte index to the next byte which starts a
>>> column.
>>>    int startOfNextColumn(int N) const {
>>> -    assert(0 <= N && N < static_cast<int>(m_columnToByte.size() - 1));
>>> +    assert(0 <= N && N < static_cast<int>(m_byteToColumn.size() - 1));
>>>      while (byteToColumn(++N) == -1) {}
>>>      return N;
>>>    }
>>>
>>>    /// \brief Map from a byte index to the previous byte which starts a
>>> column.
>>>    int startOfPreviousColumn(int N) const {
>>> -    assert(0 < N && N < static_cast<int>(m_columnToByte.size()));
>>> +    assert(0 < N && N < static_cast<int>(m_byteToColumn.size()));
>>>      while (byteToColumn(--N) == -1) {}
>>>      return N;
>>>    }
>>> @@ -323,9 +323,10 @@
>>>                                            std::string
>>> &FixItInsertionLine,
>>>                                            unsigned Columns,
>>>                                            const SourceColumnMap &map) {
>>> -  unsigned MaxColumns = std::max<unsigned>(map.columns(),
>>> -                                           std::max(CaretLine.size(),
>>> -
>>> FixItInsertionLine.size()));
>>> +  unsigned CaretColumns = CaretLine.size();
>>> +  unsigned FixItColumns =
>>> llvm::sys::locale::columnWidth(FixItInsertionLine);
>>> +  unsigned MaxColumns = std::max(static_cast<unsigned>(map.columns()),
>>> +                                 std::max(CaretColumns, FixItColumns));
>>>    // if the number of columns is less than the desired number we're done
>>>    if (MaxColumns <= Columns)
>>>      return;
>>> @@ -1110,12 +1111,13 @@
>>>    // Copy the line of code into an std::string for ease of manipulation.
>>>    std::string SourceLine(LineStart, LineEnd);
>>>
>>> -  // Create a line for the caret that is filled with spaces that is the
>>> same
>>> -  // length as the line of source code.
>>> -  std::string CaretLine(LineEnd-LineStart, ' ');
>>> -
>>> +  // Build the byte to column map.
>>>    const SourceColumnMap sourceColMap(SourceLine, DiagOpts->TabStop);
>>>
>>> +  // Create a line for the caret that is filled with spaces that is the
>>> same
>>> +  // number of columns as the line of source code.
>>> +  std::string CaretLine(sourceColMap.columns(), ' ');
>>> +
>>>    // Highlight all of the characters covered by Ranges with ~
>>> characters.
>>>    for (SmallVectorImpl<CharSourceRange>::iterator I = Ranges.begin(),
>>>                                                    E = Ranges.end();
>>> Index: test/Frontend/source-col-map.c
>>> ===================================================================
>>> --- /dev/null
>>> +++ test/Frontend/source-col-map.c
>>> @@ -0,0 +1,41 @@
>>> +// RUN: not %clang_cc1 %s -fsyntax-only -fmessage-length 75 -o
>>> /dev/null 2>&1 | FileCheck %s -strict-whitespace
>>> +
>>> +// Test case for the text diagnostics source column conversion crash.
>>> +
>>> +// There are two problems to be tested:
>>> +//
>>> +// 1. The assertion in startOfNextColumn() and startOfPreviousColumn()
>>> +//    should not be raised when the byte index is greater than the
>>> column index
>>> +//    since the non-ascii characters may use more than one bytes to
>>> store a
>>> +//    character in a column.
>>> +//
>>> +// 2. The length of the caret line should be equal to the number of
>>> columns of
>>> +//    source line, instead of the length of the source line.
>>> Otherwise, the
>>> +//    assertion in selectInterestingSourceRegion will be raised because
>>> the
>>> +//    removed columns plus the kept columns are not greater than the
>>> max column,
>>> +//    which means that we should not remove any column at all.
>>> +
>>> +// NOTE: This file is encoded in UTF-8 and intentionally contains some
>>> +// non-ASCII characters.
>>> +
>>> +__attribute__((format(printf, 1, 2)))
>>> +extern int printf(const char *fmt, ...);
>>> +
>>> +void test1(Unknown* b);  // αααα αααα αααα αααα αααα αααα αααα αααα
>>> αααα αααα αααα
>>> +// CHECK: unknown type name 'Unknown'
>>> +// CHECK-NEXT: void test1(Unknown* b);  // αααα αααα αααα αααα αααα
>>> αααα αααα ααα...
>>> +// CHECK-NEXT: {{^           \^$}}
>>> +
>>> +void test2(Unknown* b);  // αααα αααα αααα αααα αααα αααα αααα αααα αααα
>>> +
>>> +// CHECK: unknown type name 'Unknown'
>>> +// CHECK-NEXT: void test2(Unknown* b);  // αααα αααα αααα αααα αααα
>>> αααα αααα αααα αααα
>>> +// CHECK-NEXT: {{^           \^$}}
>>> +
>>> +void test3() {
>>> +   /* αααα αααα αααα αααα αααα αααα αααα αααα αααα αααα */ printf("%d",
>>> "s");
>>> +}
>>> +// CHECK:       format specifies type 'int' but the argument has type
>>> 'char *'
>>> +// CHECK-NEXT:   ...αααα αααα αααα αααα αααα αααα αααα αααα αααα */
>>> printf("%d", "s");
>>> +// CHECK-NEXT: {{^
>>>        ~~   \^~~$}}
>>> +// CHECK-NEXT: {{^
>>>        %s$}}
>>>
>>> EMAIL PREFERENCES
>>>   http://reviews.llvm.org/settings/panel/emailpreferences/
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20150103/dc6d6bdc/attachment.html>


More information about the cfe-commits mailing list