[cfe-dev] Fixits with multibyte chars

Mon Jul 16 10:56:23 PDT 2012

On Mon, Jul 16, 2012 at 10:49 AM, Jordan Rose <jordan_rose at apple.com> wrote:
>
> On Jul 16, 2012, at 10:47 , Benjamin Kramer <benny.kra at gmail.com> wrote:
>
>>
>> On 16.07.2012, at 19:32, Jordan Rose <jordan_rose at apple.com> wrote:
>>
>>> Hi, everyone. We recently hit an assertion when trying to output a fixit with Unicode characters in it; it reduces down to this:
>>>
>>> void test() {
>>> printf("∆: %d", 1L);
>>> }
>>>
>>> I could of course just disable fixits when there are Unicode characters involved, but I'd like to fix this the right way. The trouble is -fdiagnostics-parseable-fixits, which is supposed to be machine-readable output: in this case, is a three-byte UTF-8 character three columns or one column? I think one column is the right way to go, but I wanted to get some other opinions before I start working on a patch.
>>
>> This actually depends on the system. On some systems we'll print the Unicode code point in hex; on others the character is displayed in one column. There is the llvm::sys::locale::columnWidth function to get this information in a portable way.
>
> Okay, that gives us two problems, then…for user-visible fixits we can use llvm::sys::locale::columnWidth (thanks, Ben), but then -fdiagnostics-parseable-fixits will have different column numbers? Is that okay?
>
> (Currently -fdiagnostics-parseable-fixits counts columns in bytes rather than characters.)

Machine-parsable "columns" are not the same as columns in the
terminal.  I'm pretty sure the model we want is one "column" per
Unicode code point, regardless of how it is displayed.

-Eli
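
To make the difference between byte columns and code-point columns
concrete, here is a minimal sketch. It assumes well-formed UTF-8 input,
and codePointColumn is a hypothetical helper written for illustration,
not a Clang or LLVM API. It converts a byte offset within a line into a
count of Unicode code points by skipping UTF-8 continuation bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Clang or LLVM): returns the number of
// Unicode code points that start within the first `byteOffset` bytes of
// `line`. Assumes `line` is well-formed UTF-8.
std::size_t codePointColumn(const std::string &line, std::size_t byteOffset) {
  std::size_t column = 0;
  for (std::size_t i = 0; i < byteOffset && i < line.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(line[i]);
    // Continuation bytes have the form 10xxxxxx; every other byte
    // starts a new code point.
    if ((c & 0xC0) != 0x80)
      ++column;
  }
  return column;
}
```

For the reduced test case, with the line written without indentation,
the argument 1L starts at zero-based byte offset 18 but at zero-based
code-point offset 16, because the three-byte character counts once.
The width actually shown in a terminal is a third, separate quantity
and is what llvm::sys::locale::columnWidth is meant to report.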