[cfe-dev] Fixits with multibyte chars

Richard Smith richard at metafoo.co.uk
Mon Jul 16 11:35:21 PDT 2012


On Mon, Jul 16, 2012 at 10:56 AM, Eli Friedman <eli.friedman at gmail.com> wrote:

> On Mon, Jul 16, 2012 at 10:49 AM, Jordan Rose <jordan_rose at apple.com>
> wrote:
> >
> > On Jul 16, 2012, at 10:47 , Benjamin Kramer <benny.kra at gmail.com> wrote:
> >
> >>
> >> On 16.07.2012, at 19:32, Jordan Rose <jordan_rose at apple.com> wrote:
> >>
> >>> Hi, everyone. We recently hit an assertion when trying to output a
> >>> fixit with Unicode characters in it; it reduces down to this:
> >>>
> >>> void test() {
> >>> printf("∆: %d", 1L);
> >>> }
> >>>
> >>> I could of course just disable fixits when there are Unicode
> >>> characters involved, but I'd like to fix this the right way. The
> >>> trouble is -fdiagnostics-parseable-fixits, which is supposed to be
> >>> machine-readable output: in this case, is a three-byte UTF-8
> >>> character three columns or one column? I think one column is the
> >>> right way to go, but I wanted to get some other opinions before I
> >>> start working on a patch.
> >>
> >> This actually depends on the system. On some systems we'll print the
> >> Unicode code point in hex; others will get the one-column character.
> >> The llvm::sys::locale::columnWidth function provides this information
> >> in a portable way.
> >
> > Okay, that gives us two problems, then…for user-visible fixits we can
> > use llvm::sys::locale::columnWidth (thanks, Ben), but then
> > -fdiagnostics-parseable-fixits will have different column numbers? Is
> > that okay?
> >
> > (Currently -fdiagnostics-parseable-fixits counts columns in bytes
> > rather than characters.)
>
> Machine-parsable "columns" are not the same as columns in the
> terminal.  I'm pretty sure the model we want is one "column" per
> Unicode code point, regardless of how it is displayed.


How should we behave if the file contains a byte sequence which is not
valid UTF-8 (for instance, if arbitrary raw data is placed inside a raw
string literal)? For the machine-parsable form, I'd feel more comfortable
with bytes from the start of the line for this reason.
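
To make the trade-off concrete, here is a minimal sketch of the two
machine-readable column models discussed above. It is not Clang's
implementation: the helper names (byteColumn, codePointColumn,
sequenceLength) are hypothetical, and the UTF-8 check is simplified (it
validates lead and continuation bytes but does not reject overlong
encodings or surrogates).

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: 1-based column counted in bytes from the start of
// the line. Defined for any byte content, valid UTF-8 or not.
std::size_t byteColumn(std::size_t byteOffset) {
  return byteOffset + 1;
}

// Hypothetical helper: length in bytes of the UTF-8 sequence starting at
// s[i], or 0 if the lead byte or a continuation byte is malformed.
// Simplified: overlong forms and surrogates are not rejected.
std::size_t sequenceLength(std::string_view s, std::size_t i) {
  unsigned char lead = static_cast<unsigned char>(s[i]);
  std::size_t len = 0;
  if (lead < 0x80)
    len = 1;
  else if ((lead & 0xE0) == 0xC0)
    len = 2;
  else if ((lead & 0xF0) == 0xE0)
    len = 3;
  else if ((lead & 0xF8) == 0xF0)
    len = 4;
  else
    return 0; // stray continuation byte or invalid lead byte
  if (i + len > s.size())
    return 0; // sequence truncated by end of line
  for (std::size_t k = 1; k < len; ++k)
    if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
      return 0; // expected a continuation byte
  return len;
}

// Hypothetical helper: 1-based column counted in Unicode code points.
// Returns std::nullopt when the bytes before byteOffset are not valid
// UTF-8, or when byteOffset falls inside a multi-byte sequence.
std::optional<std::size_t> codePointColumn(std::string_view line,
                                           std::size_t byteOffset) {
  if (byteOffset > line.size())
    return std::nullopt;
  std::size_t column = 1;
  std::size_t i = 0;
  while (i < byteOffset) {
    std::size_t len = sequenceLength(line, i);
    if (len == 0 || i + len > byteOffset)
      return std::nullopt;
    i += len;
    ++column;
  }
  return column;
}
```

For the reduced test case (with no leading indentation), the '%' in the
format string sits at byte offset 13, so the byte model reports column 14
and the code point model reports column 12. For a line containing a stray
0xFF byte, only the byte model yields an answer.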